Self-Consistency

An advanced technique to improve the accuracy of complex reasoning by generating multiple reasoning paths and taking a majority vote over their final answers.

Self-Consistency is an advanced technique that builds on Chain-of-Thought (CoT) prompting to make it more accurate and reliable. Instead of generating a single chain of thought, it samples several different reasoning paths for the same question and then selects the final answer that most of the paths agree on (a majority vote).

It's like asking a committee of experts to solve a problem independently and then taking the answer that most of them agree on. This simple but powerful idea significantly boosts accuracy on demanding logical and arithmetic tasks.

How It Works

  1. Diverse Sampling: You keep the same Chain-of-Thought prompt but ask the model for multiple responses to it. The diversity usually comes from setting the "temperature" sampling parameter in the API above 0, so that each response can follow a different reasoning path.

  2. Generate Multiple Paths: The model produces several step-by-step reasoning paths. They may differ in wording or in approach, and some may contain errors.

  3. Majority Vote: You extract the final answer from each path and choose the answer that appears most often. Only the final answers are compared, not the reasoning text.
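The three steps can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a reference implementation: `generate` is a hypothetical callable standing in for one sampled model call (a real API's function name and parameters will differ), the 0.7 default temperature is only illustrative, and the final answer is assumed to be the last number in each path, which suits arithmetic word problems but not free-form answers.

```python
import re
from collections import Counter
from itertools import cycle
from typing import Callable, List, Optional


def extract_answer(path: str) -> Optional[str]:
    """Assumed heuristic: treat the last number in a reasoning path as its final answer."""
    numbers = re.findall(r"\d+(?:\.\d+)?", path)
    return numbers[-1] if numbers else None


def self_consistency(
    prompt: str,
    generate: Callable[[str, float], str],  # hypothetical: (prompt, temperature) -> one sampled path
    n_samples: int = 5,
    temperature: float = 0.7,  # illustrative value; must be > 0 for the paths to differ
) -> Optional[str]:
    """Sample several chain-of-thought paths and return the majority-vote answer."""
    answers: List[str] = []
    for _ in range(n_samples):
        # Steps 1-2: each call returns one independently sampled reasoning path.
        path = generate(prompt, temperature)
        answer = extract_answer(path)
        if answer is not None:  # skip paths with no parseable answer
            answers.append(answer)
    if not answers:
        return None
    # Step 3: majority vote; on a tie, most_common returns the answer seen first.
    return Counter(answers).most_common(1)[0][0]


# Hypothetical stand-in for a model: replays three canned reasoning paths in turn.
canned = cycle([
    "2 * 12 = 24; 15 + 24 = 39; 39 - 7 = 32. The answer is 32.",
    "15 + 12 = 27; 27 - 7 = 20. The answer is 20.",
    "Two boxes hold 24 apples, so 15 + 24 - 7 = 32.",
])


def fake_generate(prompt: str, temperature: float) -> str:
    return next(canned)  # ignores temperature; a real model would sample


print(self_consistency("Q: ... Let's think step by step.", fake_generate, n_samples=3))  # 32
```

The stub replays fixed paths instead of calling a model, so two votes for 32 beat one vote for 20. Passing `generate` in as a function keeps the voting logic independent of any particular provider's API.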

Example: The Farmer Problem Revisited

Let's use our previous word problem: a farmer starts with 15 apples, adds 2 boxes of 12 apples each, and then gives 7 apples away. We ask the model to solve it three times and get three slightly different reasoning paths.

Path 1:

"First, calculate the new apples: 2 boxes * 12 apples/box = 24 apples. Then, add to the initial amount: 15 + 24 = 39 apples. Finally, subtract the given apples: 39 - 7 = 32 apples. The answer is 32."

Path 2:

"The farmer starts with 15 apples. He adds two boxes, which is 24 apples (2 * 12). So he has 15 + 24 = 39. Then he gives 7 away. 39 minus 7 is 32. So he has 32 left."

Path 3:

"Let's see. Two boxes have 2 * 12 = 24 apples. Total apples before giving any away is 15 + 24 = 39. After giving 7 away, he has 39 - 7 = 32. The final answer is 32."

Even though the wording differs in each path, all three reach the same final answer: 32. Agreement across independently sampled paths gives us more confidence that 32 is correct, although it is not a guarantee. If one path had produced an answer of "20", the majority vote would discard it because it is in the minority.
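Applying the vote to the three paths above makes the tally explicit. This sketch adds a hypothetical fourth path that counts only one box and arrives at 20, to show a minority answer being outvoted; the last-number extraction rule is again an assumption. Reporting the winner's share of the votes is one simple way to express how strongly the paths agree.

```python
import re
from collections import Counter

paths = [
    "First, calculate the new apples: 2 boxes * 12 apples/box = 24 apples. "
    "Then, add to the initial amount: 15 + 24 = 39 apples. "
    "Finally, subtract the given apples: 39 - 7 = 32 apples. The answer is 32.",
    "The farmer starts with 15 apples. He adds two boxes, which is 24 apples (2 * 12). "
    "So he has 15 + 24 = 39. Then he gives 7 away. 39 minus 7 is 32. So he has 32 left.",
    "Let's see. Two boxes have 2 * 12 = 24 apples. Total apples before giving any away "
    "is 15 + 24 = 39. After giving 7 away, he has 39 - 7 = 32. The final answer is 32.",
    # Hypothetical fourth path with an arithmetic slip (counts only one box)
    "15 + 12 = 27 apples, then 27 - 7 = 20. The answer is 20.",
]


def last_number(text):
    """Assumed heuristic answer extraction: the last number that appears in the path."""
    found = re.findall(r"\d+", text)
    return found[-1] if found else None


# Tally the extracted answers, skipping any path without a number.
votes = Counter(a for a in map(last_number, paths) if a is not None)
winner, count = votes.most_common(1)[0]
share = count / sum(votes.values())  # fraction of parsed paths agreeing with the winner

print(votes)                                    # Counter({'32': 3, '20': 1})
print(f"answer={winner}, agreement={share:.0%}")  # answer=32, agreement=75%
```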

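This method is more computationally intensive because it requires generating multiple responses: sampling N reasoning paths costs roughly N times as much generation as a single Chain-of-Thought response. In return, it has been shown to substantially improve accuracy over single-path Chain-of-Thought on arithmetic and commonsense reasoning benchmarks.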