
Overview of Few-Shot Prompting

While large language models demonstrate impressive zero-shot capabilities, they still underperform on more complex tasks when using zero-shot settings. Few-shot prompting serves as a technique to enable in-context learning, where we provide demonstrations in prompts to guide models toward better performance. Demonstrations act as conditions for subsequent examples, with the hope that the model will generate appropriate responses.

According to research by Touvron et al. (2023) and Kaplan et al. (2020), few-shot prompting capabilities begin to emerge once models reach a sufficiently large scale.
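The mechanics of few-shot prompting amount to concatenating demonstrations ahead of the new query so the model can infer the task in-context. A minimal sketch (the helper name, demonstration pairs, and query are illustrative placeholders, not from the original text):

```python
# Minimal sketch of few-shot prompting: demonstrations are concatenated
# ahead of the new query so the model can infer the task in-context.
# The demonstration pairs and the query below are illustrative placeholders.

def build_few_shot_prompt(demonstrations, query):
    """Format (input, output) demonstrations followed by the new query."""
    blocks = [f"Input: {inp}\nOutput: {out}" for inp, out in demonstrations]
    blocks.append(f"Input: {query}\nOutput:")
    return "\n\n".join(blocks)

demos = [
    ("This is amazing!", "Positive"),
    ("What a horrible show!", "Negative"),
]
prompt = build_few_shot_prompt(demos, "Wow, that movie was rad!")
print(prompt)
```

The trailing "Output:" leaves the completion slot open for the model to fill.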

Example of Few-Shot Prompting

Take the new-word usage task proposed by Brown et al. (2020) as an example.

Prompt:

Plaintext

"Whatpu" is a small furry animal in Tanzania. An example sentence using the word "whatpu" is: We saw these very cute whatpus during our trip to Africa. "Farduddle" means to jump up and down quickly. An example sentence using the word "farduddle" is:

Output:

Plaintext

When we won the game, we all started to farduddle in celebration.

The model learns to perform the task from a single demonstration (1-shot). For more difficult tasks, we can try increasing the number of demonstrations (e.g., 3-shot, 5-shot, 10-shot, etc.).
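Scaling the 1-shot example to k-shot is just a matter of templating each demonstration and appending the unanswered case. A sketch of building the new-word usage prompt this way (the helper name is hypothetical; the data reproduces the example above):

```python
def make_word_usage_prompt(examples, new_word, new_definition):
    """Build a k-shot prompt for the new-word usage task.

    examples: list of (word, definition, example_sentence) tuples;
    the final entry is left open for the model to complete.
    """
    parts = [
        f'"{word}" {definition} An example sentence using the word '
        f'"{word.lower()}" is: {sentence}'
        for word, definition, sentence in examples
    ]
    parts.append(
        f'"{new_word}" {new_definition} An example sentence using the word '
        f'"{new_word.lower()}" is:'
    )
    return " ".join(parts)

prompt = make_word_usage_prompt(
    [("Whatpu", "is a small furry animal in Tanzania.",
      "We saw these very cute whatpus during our trip to Africa.")],
    "Farduddle", "means to jump up and down quickly.",
)
print(prompt)
```

Adding more tuples to `examples` yields 3-shot, 5-shot, and so on without changing the template.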

Key Tips for Few-Shot Learning

Based on findings from Min et al. (2022), here are additional tips about demonstrations/examples in few-shot learning:

  • "Both the label space and the distribution of input text specified by demonstrations are important (regardless of whether the labels are correct for individual inputs)."
  • Formatting also plays a critical role in performance—even using random labels is better than having no labels at all.
  • Other results show that selecting random labels from the true label distribution (rather than a uniform distribution) is also helpful.
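The "random labels from the true label distribution" setup can be sketched in a few lines: keep every input and its format, but draw each label from the pool of gold labels rather than a uniform coin flip. The helper name and example data below are hypothetical:

```python
import random

def randomize_labels(pairs, seed=0):
    """Replace each gold label with one drawn from the empirical (true)
    label distribution, keeping the demonstration format intact --
    a sketch of the ablation setup described in Min et al. (2022)."""
    rng = random.Random(seed)
    pool = [label for _, label in pairs]  # preserves the true distribution
    return [(text, rng.choice(pool)) for text, _ in pairs]

data = [
    ("This is amazing!", "Positive"),
    ("This is terrible!", "Negative"),
    ("Wow, that movie was awesome!", "Positive"),
]
shuffled = randomize_labels(data)
print(shuffled)
```

Because labels are drawn from `pool`, a 2:1 Positive/Negative dataset yields 2:1 random labels, unlike a uniform draw over the label set.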

Let’s try some examples. First, a random-label example (assigning the labels "Negative" and "Positive" to inputs at random).

Prompt:

Plaintext

This is amazing! // Negative This is terrible! // Positive Wow, that movie was awesome! // Positive What a horrible show! //

Output:

Plaintext

Negative

Even with randomized labels, we still get the correct answer. Note that we retained the formatting, which also helps. In fact, further experiments show that newer GPT models are becoming more robust even to random formats. For example:

Prompt:

Plaintext

Positive This is awesome! This is bad! Negative Wow that movie was rad! Positive What a horrible show! --

Output:

Plaintext

Negative

Despite inconsistent formatting, the model still predicts the correct label. More thorough analysis is needed to confirm whether this holds for different and more complex tasks, including various prompt variations.
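One way to run that analysis is to render the same demonstration under several surface formats and compare the model's answers across them. A sketch (the template strings are illustrative choices, not from the original experiments):

```python
# Render an identical demonstration under different surface formats to
# probe format robustness; the template strings are illustrative choices.
templates = [
    "{text} // {label}",
    "{label} {text}",
    "Review: {text}\nSentiment: {label}",
]

def render_demo(template, text, label):
    """Fill one demonstration into a given prompt template."""
    return template.format(text=text, label=label)

for t in templates:
    print(render_demo(t, "Wow that movie was rad!", "Positive"))
    print("---")
```

Holding the inputs and labels fixed while varying only the template isolates formatting as the experimental variable.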

Limitations of Few-Shot Prompting

Standard few-shot prompting works for many tasks but is not a perfect technique, especially for more complex reasoning tasks. Let’s demonstrate why this is the case. Remember the previous task:

Plaintext

The sum of the odd numbers in this group is an even number: 15, 32, 5, 13, 82, 7, 1. A:

If we try again, the model outputs:

Plaintext

Yes, the sum of the odd numbers in this group is 107, which is an even number.

This is incorrect, highlighting not only the limitations of these systems but also the need for more advanced prompt engineering.
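We can confirm the model is wrong on both counts by computing the sum directly:

```python
# Check the claim directly: sum the odd numbers in the group.
nums = [15, 32, 5, 13, 82, 7, 1]
odd_sum = sum(n for n in nums if n % 2 == 1)
print(odd_sum)           # 41, not 107
print(odd_sum % 2 == 0)  # False: the sum is odd
```

The odd numbers are 15, 5, 13, 7, and 1, which sum to 41, an odd number, so the correct answer is False.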

Let’s try adding some examples to see if few-shot prompting can improve results.

Prompt:

Plaintext

The sum of the odd numbers in this group is an even number: 4, 8, 9, 15, 12, 2, 1.
A: Answer is False.

The sum of the odd numbers in this group is an even number: 17, 10, 19, 4, 8, 12, 24.
A: Answer is True.

The sum of the odd numbers in this group is an even number: 16, 11, 14, 4, 8, 13, 24.
A: Answer is True.

The sum of the odd numbers in this group is an even number: 17, 9, 10, 12, 13, 4, 2.
A: Answer is False.

The sum of the odd numbers in this group is an even number: 15, 32, 5, 13, 82, 7, 1.
A:

Output:

Plaintext

Answer is True.

This still fails. It appears that few-shot prompting is insufficient to obtain reliable responses for this type of reasoning problem. The examples provided only convey basic task information. Upon closer inspection, the task involves multiple reasoning steps. In other words, decomposing the problem into steps and demonstrating the process to the model may help. Recently, Chain of Thought (CoT) prompting has gained popularity for solving more complex arithmetic, commonsense, and symbolic reasoning tasks.
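As a preview of that idea, a chain-of-thought demonstration spells out the intermediate step (listing the odd numbers and summing them) before stating the answer. A sketch of such a prompt, reusing the groups from above (the worked reasoning text is illustrative):

```python
# Sketch of a chain-of-thought demonstration for the same task: the
# worked reasoning ("the odd numbers are ... their sum is ...") appears
# before the final answer, then the new question is appended unanswered.
cot_prompt = (
    "The sum of the odd numbers in this group is an even number: "
    "4, 8, 9, 15, 12, 2, 1.\n"
    "A: The odd numbers are 9, 15, and 1. Their sum is 25, which is odd. "
    "The answer is False.\n\n"
    "The sum of the odd numbers in this group is an even number: "
    "15, 32, 5, 13, 82, 7, 1.\n"
    "A:"
)
print(cot_prompt)
```

The demonstration shows the model *how* to decompose the problem, not just the input-output mapping.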

In summary, providing demonstrations is useful for certain tasks. When zero-shot and few-shot prompting are insufficient, it may indicate that the model has not learned enough to perform well on the task. At this point, it is recommended to consider fine-tuning the model or trying more advanced prompting techniques. Next, we will discuss a popular prompting technique called Chain of Thought prompting, which has garnered significant attention.
