Zero in on the Right Responses with Zero-Shot and Few-Shot Prompting

By Albert Mao

Jan 8, 2024

This article explains zero-shot and few-shot prompting and summarizes published results on LLM performance with each type of prompt across various tasks.

Zero-shot and few-shot prompting are basic prompting techniques used to elicit reasoning from large language models. Each of these approaches comes with its advantages and limitations and can be successfully used in various scenarios. 

Below is an overview of zero-shot and few-shot prompting, along with a brief summary of key studies that report results from applying each technique to different LLMs across a wide range of tasks.

What is Zero-Shot Prompting?

With zero-shot prompting, a large language model does not receive any examples in the prompt. In this scenario, the LLM is expected to rely on its own capabilities to find the answer without any demonstrations.

In general, LLMs are capable of doing so after being trained on large volumes of data and having built an extensive knowledge base. For example, when given the prompt below, GPT-4 understands the "sentiment" without additional explanations and is able to come up with the correct answer.

Classify the text into neutral, negative or positive.
Text: I think the weather is okay
Sentiment:

Output:

Neutral
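For readers who want to try this programmatically, here is a minimal sketch of sending a zero-shot prompt. It assumes the OpenAI Python SDK and an API key in the OPENAI_API_KEY environment variable; these are illustrative assumptions, not details from the studies cited here.

# Minimal zero-shot sentiment classification sketch.
# Assumes: pip install openai, and OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# The prompt contains the task description only -- no solved examples.
prompt = (
    "Classify the text into neutral, negative or positive.\n"
    "Text: I think the weather is okay\n"
    "Sentiment:"
)

response = client.chat.completions.create(
    model="gpt-4",  # any capable chat model should work here
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)  # expected output: Neutral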

What is Few-Shot Prompting?

With few-shot prompting, the LLM is given one or more examples in the prompt to provide context and steer the model in the process of finding an answer. These examples serve as demonstrations, or exemplars, conditioning the model for the final question, where it is asked to provide a response.

For example, in the Brown et al. (2020) study, the research team leverages few-shot prompting when asking the LLM to correctly use a made-up word in a sentence:

Prompt:

A "whatpu" is a small, furry animal native to Tanzania. An example of a sentence that uses the word whatpu is:
We were traveling in Africa, and we saw these very cute whatpus.
A "yalubalu" is a type of vegetable that looks like a big pumpkin. An example of a sentence that uses the word yalubalu is:

Output:

I was on a trip to Africa, and I tried this yalubalu vegetable that was grown in a garden there. It was delicious.

The above prompt is an example of one-shot prompting, offering a single demonstration to the LLM before requesting an answer to the actual question. Other contexts may require providing a language model with more examples to set it on the right trajectory when it is looking for an answer to a given task.
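In code, a few-shot prompt is simply the demonstrations concatenated ahead of the final, unanswered item. Below is a sketch of the one-shot example above, again assuming the OpenAI Python SDK and a chat model; the demonstration text follows Brown et al. (2020).

# One-shot prompt: a solved demonstration followed by the unanswered query.
# Assumes the OpenAI Python SDK with OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

# The demonstration pairs a made-up word's definition with a correct
# example sentence, showing the model the expected pattern.
demonstration = (
    'A "whatpu" is a small, furry animal native to Tanzania. '
    "An example of a sentence that uses the word whatpu is:\n"
    "We were traveling in Africa, and we saw these very cute whatpus.\n"
)
# The query repeats the pattern but leaves the sentence for the model.
query = (
    'A "yalubalu" is a type of vegetable that looks like a big pumpkin. '
    "An example of a sentence that uses the word yalubalu is:"
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": demonstration + query}],
)
print(response.choices[0].message.content)  # a sentence using "yalubalu"

Adding further (definition, sentence) pairs ahead of the query turns this one-shot prompt into a few-shot one.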

Zero-Shot or Few-Shot: Which to Use

As demonstrated by studies including Wei et al. (2022), large language models have remarkable zero-shot capabilities. In addition, zero-shot prompting provides maximum convenience to users and helps avoid spurious correlations that a model might pick up from demonstration examples.

At the same time, zero-shot prompting is the most challenging scenario. Sometimes, even humans have trouble understanding a task without demonstrations. For example, a request like "make a table of world records for the 200m dash," mentioned in the study by Brown et al. (2020), may sound ambiguous because it is unclear what such a table should include.

In such cases, few-shot prompting helps provide additional context and steer an LLM toward a solution to the problem or question. Powered by in-context learning through few-shot demonstrations, an LLM can achieve performance on par with state-of-the-art fine-tuning approaches on some tasks.

Current Studies on Zero-Shot and Few-Shot Prompting

Both zero-shot and few-shot prompting are powerful techniques that have been widely researched. The list below outlines the works most frequently cited in prompt engineering discussions of zero-shot and few-shot prompting.

Kojima et al. (2022)

In the foundational work by Kojima et al. (2022), titled Large Language Models are Zero-Shot Reasoners, researchers from the University of Tokyo and Google Research show that LLMs become capable zero-shot reasoners when the phrase "Let's think step by step" is appended after each question. The study compares zero-shot and few-shot prompting, as well as zero-shot and few-shot Chain-of-Thought (CoT) methods, across a range of tasks and large language models.

Below is one of the summaries from the Kojima et al. report, providing LLM scores on the MultiArith and GSM8K arithmetic tasks.

Table 1. Comparison of Zero-Shot, Few-Shot, Zero-Shot CoT (Chain-of-Thought) and Few-Shot CoT methods on MultiArith and GSM8K tasks. Source: Kojima et al. (2022)
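The zero-shot CoT technique itself is easy to reproduce: append the trigger phrase to the question. Here is a brief sketch, assuming the same OpenAI Python SDK setup as above; the sample question is taken from the Kojima et al. (2022) paper.

# Zero-shot Chain-of-Thought, following Kojima et al. (2022):
# appending "Let's think step by step." elicits intermediate reasoning.
from openai import OpenAI

client = OpenAI()

# Sample question from the Kojima et al. (2022) paper.
question = (
    "A juggler can juggle 16 balls. Half of the balls are golf balls, "
    "and half of the golf balls are blue. How many blue golf balls are there?"
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{
        "role": "user",
        "content": question + "\nLet's think step by step.",
    }],
)
print(response.choices[0].message.content)  # reasoning steps, then "4"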

Brown et al. (2020)

Another frequently cited work, by Brown et al. (2020) and titled Language Models are Few-Shot Learners, examines how scaling up language models improves task-agnostic few-shot performance.

The researchers evaluate GPT-3 with few-shot demonstrations specified purely via text interaction with the model, without any gradient updates or fine-tuning. The experiments demonstrate strong GPT-3 performance on diverse tasks, such as translation, question answering, and on-the-fly reasoning.

In particular, the study provides a table showing the zero-shot, one-shot, and few-shot performance of GPT-3 on arithmetic tasks, including addition, subtraction, and multiplication of n-digit numbers. The results grow progressively stronger when shifting from zero-shot to one-shot to few-shot prompting.

Table 2. Comparison of accuracy on basic arithmetic tasks for GPT-3, where nD+ and nD- stand for addition and subtraction of n-digit numbers, 2Dx for 2-digit multiplication, and 1DC for 1-digit composite operations. Source: Brown et al. (2020)
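To make the zero-shot/one-shot/few-shot distinction concrete, here is a small helper that builds a k-shot addition prompt in the question/answer style used in the Brown et al. (2020) arithmetic evaluation. The exact formatting in the paper may differ, so treat this as an illustrative sketch.

# Build a prompt with k solved additions followed by one unsolved query.
# k=0 yields a zero-shot prompt, k=1 one-shot, and larger k few-shot.
import random

def k_shot_addition_prompt(k: int, digits: int = 2) -> str:
    def pair():
        lo, hi = 10 ** (digits - 1), 10 ** digits
        return random.randrange(lo, hi), random.randrange(lo, hi)

    lines = []
    for _ in range(k):  # k solved demonstrations
        a, b = pair()
        lines.append(f"Q: What is {a} plus {b}? A: {a + b}")
    a, b = pair()
    lines.append(f"Q: What is {a} plus {b}? A:")  # left for the model
    return "\n".join(lines)

print(k_shot_addition_prompt(3))  # a 3-shot prompt for 2-digit addition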

Wei et al. (2022)

A widely cited work by Wei et al. (2022), titled Finetuned Language Models Are Zero-Shot Learners, explores how to further improve zero-shot performance. The researchers fine-tune large language models on a collection of datasets described via instructions and evaluate how this instruction tuning improves zero-shot performance. The diagram below, from the study, compares GPT-3 under zero-shot and few-shot prompting with the results of the instruction-tuned model, referred to as FLAN.

Figure 1. LLM performance on several types of tasks, where FLAN stands for the instruction-tuned model. Image source: Wei et al. (2022)

Limits to Few-Shot Prompting

While few-shot prompting outperforms zero-shot prompting in many scenarios by letting users supply demonstrations that improve the model's reasoning, the technique has limitations of its own.

First, few-shot prompting still requires a certain amount of task-specific data to use as demonstrations. Second, few-shot prompting can fall short of fine-tuned models, as convincingly demonstrated in the Wei et al. (2022) report cited above.

Draw Upon Zero-Shot and Few-Shot Prompting Techniques with VectorShift

Both zero-shot and few-shot prompting are powerful prompting techniques that can steer language models into human-like reasoning on a range of tasks. That said, each of these methods has its own advantages and limitations, which should be considered when choosing a prompting strategy.

Utilizing zero-shot, few-shot, or Chain-of-Thought prompting is much easier on platforms like VectorShift, which offers no-code and SDK interfaces for introducing AI into your applications. For more information on VectorShift's capabilities for prompt engineering, please don't hesitate to get in touch with our team or request a free demo.

© 2023 VectorShift, Inc. All Rights Reserved.
