By Albert Mao
Jan 10, 2024
Prompt Engineering, Fine-Tuning LLMs or RAG: Which Is Best for Your Applications?
In this article, we summarize prompt engineering techniques, LLM fine-tuning and Retrieval Augmented Generation, and compare them side by side to support your own analysis.
Many observers view prompt engineering as the future of interactions with Large Language Models, coaxing LLMs into human-like reasoning with Chain-of-Thought, Tree-of-Thought and similar strategies. Yet even the most intricate prompting techniques have their limits: LLMs can still hallucinate, crafting individual prompts takes considerable time, and the model remains bound by its pre-trained knowledge. Below, we discuss the benefits and challenges of prompt engineering and compare prompting with LLM fine-tuning and Retrieval Augmented Generation (RAG), drawing on existing studies.
What is Prompt Engineering?
Prompt engineering is a relatively simple method of steering Large Language Models through natural language instructions, known as prompts, to improve the quality of their output. There has been no shortage of new prompting strategies recently. Here is a brief recap of several strategies used to improve LLM performance on different tasks.
Zero-Shot and Few-Shot Prompting
Zero-shot prompting asks an LLM to answer a question without providing any examples in the prompt, relying solely on the model's own capabilities. For example, giving an LLM a sentence to read and then asking whether the sentence sounds meaningful and accurate is a Zero-Shot prompt.
Few-shot prompting provides an LLM with one or more examples that demonstrate the pattern and then asks the actual question. For example, one may give an LLM a series of conversations, each labeled with its sentiment (positive, negative or neutral), and then ask the model to classify another conversation based on these demonstrations.
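The difference between the two approaches comes down to how the prompt text is assembled. Below is a minimal sketch for the sentiment example; the instruction wording and the `Conversation:`/`Sentiment:` layout are illustrative assumptions, and sending the resulting string to an actual LLM API is not shown.

```python
def zero_shot_prompt(text: str) -> str:
    """Zero-shot: state the task and the input, with no worked examples."""
    return (
        "Classify the sentiment of the following conversation as "
        "positive, negative or neutral.\n\n"
        f"Conversation: {text}\nSentiment:"
    )


def few_shot_prompt(examples: list[tuple[str, str]], text: str) -> str:
    """Few-shot: prepend labeled demonstrations so the model can follow the pattern."""
    header = (
        "Classify the sentiment of each conversation as "
        "positive, negative or neutral.\n\n"
    )
    # Each demonstration shows an input together with its expected label.
    demos = "".join(f"Conversation: {c}\nSentiment: {label}\n\n" for c, label in examples)
    # The real question reuses the same format but leaves the label blank.
    return header + demos + f"Conversation: {text}\nSentiment:"


examples = [
    ("Thanks, the issue is fixed and everything works now!", "positive"),
    ("I've been waiting for two weeks and still no reply.", "negative"),
    ("Can you tell me your opening hours?", "neutral"),
]
print(few_shot_prompt(examples, "The package arrived damaged again."))
```

The few-shot version costs more tokens per request, which is one reason zero-shot prompting is usually tried first.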
Chain-of-Thought (CoT) methods build upon Zero-Shot and Few-Shot prompting to induce large language models to articulate their reasoning before giving a final answer. For example, when a Zero-Shot prompt ends with "Let's think step by step," LLMs generate an explanation of their reasoning before resolving the final task.
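Kojima et al. apply Zero-Shot CoT in two passes: one prompt elicits the reasoning, and a second prompt extracts the final answer from it. The sketch below shows that flow under stated assumptions: `generate` is a hypothetical placeholder for whatever function calls your LLM, and the answer-extraction phrase is modeled on the one the paper uses for arithmetic questions.

```python
from typing import Callable

COT_TRIGGER = "Let's think step by step."
ANSWER_TRIGGER = "Therefore, the answer (arabic numerals) is"


def zero_shot_cot(question: str, generate: Callable[[str], str]) -> str:
    """Two-stage Zero-Shot CoT: elicit reasoning first, then extract the final answer."""
    # Stage 1 (reasoning extraction): the trigger phrase prompts the model to write out its steps.
    reasoning_prompt = f"Q: {question}\nA: {COT_TRIGGER}"
    reasoning = generate(reasoning_prompt)
    # Stage 2 (answer extraction): feed the reasoning back and ask only for the answer.
    answer_prompt = f"{reasoning_prompt} {reasoning}\n{ANSWER_TRIGGER}"
    return generate(answer_prompt).strip()


# Usage with a stand-in "model" that returns canned text (a real LLM call would go here).
def fake_generate(prompt: str) -> str:
    return "She starts with 3 apples and buys 5 more, so 3 + 5 = 8." if prompt.endswith(COT_TRIGGER) else " 8"


print(zero_shot_cot("Anna has 3 apples and buys 5 more. How many apples does she have?", fake_generate))
```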
The study by Kojima et al. (2023) reports results for Zero-Shot, Few-Shot, Zero-Shot CoT and Few-Shot CoT prompting on the MultiArith and GSM8K arithmetic word problem benchmarks, summarized in the table below:
Table 1. Comparison of scores achieved with Zero-Shot, Few-Shot, Zero-Shot CoT (Chain-of-Thought) and Few-Shot CoT prompting on the MultiArith and GSM8K tasks. Source: Kojima et al. (2023)
Tree-of-Thought Prompting
Finally, the Tree-of-Thought (ToT) prompting strategy pushes LLM reasoning even further: the model builds a tree of intermediate thoughts when solving a problem, self-evaluates those thoughts, and uses search algorithms such as breadth-first or depth-first search to decide which branches to pursue. In Yao et al. (2023), ToT prompting helped GPT-4 achieve a 74% success rate on the Game of 24, a mathematical reasoning puzzle, compared to 4% with Chain-of-Thought prompting.
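The propose, evaluate and search loop behind ToT can be sketched on the Game of 24 (combine four numbers with +, -, * and / to reach 24). This is a toy illustration under clear assumptions: in ToT, both the thought generator and the state evaluator are LLM prompts, whereas here `propose` enumerates arithmetic steps and `evaluate` is an exact exhaustive check standing in for the LLM's self-evaluation. The search is a breadth-first beam search that keeps the best `beam_width` states per level.

```python
from fractions import Fraction
from itertools import combinations


def propose(nums, exprs):
    """Thought generator: every way to merge two numbers with one arithmetic operation."""
    thoughts = []
    for i, j in combinations(range(len(nums)), 2):
        a, b, ea, eb = nums[i], nums[j], exprs[i], exprs[j]
        rest = [k for k in range(len(nums)) if k not in (i, j)]
        options = [(a + b, f"({ea}+{eb})"), (a * b, f"({ea}*{eb})"),
                   (a - b, f"({ea}-{eb})"), (b - a, f"({eb}-{ea})")]
        if b != 0:
            options.append((a / b, f"({ea}/{eb})"))
        if a != 0:
            options.append((b / a, f"({eb}/{ea})"))
        for value, expr in options:
            thoughts.append((tuple(nums[k] for k in rest) + (value,),
                             tuple(exprs[k] for k in rest) + (expr,)))
    return thoughts


def evaluate(nums, target=24):
    """State evaluator: 1.0 if the remaining numbers can still reach the target, else 0.0.
    Stand-in for an LLM 'value' prompt; here it is an exact exhaustive check."""
    if len(nums) == 1:
        return 1.0 if nums[0] == target else 0.0
    blanks = ("",) * len(nums)  # expressions are irrelevant for scoring
    return max(evaluate(n, target) for n, _ in propose(nums, blanks))


def tree_of_thoughts(numbers, beam_width=3, target=24):
    """Breadth-first search over thoughts, pruning to the best `beam_width` states per level."""
    frontier = [(tuple(Fraction(n) for n in numbers), tuple(str(n) for n in numbers))]
    for _ in range(len(numbers) - 1):  # each step merges two numbers into one
        candidates = [t for nums, exprs in frontier for t in propose(nums, exprs)]
        scored = [(evaluate(n, target), (n, e)) for n, e in candidates]
        scored.sort(key=lambda s: s[0], reverse=True)
        frontier = [state for score, state in scored[:beam_width] if score > 0]
        if not frontier:
            return None  # every branch was judged a dead end
    return frontier[0][1][0]  # expression string of the best surviving leaf


print(tree_of_thoughts([4, 9, 10, 13]))
```

Exact fractions avoid rounding errors when a branch uses division. In the real method the evaluator is imperfect, which is why keeping several branches (a beam) rather than one matters.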
What is RAG?
Retrieval Augmented Generation (RAG) has gained much popularity by allowing LLMs to access external knowledge sources to complete tasks. RAG is particularly useful for knowledge-intensive tasks and for data that changes over time.
At a high level, Retrieval Augmented Generation consists of two components, information retrieval and text generation, and operates in two stages. In the first stage, RAG receives an input query and retrieves relevant data from a given source, for example, Wikipedia. In the second stage, the text generation component takes the retrieved data as context and produces the final output.
The RAG method was introduced by Lewis et al. (2020) in the paper Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks, which outlines it schematically as follows.
Figure 1. In Lewis et al. (2020), the RAG framework combines a retriever, made up of a Query Encoder and a Document Index searched with Maximum Inner Product Search (MIPS) to find the most relevant documents, with a Generator, which is a pre-trained seq2seq model. Source: Lewis et al. (2020)
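The retrieve-then-generate flow can be sketched in a few lines. This is a simplified illustration, not the paper's implementation: a normalized bag-of-words vector stands in for the neural query and document encoders, a brute-force scan stands in for (approximate) MIPS, and the retrieved passages are placed in a text prompt for a generic LLM, whereas Lewis et al. condition a seq2seq generator on each retrieved document directly. The documents and prompt wording are hypothetical, and the generation call itself is not shown.

```python
import math
import re


def embed(text: str) -> dict[str, float]:
    """Toy encoder: an L2-normalized bag-of-words vector (stand-in for a dense neural encoder)."""
    counts: dict[str, float] = {}
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        counts[token] = counts.get(token, 0.0) + 1.0
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {t: c / norm for t, c in counts.items()}


def inner_product(u: dict[str, float], v: dict[str, float]) -> float:
    return sum(w * v.get(t, 0.0) for t, w in u.items())


def retrieve(query: str, index, k: int = 2) -> list[str]:
    """Stage 1: exact inner product search over the document index, keeping the top k."""
    q = embed(query)
    ranked = sorted(index, key=lambda entry: inner_product(q, entry[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]


def build_prompt(query: str, passages: list[str]) -> str:
    """Stage 2 input: the generator receives the retrieved passages as context."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer the question using the context.\n\nContext:\n{context}\n\nQuestion: {query}\nAnswer:"


documents = [
    "The Eiffel Tower is located in Paris, France.",
    "Mount Everest is the highest mountain on Earth.",
    "Python is a popular programming language.",
]
index = [(doc, embed(doc)) for doc in documents]  # document index, built once up front

question = "Where is the Eiffel Tower located?"
print(build_prompt(question, retrieve(question, index, k=1)))
```

Because the document index is built once and only the query is encoded per request, updating the knowledge source means re-indexing documents rather than retraining the model.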
What Is LLM Fine-Tuning?
Fine-tuning combines an LLM's existing knowledge with additional task-specific training to optimize the model's performance on given tasks. It has proven particularly useful in high-volume use cases, where it can deliver better accuracy at lower cost than prompt engineering strategies.
The fine-tuning process starts with collecting a sufficient volume of data and labeling it, creating prompt-response pairs for supervised training of the LLM. A training platform is then used to train the LLM on the prepared dataset. Finally, the model's performance is evaluated on a separate dataset, after which the model can be adjusted with more data or on another platform.
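The data-preparation and evaluation steps around training can be sketched as follows. This is a minimal illustration under assumptions: the `prompt`/`response` field names and the JSONL layout are hypothetical, since each training platform defines its own schema, and the training step itself is platform-specific and omitted. `model` is any callable that maps a prompt to text.

```python
import json
import random


def to_pairs(records):
    """Turn labeled records into prompt-response pairs for supervised fine-tuning."""
    return [{"prompt": f"Classify the sentiment: {r['text']}", "response": r["label"]} for r in records]


def train_eval_split(pairs, eval_fraction=0.2, seed=0):
    """Hold out a separate evaluation set that the model never sees during training."""
    shuffled = pairs[:]
    random.Random(seed).shuffle(shuffled)
    n_eval = max(1, int(len(shuffled) * eval_fraction))
    return shuffled[n_eval:], shuffled[:n_eval]


def to_jsonl(pairs):
    """One JSON object per line, a common upload format for training data."""
    return "\n".join(json.dumps(p) for p in pairs)


def exact_match_accuracy(model, eval_pairs):
    """Fraction of held-out prompts where the model's output equals the reference response."""
    hits = sum(model(p["prompt"]).strip() == p["response"] for p in eval_pairs)
    return hits / len(eval_pairs)


records = [
    {"text": "Great support, thank you!", "label": "positive"},
    {"text": "The app keeps crashing.", "label": "negative"},
    {"text": "What time do you open?", "label": "neutral"},
]
train, held_out = train_eval_split(to_pairs(records), eval_fraction=0.34)
print(to_jsonl(train))
```

If accuracy on the held-out set falls short, the usual next step is to collect more or cleaner pairs and train again, which matches the adjust-and-retry loop described above.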
Reports show that fine-tuned LLMs can achieve better accuracy than prompting approaches such as Zero-Shot, Few-Shot or Chain-of-Thought prompting. In the study by Wei et al. (2022), titled Finetuned Language Models Are Zero-Shot Learners, a smaller fine-tuned LLM called FLAN demonstrated better results than the larger GPT-3 model with Zero-Shot and Few-Shot prompting; the results are shown below:
Figure 2. Performance of the fine-tuned LLM (FLAN) on Natural Language Inference, Reading Comprehension and Closed-Book QA. Image source: Wei et al. (2022)
Which of Prompt Engineering, RAG and LLM Fine-Tuning Works Best and When?
Each of the above approaches (prompt engineering, Retrieval Augmented Generation and fine-tuning) aims to improve LLM accuracy and performance, and each can be applied successfully when using LLMs to solve a variety of tasks.
That said, how effective each approach is depends on the task and on the data and resources available, which can make one method a better fit for a particular purpose than the others.
When to Use Prompt Engineering?
For many types of tasks, prompt engineering offers ease of use through intuitive, language-based instructions. Sophisticated prompting techniques, like CoT or ToT, succeed in eliciting human-like reasoning from Large Language Models on more complex tasks, such as arithmetic word problems, commonsense reasoning or human interactions.
When to Use Retrieval Augmented Generation?
Meanwhile, tasks where a model needs data beyond its pre-trained knowledge can be successfully approached with Retrieval Augmented Generation (RAG), a technique researched by the Meta AI team. By accessing external knowledge sources, RAG enables LLMs to draw on the latest information when generating output and helps mitigate "hallucinations."
When to Use LLM Fine-Tuning?
When a team has an extensive dataset, the necessary resources and technical expertise, and a specific task for its LLM, fine-tuning is often the strongest option, offering considerably higher performance than prompting.
Because its weights are optimized on task-specific data, a fine-tuned LLM can make better use of context at lower cost, offering personalization, privacy and a competitive advantage.
Current Studies on Prompt Engineering, RAG and LLM Fine-Tuning
Large Language Models Are Zero-Shot Reasoners | Kojima et al. (2023)
In the realm of prompt engineering, there is no shortage of well-researched studies on various prompting strategies. Within this extensive body of work, the study by Kojima et al. (2023) stands out as one of the most comprehensive, covering Zero-Shot, Few-Shot, Zero-Shot CoT and Few-Shot CoT prompting and evaluating the outputs of various LLMs in multiple settings.
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks | Lewis et al. (2020)
The work by the Meta (Facebook) AI team explores Retrieval Augmented Generation, in which a model combines its parametric memory, the knowledge stored in its pre-trained weights, with a non-parametric memory. For this research, the team chose Wikipedia as the external source, which the model accesses through a pre-trained neural retriever. The team then fine-tunes the models on a range of knowledge-intensive NLP tasks. In its experiments, the team found that RAG models generate more factual and diverse language than a state-of-the-art parametric-only seq2seq model.
Finetuned Language Models Are Zero-Shot Learners | Wei et al. (2022)
A frequently cited work by Wei et al. (2022) explores fine-tuning a Large Language Model on datasets verbalized via natural language instruction templates. The Google Research team evaluated the resulting 137B-parameter model, called FLAN, on several types of tasks and found that, even in a zero-shot setting, FLAN outperformed few-shot GPT-3 (175B parameters) on a number of datasets, including ANLI, RTE, BoolQ, AI2-ARC, OpenbookQA and StoryCloze.
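"Verbalizing" a dataset means rendering each labeled example through several natural-language instruction templates, so the same example yields multiple instruction-response pairs for fine-tuning. The sketch below shows the idea for a natural language inference example; the two templates and the yes/no label mapping are hypothetical illustrations, not the template set used to train FLAN.

```python
# Hypothetical instruction templates for a natural language inference (NLI) dataset.
TEMPLATES = [
    "Premise: {premise}\nHypothesis: {hypothesis}\n"
    "Does the premise entail the hypothesis? Options: yes, no",
    "{premise}\nBased on the paragraph above, is it true that \"{hypothesis}\"? "
    "Options: yes, no",
]


def verbalize(example: dict, templates: list[str] = TEMPLATES) -> list[dict]:
    """Render one labeled example into one instruction-response pair per template."""
    target = "yes" if example["label"] == "entailment" else "no"
    # str.format ignores unused keys such as "label".
    return [{"prompt": t.format(**example), "response": target} for t in templates]


example = {
    "premise": "A dog is running through the park.",
    "hypothesis": "An animal is outside.",
    "label": "entailment",
}
for pair in verbalize(example):
    print(pair["prompt"], "->", pair["response"])
```

Varying the phrasing across templates is meant to teach the model to follow instructions in general rather than to memorize one fixed prompt format.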
Draw Upon Prompt Engineering, RAG and LLM Fine-Tuning with VectorShift
Prompt engineering, fine-tuning and Retrieval Augmented Generation are effective strategies that can be applied independently or in combination to steer LLMs toward better performance, depending on the specifics of the task and the availability of data and resources. However, putting these techniques to use involves juggling multiple diverse processes and datasets, which can be complicated and take considerable time and resources.
Whether you prefer prompting strategies, opt for fine-tuning your LLM to gain competitive advantage and personalization or apply the Retrieval Augmented Generation method to leverage external data, platforms like VectorShift can help streamline the process with no-code or SDK interfaces for AI-based applications. For more information on how VectorShift can help you leverage prompt engineering, fine-tune your LLM or draw upon RAG strategies, please don't hesitate to get in touch with our team or request a free demo.