Retrieval-Augmented Generation (RAG)
Learn how to connect LLMs to external knowledge bases to provide factual, up-to-date answers.
Retrieval-Augmented Generation (RAG) is a powerful framework that connects a Large Language Model to an external, up-to-date knowledge base. It solves one of the biggest problems with LLMs: their knowledge is frozen in time and limited to the data they were trained on.
RAG allows the model to "look things up" in a specific data source before answering a question. This makes its responses more accurate, trustworthy, and relevant.
How It Works

RAG combines two main components: a retriever and a generator (the LLM).
The Query: A user asks a question (e.g., "What were our company's Q2 earnings?").
Retrieval: The user's query is first sent to a retriever. This system searches a knowledge base (like a collection of company documents, a website's help articles, or a technical manual) for the most relevant information. This knowledge base is often stored in a special database called a "vector database."
Augmentation: The relevant information retrieved from the knowledge base is then combined with the original user query into a new, "augmented" prompt.
Generation: This detailed, context-rich prompt is sent to the LLM. The model uses the specific information provided to generate a precise and factual answer.
Essentially, the prompt becomes: Based on the following information [...retrieved documents...], answer this question: [...original question...]
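The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production pipeline: the knowledge base is a hypothetical in-memory list, and the word-overlap scoring is a stand-in for the embedding similarity a real vector database would compute.

```python
# Minimal RAG sketch: a toy word-overlap retriever plus prompt assembly.
# In practice, the retriever would embed the query and search a vector
# database; here, documents are ranked by shared words for illustration.

def retrieve(query: str, documents: list[str], k: int = 1) -> list[str]:
    """Rank documents by how many words they share with the query; return top k."""
    query_words = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )
    return scored[:k]

def augment(query: str, retrieved: list[str]) -> str:
    """Build the augmented, context-rich prompt the LLM will actually see."""
    context = "\n".join(retrieved)
    return (
        f"Based on the following information:\n{context}\n\n"
        f"Answer this question: {query}"
    )

# Hypothetical knowledge base for the Q2-earnings example from the text.
knowledge_base = [
    "Q2 earnings were $4.2M, up 12% from Q1.",
    "The cafeteria is open from 8am to 3pm.",
]

query = "What were our company's Q2 earnings?"
prompt = augment(query, retrieve(query, knowledge_base))
# The prompt would then go to the LLM for the final generation step.
```

The key design point is that the LLM never searches anything itself: retrieval happens first, and the model only ever sees the augmented prompt.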
Why RAG is a Breakthrough
Access to Current Data: It allows LLMs to answer questions about events or data created after their training was completed.
Reduced Hallucinations: By grounding the model in specific, factual documents, RAG significantly reduces the chances that the model will "make up" an answer.
Source Citation: Because you know where the information came from, you can include citations in the model's response, allowing users to verify the facts.
Use of Proprietary Data: Companies can use RAG to create chatbots and tools that have deep knowledge of their private, internal documents without having to retrain the entire model.
Example: A Corporate Help Bot
Imagine a new employee asks a chatbot a question.
User Query: "How do I request vacation time?"
Without RAG: The LLM gives a generic answer about how vacation policies usually work, which might be incorrect for this specific company.
With RAG:
Retrieve: The system searches the company's internal HR documents and finds the specific "Time-Off Policy" document.
Augment: It creates a new prompt:
Based on this policy document [...contents of the Time-Off Policy...], answer the user's question: "How do I request vacation time?"
Generate: The LLM reads the policy and generates a precise answer:
"To request vacation time, you need to submit a request through the 'HR Portal' at least two weeks in advance. Your request will then be sent to your direct manager for approval."