RAG in a Nutshell

RAG systems are meant to extend the capabilities of Large Language Models so they can serve companies and businesses more efficiently and precisely. With RAG, we can limit LLM hallucinations, where models generate inaccurate information. We can also work around the knowledge cut-off problem, where LLMs are limited to the information that was publicly available during training. And finally, RAG unlocks access to private enterprise data: most companies have their own processes, tools, and knowledge gathered within the company, and with RAG it finally becomes possible to use that data effectively.

RAG Is Not a Silver Bullet

Despite the promise that RAG systems can extend LLMs to provide real business value, RAG is not a silver bullet – at least not a default, out-of-the-box RAG. In many cases, a default RAG system turns out to be underwhelming. Of course, implementing a default RAG solution is better than doing nothing, but the hope is often bigger than the results. Some companies found that it’s relatively easy to build a simple RAG system and offer it to everyone as an ideal solution for their problems. While in some situations that may be the case, we mostly observe that implementing a RAG system without customizing it to the company’s needs becomes more of an issue than a benefit.

Out-of-the-box Retrieval Augmented Generation systems may come up short on evaluation: making sure that the system responds exactly the way we expect and brings the right value to the table by improving operational efficiency. That’s why the concepts of advanced RAG and modular RAG were introduced.

Custom RAG Idea & Promise

All the challenges where the naive RAG system struggled were turned into opportunities and cornerstones of more advanced RAG systems. The idea of custom RAG is to design a system adjusted to a specific use case by combining the different RAG techniques that we’ll explore in the following sections of this article.

Imagine a financial institution trying to manage its repository of financial documents. Analysts seeking information on "market volatility in tech stocks 2023" may face challenges due to the vast corpus and the varied terminology used in documents. A Hybrid Search in a RAG system, using both lexical and vector-based search, can quickly surface documents directly mentioning the query as well as those discussing related concepts, like "fluctuations in the technology sector this year," ensuring analysts can reach informed and comprehensive assessments.

Let’s look at another example. In the pharmaceutical industry, researchers often collaborate across different organizations, and Graph RAG can be a game-changer. Organizations can use knowledge graphs that map relationships between compounds, published research, patents, and ongoing experiments. When a researcher queries the system about "potential interactions of Compound X," the system can retrieve and generate summaries of how Compound X interacts with other known compounds, ongoing studies, or potential side effects based on collective input from various sources. This supports accelerated research outcomes and collaboration, potentially leading to faster drug development timelines.

Advanced RAG Techniques

Let’s take a closer look at a subset of techniques that can enhance a RAG system and make it more effective.

Hybrid Search

This technique combines classic lexical search, which matches the exact words or phrases that appear in both queries and documents, with semantic search, which interprets the meaning of words and phrases, making retrieval more robust. With this technique, RAG systems can retrieve data from the database more accurately. With better documents retrieved and passed as context to the LLM, we receive a better answer to the given query.
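To make this concrete, here’s a minimal sketch of score fusion in Python. The lexical side uses simple term overlap as a stand-in for BM25, the embeddings are toy placeholders for a real embedding model, and the `alpha` weighting parameter is an illustrative choice that would normally be tuned per use case.

```python
import math
from collections import Counter

def lexical_score(query: str, doc: str) -> float:
    """Term-overlap score: a toy stand-in for BM25 or another lexical ranker."""
    q_terms = Counter(query.lower().split())
    d_terms = Counter(doc.lower().split())
    overlap = sum((q_terms & d_terms).values())
    return overlap / (sum(q_terms.values()) or 1)

def cosine(a, b) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def hybrid_search(query, query_vec, corpus, alpha=0.5):
    """Rank documents by a weighted blend of lexical and semantic scores.

    `corpus` is a list of (text, embedding) pairs; `alpha` weights the
    lexical side.
    """
    scored = [
        (alpha * lexical_score(query, text)
         + (1 - alpha) * cosine(query_vec, vec), text)
        for text, vec in corpus
    ]
    return [text for _, text in sorted(scored, reverse=True)]
```

In production, the two result lists often come from separate engines (e.g. a BM25 index and a vector database) and are merged with a scheme such as reciprocal rank fusion rather than a simple weighted sum.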

Hypothetical Document Embeddings (HyDE)

Another interesting technique that aims to improve the retrieval part of RAG is Hypothetical Document Embeddings. The idea is to generate a hypothetical document – a plausible-looking answer – from the given query. Its embedding is then used to retrieve similar real documents from the same embedding space. Although this technique sounds quite abstract, it really can improve the efficiency of the system in some cases.
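A minimal sketch of the idea, assuming `llm` is any callable that maps a prompt to text (a stand-in for your model API), and using a toy bag-of-words "embedding" in place of a real embedding model:

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding'; real systems use a dense embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def hyde_retrieve(query, docs, llm, k=3):
    """Embed a hypothetical answer instead of the raw query, then rank docs."""
    # Step 1: ask the LLM to write a plausible (possibly wrong) answer.
    hypothetical = llm(f"Write a short passage answering: {query}")
    # Step 2: use its embedding to find real documents nearby in vector space.
    hypo_vec = embed(hypothetical)
    ranked = sorted(docs, key=lambda d: cosine(embed(d), hypo_vec), reverse=True)
    return ranked[:k]
```

The intuition is that an answer-shaped passage often sits closer to the relevant documents in embedding space than a short question does, even if the generated passage contains factual errors.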

Retrieval Augmented Thoughts (RAT)

The RAT approach builds on the foundations of RAG and the Chain of Thought (CoT) prompting technique. CoT prompting has shown a great improvement in the reasoning capabilities of Large Language Models. By encouraging an LLM to break down complex tasks into smaller steps, it effectively simulates the step-by-step process that humans also use when solving complex problems.

Chain of Thought can be effectively incorporated into a RAG system to provide better results. The initial prompt encourages the LLM to generate a series of intermediate steps, or “thoughts”. Then, for each step, relevant information is retrieved from the knowledge database connected to the system, just as in regular RAG. The thoughts are revised in light of those findings, and finally the response is shared with the user.
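The loop above can be sketched as follows. Here `llm` and `retrieve` are assumed callables standing in for a real model call and a real retriever, not part of any specific library:

```python
def retrieval_augmented_thoughts(query, llm, retrieve):
    """Sketch of RAT: draft a chain of thought, revise each step with
    retrieved evidence, then answer from the revised chain.

    `llm(prompt) -> str` and `retrieve(text) -> str` are assumptions,
    standing in for a real model call and a real retriever.
    """
    # 1. Draft intermediate "thoughts", one per line.
    draft = llm(f"Think step by step about: {query}")
    thoughts = [t.strip() for t in draft.split("\n") if t.strip()]

    # 2. Retrieve evidence for each thought and ask the model to revise it.
    revised = []
    for thought in thoughts:
        evidence = retrieve(thought)
        revised.append(llm(f"Revise the step '{thought}' using: {evidence}"))

    # 3. Produce the final answer from the revised chain of thought.
    return llm(f"Answer '{query}' based on these steps: {' | '.join(revised)}")
```

Each retrieval call grounds one intermediate step, which is what distinguishes RAT from a single retrieval pass over the original query.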

GraphRAG

This technique addresses a challenge where baseline RAG struggles: connecting the dots between pieces of information stored in the knowledge database. Imagine that many pieces of information with shared attributes are scattered across the whole knowledge database, and when prompting the system we would like to receive an accurate, deep answer that analyzes all of them. This is exactly what GraphRAG does. It extends the RAG system with a knowledge graph so it can understand the data holistically. This structured, hierarchical approach provides a substantial improvement in answering questions that require connecting multiple pieces of data.
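As an illustration, here is a toy knowledge graph built from (subject, relation, object) triples, with a simple breadth-first traversal that gathers nearby facts to feed into the generation prompt. A real GraphRAG pipeline would extract the triples with an LLM; all entity and relation names here are hypothetical.

```python
from collections import defaultdict

def build_graph(triples):
    """Build an adjacency map from (subject, relation, object) triples."""
    graph = defaultdict(list)
    for subj, rel, obj in triples:
        graph[subj].append((rel, obj))
        graph[obj].append((f"inverse of {rel}", subj))  # traverse both ways
    return graph

def graph_context(graph, entity, depth=1):
    """Collect facts within `depth` hops of an entity, to use as LLM context."""
    facts, frontier, seen = [], [entity], {entity}
    for _ in range(depth):
        next_frontier = []
        for node in frontier:
            for rel, other in graph.get(node, []):
                facts.append(f"{node} --{rel}--> {other}")
                if other not in seen:
                    seen.add(other)
                    next_frontier.append(other)
        frontier = next_frontier
    return facts
```

The collected facts can then be pasted into the prompt, e.g. "Summarize the interactions of Compound X given these facts: …", letting the LLM reason over relationships that no single document states explicitly.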

Which One to Choose?

As always, there’s no silver bullet. Every technique is different and aims to solve a different problem within a RAG system. So it’s not about pulling all of them into one huge RAG system, but rather about carefully picking techniques and designing the system in the way that is most appropriate and efficient for a given use case.

That’s also why out-of-the-box RAG solutions fall short. In some cases, we really need a robust RAG system that we can rely on and that will bring true value to a company’s operations. If that’s the case, I believe it’s better to consider a custom RAG system that you can adjust to your needs and requirements.

If you'd like to apply Retrieval Augmented Generation to your company, be up to date, and learn about the benefits it can give you, schedule a call with us.