Advanced RAG
If you're following the LLM scene, you're likely familiar with the basic RAG recipe: chunk your text, build a vector store, and retrieve against it (if not, check out our RAG intro). While this approach delivers roughly 80% of the value and is great for proofs of concept (PoCs), the final product often calls for a more sophisticated solution. That's where advanced RAG comes into play! We've devoted considerable time to researching RAG improvement techniques and want to share what we've learned in this series. This post focuses on query expansion, a technique that enriches the user's question behind the scenes, potentially leading to more relevant retrieved chunks. We'll explore two variations of query expansion: Hypothetical Answer and Multi-Query.
Hypothetical answer
Improving LLM answers by asking another LLM to hallucinate? It sounds convoluted, but it works! The process takes the user's question and asks an LLM to generate a hypothetical answer. That answer is then vectorized (turned into a vector representation) and used in retrieval alongside the original query. Because the hypothetical answer is rich in relevant terms and phrasing, it makes retrieval more effective. It may get hard facts and numbers wrong, but that generally doesn't hurt the retrieval step.
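To make the generation step concrete, here is a minimal sketch using the OpenAI Python client. The model name, prompt wording, and function name are our own illustrative choices, not part of any particular RAG framework.

```python
# Minimal sketch: ask an LLM for a hypothetical answer we can later embed.
from openai import OpenAI

client = OpenAI()

def generate_hypothetical_answer(question: str) -> str:
    """Ask an LLM to invent a plausible-sounding answer for retrieval purposes."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative; any capable chat model works
        messages=[
            {
                "role": "system",
                "content": (
                    "Write a short, plausible answer to the user's question. "
                    "It does not need to be factually correct; it only needs to "
                    "use the vocabulary and phrasing a real answer would use."
                ),
            },
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content
```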
Let's imagine a use case: we have built a RAG system on top of Microsoft's annual report (example taken from source). We might ask a question like:
"Was there significant turnover in the executive team?"
We ask an LLM to generate a hypothetical answer, and we might get something like this:
In the past year, there was minimal turnover in the executive team. The leadership continuity has provided stability and allowed for the implementation of long-term strategic plans without interruptions. The consistent senior management team has demonstrated a commitment to the company's growth and success.
This answer is completely made up and isn't grounded in the actual Microsoft annual report at all. We then embed it together with the question and run retrieval based on that!
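One simple way to combine the two is to concatenate the question and the hypothetical answer into a single string, embed it, and use that vector for similarity search. The sketch below reuses `generate_hypothetical_answer` from above and assumes `chunks` (a list of text chunks) and `chunk_embeddings` (their precomputed vectors) already exist from the basic RAG indexing step; the embedding model name is illustrative.

```python
# Sketch: embed question + hypothetical answer together and retrieve top-k chunks.
# Assumes `chunks` (list of strings) and `chunk_embeddings` (numpy array of shape
# [num_chunks, dim]) were produced during indexing.
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(text: str) -> np.ndarray:
    response = client.embeddings.create(
        model="text-embedding-3-small",  # illustrative embedding model
        input=text,
    )
    return np.array(response.data[0].embedding)

def retrieve_with_hypothetical_answer(question, chunks, chunk_embeddings, k=5):
    hypothetical = generate_hypothetical_answer(question)  # from the previous sketch
    query_vector = embed(question + "\n" + hypothetical)

    # Cosine similarity between the combined query vector and every chunk vector.
    similarities = chunk_embeddings @ query_vector / (
        np.linalg.norm(chunk_embeddings, axis=1) * np.linalg.norm(query_vector)
    )
    top_indices = np.argsort(similarities)[::-1][:k]
    return [chunks[i] for i in top_indices]
```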
Multi-query
The multi-query technique involves taking a user query and generating 'N' similar queries using an LLM. Each of these queries, including the original, is then vectorized and subjected to separate retrieval processes, leading to a potentially higher volume of relevant chunks. Due to the increased quantity of retrieved information, a reranker can be employed. Rerankers use machine learning models to determine the most relevant chunks among those retrieved.
Let's continue with the Microsoft annual report use case. This time, our question is about revenue:
"What were the most important factors that contributed to increases in revenue?"
We ask an LLM to generate similar questions:
- What were the company's main sources of revenue for the year?
- How did changes in market conditions impact the company's revenue growth?
- Were there any new product launches or acquisitions that drove revenue growth?
- What pricing strategies were implemented to increase revenue?
- How did changes in customer demand affect revenue generation?
We then embed each of these questions (plus the original one), run retrieval for each, and pool the results, as sketched below.
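Here is a rough sketch of the full multi-query flow. It reuses the `embed` helper and the `chunks` / `chunk_embeddings` assumptions from the previous sketch; the query-generation prompt, the chat model, and the choice of a sentence-transformers cross-encoder as the reranker are all illustrative, not the only way to do it.

```python
# Sketch of multi-query retrieval: generate similar queries, retrieve for each,
# deduplicate the pooled chunks, then rerank them against the original question.
import numpy as np
from openai import OpenAI
from sentence_transformers import CrossEncoder

client = OpenAI()
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")  # illustrative reranker

def generate_similar_queries(question: str, n: int = 5) -> list[str]:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[
            {
                "role": "system",
                "content": (
                    f"Generate {n} differently phrased questions that ask for the same "
                    "information as the user's question. Return one question per line."
                ),
            },
            {"role": "user", "content": question},
        ],
    )
    lines = response.choices[0].message.content.splitlines()
    return [line.strip("- ").strip() for line in lines if line.strip()]

def multi_query_retrieve(question, chunks, chunk_embeddings, k_per_query=3, k_final=5):
    queries = [question] + generate_similar_queries(question)

    # Retrieve top-k chunks per query and pool them, dropping duplicates by index.
    pooled: dict[int, str] = {}
    for query in queries:
        query_vector = embed(query)
        similarities = chunk_embeddings @ query_vector / (
            np.linalg.norm(chunk_embeddings, axis=1) * np.linalg.norm(query_vector)
        )
        for i in np.argsort(similarities)[::-1][:k_per_query]:
            pooled[int(i)] = chunks[i]

    # Rerank the pooled chunks against the original question with a cross-encoder.
    candidates = list(pooled.values())
    scores = reranker.predict([(question, chunk) for chunk in candidates])
    ranked = sorted(zip(scores, candidates), key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in ranked[:k_final]]
```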
Conclusion
Advanced RAG techniques such as Hypothetical Answer and Multi-Query are simple, effective ways to make retrieval more relevant, and better retrieval means more precise and useful answers to complex queries. Stay tuned for more advanced RAG techniques!