
When Retrieval-Augmented Generation Doesn’t Quite Cut It
One approach to answering user inquiries with LLMs and reference material is Retrieval-Augmented Generation, or RAG for short. While RAG works well in a multitude of question-answering scenarios, it can fall short in certain situations.
A common challenge in conversation-based applications that leverage Large Language Models is constraining responses to a curated set of valid options. This also helps address some of the problems caused by model hallucinations.
To overcome this in the personal project I’m working on, I had to come up with a way to make the output of the process more reliable and predictable.
Process Overview
Here’s a high-level overview of the current version of the process. With each iteration of the product, it is being fine-tuned and made even more reliable.

Let’s break down the key steps in the process of using this API:
1. Split User Input into Separate Tasks
The conversation begins with the user input, which may contain multiple tasks or questions. The API first parses and separates these tasks to address them individually.
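Here’s a minimal sketch of what this first step could look like, assuming the OpenAI Python client (1.x); the model name and prompt wording are illustrative, not necessarily what the product uses:

```python
# Sketch: splitting a user message into separate, self-contained tasks with an LLM.
import json
from openai import OpenAI

client = OpenAI()

def split_into_tasks(user_input: str) -> list[str]:
    """Ask the model to break the message into individual tasks."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": 'Split the user message into separate, self-contained tasks. '
                        'Reply as JSON: {"tasks": ["..."]}'},
            {"role": "user", "content": user_input},
        ],
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)["tasks"]
```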
2. Generate User Task Embedding
For each task, the API generates a task embedding—a condensed representation of the user’s intent and context. This embedding is crucial for the subsequent steps.
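As a sketch, and assuming an OpenAI embedding model (the post doesn’t specify which embedding model is actually used), this step could look like:

```python
# Sketch: generating a dense vector that represents the task's intent and context.
from openai import OpenAI

client = OpenAI()

def embed_task(task: str) -> list[float]:
    """Return the embedding vector for a single task."""
    response = client.embeddings.create(model="text-embedding-3-small", input=task)
    return response.data[0].embedding
```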
3. Query Domain Option Types For Most Similar Embeddings
To ensure valid responses, the API queries domain-specific option types for the most similar embeddings to the user’s task. This step narrows down the potential responses to options within the relevant domain.
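A real system would likely run this query against a vector database (pgvector, Pinecone, and the like), but plain cosine similarity in NumPy keeps the idea visible; the option-type names below are illustrative:

```python
# Sketch: ranking pre-computed option-type embeddings by cosine similarity to the task.
import numpy as np

def top_k_similar(query: list[float], candidates: dict[str, list[float]], k: int = 3) -> list[str]:
    """Return the k candidate keys whose embeddings are closest to the query."""
    q = np.asarray(query)
    scores = {
        name: float(np.dot(q, np.asarray(vec)) / (np.linalg.norm(q) * np.linalg.norm(vec)))
        for name, vec in candidates.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]

# e.g. option_types = {"brand": <vec>, "color": <vec>, "budget": <vec>}
# best_types = top_k_similar(task_embedding, option_types)
```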
4. Query Domain Option Type Values For Most Similar Embeddings
Building on the previous step, the API further refines the options by querying domain-specific option type values for the most similar embeddings. This step ensures that the responses align with the specifics of the user’s inquiry.
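Reusing the top_k_similar helper from the previous sketch, the refinement amounts to searching only the values that belong to the selected option type; the nested layout is an assumption, not the author’s actual schema:

```python
# Sketch: once an option type (e.g. "brand") is selected, rank only that type's values.
def top_values_for_type(task_embedding: list[float], option_type: str,
                        option_values: dict[str, dict[str, list[float]]],
                        k: int = 3) -> list[str]:
    """Rank the allowed values of the chosen option type by similarity to the task."""
    return top_k_similar(task_embedding, option_values[option_type], k=k)

# option_values = {"brand": {"Ford": <vec>, "Toyota": <vec>}, "color": {"red": <vec>}}
# best_values = top_values_for_type(task_embedding, "brand", option_values)
```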
5. Craft LLM Prompt Using Top-K Results from Previous Queries
With the options and values identified, the API crafts an LLM prompt using the top-K results from the previous queries. This prompt serves as the basis for generating an answer to the original user inquiry, and it essentially puts only the right information in the model’s hands to generate a relevant answer to the user input.
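The template wording below is an illustrative guess at how such a prompt could be assembled from the retrieved results:

```python
# Sketch: building the prompt from the top-K option type and values,
# so the model only ever sees choices that are actually valid.
def craft_prompt(task: str, option_type: str, candidate_values: list[str]) -> str:
    return (
        "You are capturing a user requirement.\n"
        f"Requirement type: {option_type}\n"
        f"Valid values (choose only from these): {', '.join(candidate_values)}\n"
        f"User task: {task}\n"
        "Map the task to one of the valid values."
    )
```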
6. Execute LLM Prompt with Function Calling
To obtain a structured response in JSON format, the API executes the LLM prompt using function calling. This ensures that the response aligns with the predefined options and values.
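Continuing the sketch (reusing the client and the prompt from the previous steps), this is roughly what function calling looks like with the OpenAI tools API; the function schema is an assumed example of a requirement-capture shape, not the product’s actual definition:

```python
# Sketch: forcing the model to answer via a function call so the reply is structured JSON.
capture_requirement_tool = {
    "type": "function",
    "function": {
        "name": "capture_requirement",
        "description": "Record the user's requirement as a typed option/value pair.",
        "parameters": {
            "type": "object",
            "properties": {
                "option_type": {"type": "string", "enum": ["brand", "color", "budget"]},
                "option_value": {"type": "string"},
            },
            "required": ["option_type", "option_value"],
        },
    },
}

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}],
    tools=[capture_requirement_tool],
    tool_choice={"type": "function", "function": {"name": "capture_requirement"}},
)
```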
7. Parse LLM Output
The API parses the output generated by the LLM, extracting structured information in JSON format. Having structured data is important, as it can now be used more safely for other downstream tasks in the conversation process.
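As a sketch, parsing and sanity-checking the function call from the previous step (reusing the response object and the candidate_values list from earlier) could look like this; the fallback behaviour is an assumption:

```python
# Sketch: extracting the tool call arguments and validating them against the allowed values.
import json

tool_call = response.choices[0].message.tool_calls[0]
arguments = json.loads(tool_call.function.arguments)

if arguments["option_value"] not in candidate_values:
    # Re-prompt or fall back rather than accepting an out-of-scope value.
    raise ValueError(f"Model returned an unexpected value: {arguments['option_value']}")
```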
8. Respond to User Inquiry
Finally, armed with the structured response, the API provides a tailored and accurate reply to the user’s inquiry. This response is not only relevant but also constrained within the valid set of options, ensuring a seamless and satisfying user experience.
Sample Use Case: Chatbot Requirements Capture
Let’s explore a scenario involving a chatbot designed to capture user requirements, such as the one I’ve been working on in my personal project, to illustrate the versatility of this API. In this context, user requirements can take on different types, such as specifying the brand of an item, which may only apply to certain item categories.
Consider the following aspects:
- Diverse Requirement Types: User requirements can vary widely, including attributes like the brand of an item. However, these requirements are not universal and may be relevant only to specific types of items. For instance, when a user expresses a brand-related requirement, it’s essential to recognize the context in which this requirement applies.
- Constraining Valid Options: Once the chatbot understands the user’s intent, especially when it pertains to a requirement of a certain type like “brand,” it may need to further refine the options. For example, if the user specifies a brand requirement, the chatbot should constrain the valid options to align with that specific requirement. This could involve filtering values to only include what’s relevant to a particular category.
- Adapting to User Input: User input can sometimes be ambiguous. A user might state their requirement by only mentioning a brand name. In such cases, the chatbot needs to reverse its logic, starting by identifying the specific option value embedding to understand that the user is referring to a particular brand (see the sketch after this list). This adaptability is crucial, especially when dealing with complex user inputs that may not align with the chatbot’s pre-trained knowledge. For example, if the user simply states “I like Ford”, you need some pre-existing context to determine they are talking about Ford as a car brand and not Tom Ford or the Ford modeling agency.
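Here’s a rough sketch of that reverse lookup, reusing the top_k_similar helper and the illustrative option_values layout from the earlier sketches: instead of picking an option type first, it searches every (type, value) pair and infers the type from the best-matching value.

```python
# Sketch: when the user only mentions a value ("I like Ford"), search all value
# embeddings first, then derive the option type the best match belongs to.
def infer_type_from_value(task_embedding: list[float],
                          option_values: dict[str, dict[str, list[float]]],
                          k: int = 1) -> list[tuple[str, str]]:
    """Return the best (option_type, option_value) matches across all types."""
    flat = {f"{opt_type}:{value}": vec
            for opt_type, values in option_values.items()
            for value, vec in values.items()}
    best = top_k_similar(task_embedding, flat, k=k)
    return [tuple(key.split(":", 1)) for key in best]  # e.g. [("brand", "Ford")]
```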
Here’s an example of what it looks like in action in the product I’m working on:

In summary, the chatbot, powered by this API, excels in handling nuanced user requirements, even when they involve specific types like brands. It adapts its responses based on user input, ensuring that the conversation remains coherent and responsive, regardless of how users express their needs. This dynamic approach enables the chatbot to effectively capture and validate user requirements, providing structured and precise information in return.
In Conclusion
The API we’ve developed represents a significant leap forward in the world of chatbot and conversation-based applications. By streamlining the process of constraining responses to valid options, it enables chatbots to provide more accurate and relevant information to users. Whether you’re building a customer support chatbot or a virtual assistant, this API can enhance the user experience by ensuring that responses are not only informative but also within the desired scope.
As the world of natural language processing continues to evolve, tools like this API empower developers to create more sophisticated and efficient conversational interfaces. With the ability to handle complex tasks and constraints, chatbots built using this technology are poised to deliver a new level of user satisfaction and efficiency.
If that’s something of interest and you’d like to discuss further how this new API can be used in your scenario, feel free to reach out to me.