Search is all you need... probably
It’s the GenAI age. Every person and their grandpa is creating AI chatbots based on RAG. For farmers, for mothers, for teachers, for bureaucrats. Hey, we’re doing it too!
But here’s a hot take: you don’t need a RAG AI chatbot. Definitely not at the start. Probably not ever.
Chatbots don’t work, systems do
Somehow this new craze of chatbots is immune to all the lessons we have learnt about building effective systems. It is technological solutionism on steroids. I have come across interventions where a standalone chatbot was dropped on some unsuspecting beneficiary - a teacher, a farmer, a mother - and after the initial novelty, daily active users fizzled out to a handful. Because information was not the binding constraint, or it was not the only binding constraint. And a chatbot in isolation is not effective, never mind “transformative”.
The exceptions are where “chat” is a small part of a larger digital health solution. Three great examples are Reach’s MomConnect in South Africa, Jacaranda Health’s PROMPTS in Kenya, and SameSame in South Africa. What distinguishes them from this new age of “ChatBot for X” is that they have a large active user base because of the ecosystem of support and services they offer. AI, even when it is just chat, is in service of the larger offering or an add-on feature for an already active user base.
“But Sid, people won’t use it if it’s not providing custom answers like with RAG.”
If your target audience is teenagers, fair enough. I am aware of their fleeting attention spans. But adults can deal with search results.
RAG Chatbots are being deployed in the private sector because we are trying to eke out additional value by going from “search” to “custom response”. I agree with Vicki Boykis. We’re in the “vibe” space where we think everything has to be a general chat.
Most of the use cases where RAG chatbots are being deployed in the dev sector don’t even have search. I’m positing that 80% of the value can be gained by simply rolling out search.
And it’s nontrivial and expensive to do RAG safely (more on that in another post).
But is it cost-effective?
This is how tech used to work: you paid a significant fixed cost upfront, and then the marginal cost of adding a new user was close to zero.
When building a GenAI solution, your fixed costs are still high. You still have to build software, put together a validation set, experiment with prompts and guardrails, and sort out infrastructure and deployment.
But now your marginal costs are also high. For every new message, you are running a series of steps. One example pipeline might be safety checks, translation, safety checks again, paraphrasing, answer generation, and entailment. And each of those might be calling an LLM. The costs add up quickly even with token costs being low.
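To make that marginal cost concrete, here is a minimal sketch of such a per-message pipeline, assuming an OpenAI-style chat completions API. The model name, prompts, and the answer_message function are illustrative assumptions, not a reference implementation of any particular deployment:

```python
# Illustrative per-message pipeline: every step below is a separate LLM call.
from openai import OpenAI

client = OpenAI()  # assumes an API key is set in the environment


def llm_call(prompt: str) -> str:
    """One chat completion per call - this is where the marginal cost lives."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content.strip()


def answer_message(message: str, context: str) -> str:
    if "YES" in llm_call(f"Is this message unsafe? Answer YES or NO.\n{message}"):  # 1. safety check
        return "Sorry, I can't help with that."
    english = llm_call(f"Translate this message to English:\n{message}")            # 2. translation
    if "YES" in llm_call(f"Is this message unsafe? Answer YES or NO.\n{english}"):  # 3. safety check again
        return "Sorry, I can't help with that."
    question = llm_call(f"Rewrite this as one clear question:\n{english}")          # 4. paraphrasing
    draft = llm_call(                                                               # 5. answer generation
        f"Answer the question using only this context.\nContext:\n{context}\n\nQuestion: {question}"
    )
    if "YES" not in llm_call(                                                       # 6. entailment check
        f"Is this answer fully supported by the context? Answer YES or NO.\n"
        f"Context:\n{context}\nAnswer:\n{draft}"
    ):
        return "I'm not sure - let me escalate this to a human."
    return draft

# Six LLM calls for one answered message: the marginal cost is nowhere near zero.
```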
“But Sid, we have hosted our own Llama 3 model instance”
The cost of hosting a 70B Llama 3 is not small. I am going to guess that you don’t have the volumes needed yet for this to make sense. And this is still a recurring cost - you are forking out thousands of dollars per month before you have figured out if this is having the positive impact that you hoped for.
“But Sid, we have all these free credits”
You lucky dog. But those will run out eventually and depending on them is not a sustainable solution. You should be using them for innovation and experimentation. Not for scaling your chatbot.
Once the hype cycle is over and we recognize that rolling out a chatbot in a vacuum doesn’t work, this hyper-subsidization will end and we will have to confront the question the rest of the development sector has always grappled with: is this cost-effective?
So what’s next?
If you are convinced that information is the barrier to solve first, then start with simple semantic search. Pay the upfront cost of curating your content, instead of dropping a kitchen sink of documents. Use an embeddings model to match the question from your beneficiary to this database of content.
The user experience is like Google search. It’s not as sweet as asking ChatGPT, but a $2 trillion market cap says it’s not terrible.
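For concreteness, here is a minimal sketch of that search layer using the open-source sentence-transformers library. The model name and the toy content database are assumptions for illustration; you would swap in your own curated content:

```python
# Minimal semantic-search sketch with an open-source embedding model.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder model choice

# Your curated content database: each entry is a vetted piece of content.
faqs = [
    "How often should I water maize seedlings in the dry season?",
    "What vaccinations does my baby need in the first six months?",
    "How do I register for the school feeding programme?",
]
faq_embeddings = model.encode(faqs, convert_to_tensor=True)  # computed once, offline


def search(question: str, top_k: int = 3):
    """Return the top-k most relevant content pieces for a user question."""
    q_emb = model.encode(question, convert_to_tensor=True)
    hits = util.semantic_search(q_emb, faq_embeddings, top_k=top_k)[0]
    return [(faqs[h["corpus_id"]], float(h["score"])) for h in hits]


print(search("When is my child due for vaccines?"))
```

The engineering surface is just this thin retrieval layer; the content stays a plain, human-curated list that content experts own.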
Here’s what you get if you start with search:
- You roll out your question-answering service for 10% of your budget. The algorithm is comparatively simple, and open-source embedding models often work as well as OpenAI’s (in our experiments, better). The variable cost is very small and scales sub-linearly with the number of conversations.
- You get to seek feedback on each of your content pieces, and content experts (not engineers) can use that feedback to improve them. No RLHF, no DPO needed.
- You don’t have to stress about fine-tuning, hallucinations, jailbreaks, prompt injections, or system bias. You’ll save a lot of dev time.
- You keep your user data on your own infrastructure.
- You get to curate your content and create new content based on what your users are searching for.
And this is what you lose:
- Time from content managers to curate content.
- A customized response.
You can still do things like provide multimedia content, personalize to user demographics, ask users to refine their question, and serve multilingual content. And don’t worry, you still get to say you are doing AI.
Use the time and money you save on monitoring and feedback instead. What kinds of questions are people asking? What content is least popular or gets the worst feedback? What do your beneficiaries really want? Who are the people engaging with the system? Why them?
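As one hedged illustration of what that monitoring could look like, here is a tiny sketch over a hypothetical feedback log; the schema and column names are made up for the example:

```python
# Hypothetical feedback log: one row per answered question, recording the
# content piece served and a thumbs-up (1) or thumbs-down (0) from the user.
import pandas as pd

log = pd.DataFrame({
    "content_id": ["feeding_faq", "vaccine_faq", "vaccine_faq", "grant_faq"],
    "thumbs_up":  [1, 0, 0, 1],
})

# Which content gets served most, and which gets the worst feedback?
summary = (
    log.groupby("content_id")["thumbs_up"]
       .agg(times_served="count", approval_rate="mean")
       .sort_values("approval_rate")
)
print(summary)
```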
Ok. Maybe you do need a RAG chatbot
Suppose that, after rolling out your semantic search, you find that engagement is low, feedback is not great, or the impact you thought you would have is absent. You do user interviews and discover that it is not because you misinterpreted their needs. And it’s not because telling them they need “Acme Fertilizer 2000” doesn’t actually get them the fertilizer - they still need a market and the financial means to buy it.
It is indeed because they are not getting customized answers. Or your use case is one where chat is actually the offering: it’s not information but a conversation that you are providing - a friendly ear or wise counsel. Or it is because there is huge variation in the content being requested and your curated content can’t cover all of it. Your beneficiaries really are asking for what’s on page 75 of that dense 200-page document. Now is the time to switch to a RAG solution and pay the premium.
The MomConnect Ask-a-question example
Version 1.0 of MomConnect Ask-a-question, currently in production, answers over 40k questions a month and costs nearly nothing per user. Our validation showed 60% accuracy (the rest are escalated to the helpdesk), and that is what we see in production. And this was built before the GenAI era using custom word embeddings, not even text embeddings.
We are now using better embeddings, and accuracy is nearly 90% on our validation set. Though Ask-a-question supports RAG and custom LLM-generated responses to questions, this is not what we will be rolling out in the next release. It will be an embeddings search with some bells and whistles. And a lot of analytics to understand the users and their needs better.
And if there is truly an impact case to be made for upgrading to RAG, it will be as simple as swapping out a handful of endpoints.
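As a rough sketch of that upgrade path, reusing the hypothetical search and llm_call helpers from the earlier snippets, the retrieval layer stays the same and RAG adds one generation step on top:

```python
# Illustrative only: search() and llm_call() are the sketches defined above.

def search_endpoint(question: str) -> dict:
    """Version 1: return the top matching content pieces directly to the user."""
    return {"results": search(question)}


def rag_endpoint(question: str) -> dict:
    """Version 2: same retrieval, plus an LLM call to compose a customized answer."""
    hits = search(question)
    context = "\n".join(text for text, _score in hits)
    answer = llm_call(
        f"Answer the question using only this context.\nContext:\n{context}\n\nQuestion: {question}"
    )
    return {"answer": answer, "sources": hits}
```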
Share your thoughts
I am still very bullish on the potential of AI, and especially LLMs, to transform the sector. There are a number of very promising use cases: providing feedback to students and teachers, co-pilots and AI assistants for various high-skilled jobs, connecting citizens with benefits, and natural interfaces to data systems for decision-making. But you probably don’t need chat completion for question-answering.
Please disabuse me if you think I’m wrong or missing an important perspective. As opinionated as this post sounds, I am open to changing my mind. In fact, as someone building AI tools, my job depends on it. You can contact me on sid.ravinutala@idinsight.org.
Thank you for reading.