Using Agents to Not Use Agents: How we built our Text-to-SQL Q&A system

by: Zia Khan and Poornima Ramesh

Ask-a-Metric is a WhatsApp-based AI data analyst that uses LLMs to answer SQL database queries, facilitating data access for decision-making in the development sector (GitHub). Initially, we used a simple pipeline to get rapid feedback, but we faced challenges with accuracy and scaling. We then tested an agentic approach with CrewAI, which improved accuracy but came with high costs and slow response times. We used these results to develop a pseudo-agent pipeline that combines the best of both approaches, reducing costs and response times while maintaining accuracy.1

Introduction

We have been working on Ask-a-Metric, a WhatsApp-based AI data analyst that connects to your SQL database and answers your questions using Large Language Models (LLMs). We expect it to streamline access to data for decision-making at all levels in the development sector context.

At a high level, Ask-a-Metric takes a user’s question (say, “How many beds are there in hospitals in Chennai?”), analyzes the SQL database it is connected to, and returns the answer to the user.


Figure 1: Ask-a-Metric system overview

To answer questions using information from a database, Ask-a-Metric must overcome numerous technical challenges - it must understand the user’s question, comprehend the information present in the database, conduct accurate analysis, and synthesize all of it to return an appropriate response. It has to do all of this while ensuring safety and security.

  • Understand the question and the user’s context.
  • Comprehend the data’s structure and context.
  • Conduct accurate analysis to answer the user’s question in a contextually correct way.
  • Ensure safety and security.

Figure 2: Ask-a-Metric key focuses

We developed Ask-a-Metric over two major iterations: the initial “simple version” and the current “pseudo-agent version.”

Simple pipeline

While making the simple version, we followed the philosophy of building as quickly as possible to solve the problem, not necessarily following best practices or building something that scales. This allowed us to get feedback from users and make improvements quickly. Our backend tech stack—where the Ask-a-Metric pipeline runs—is built using Python’s FastAPI framework. We organized our code following a functional programming paradigm (which is the norm in our team), meaning all pipeline operations were implemented purely through functions. Figure 3 shows the flow of the resulting simple pipeline we built.
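To make this concrete, here is a minimal sketch of what such a function-based FastAPI pipeline could look like. The endpoint and helper names are illustrative placeholders rather than our actual code, and the LLM and database calls are stubbed out.

```python
# A minimal sketch of the simple, function-based pipeline (names are
# illustrative, not the actual Ask-a-Metric code). Each step is a plain
# function; the endpoint chains them sequentially.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Question(BaseModel):
    text: str

def get_schema(db_url: str) -> str:
    # Stub: would return the DDL of the connected database.
    return "CREATE TABLE hospitals (city TEXT, beds INTEGER);"

def generate_sql(question: str, schema: str) -> str:
    # Stub: one LLM call turning the question plus schema into SQL.
    return "SELECT SUM(beds) FROM hospitals WHERE city = 'Chennai';"

def run_query(sql: str) -> list:
    # Stub: would execute the SQL against the database and return rows.
    return [(4200,)]

def format_answer(question: str, rows: list) -> str:
    # Stub: a second LLM call phrasing the rows as a natural-language answer.
    return f"Answer to '{question}': {rows[0][0]}"

@app.post("/ask")
def ask(q: Question) -> dict:
    # The whole pipeline is one tightly coupled chain of function calls.
    schema = get_schema("postgresql://...")
    sql = generate_sql(q.text, schema)
    rows = run_query(sql)
    return {"answer": format_answer(q.text, rows)}
```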


Figure 3: Simple pipeline flow

The flow of the simple pipeline was intentionally “simple” because it allowed us to build fast, which meant we could rapidly connect it to as many different databases as we had access to and check where and how we needed to focus on improving the system.

After building the simple pipeline, we ran into three main challenges -

  1. Accuracy: The pipeline resulted in LLM responses that were not accurate enough. We needed to break the problem into smaller steps and give the LLM more information to complete the task accurately.
  2. Prompt Engineering: Minor changes in prompts to improve performance on one set of questions would reduce performance on another set, making it very difficult to improve the system as a whole.
  3. Pipeline modification: It was difficult to modify our pipeline because our architecture was a long chain of steps executed sequentially. Every step was tightly coupled with every other step, so even changing a single prompt could require modifying the entire pipeline.

This made improving the algorithm through experimentation and trial and error too slow. There were many ways to improve the LLM’s accuracy, but most of them were not robust since they had a cascading effect on the entire pipeline. This also made trying out each new method slow.

Exploring the agentic approach

Given the challenges we were facing, one method we wanted to try was to completely eliminate a structured pipeline and offload everything to an agentic setup.

We heuristically define an AI agent as an LLM-driven system which has the following characteristics -

  • It remembers and learns from its past actions and observations,
  • It uses tools of its own choosing to execute actions, and
  • It plans what tasks it wants to conduct and how.

So, the LLM agent would decide which tools to use and when, thus eliminating the need to build and manage a pipeline.

We experimented with CrewAI, an open-source multi-agent pipeline tool, to improve on the simple pipeline. Again, in keeping with our philosophy of building fast, we rapidly set up and tested CrewAI.

Our agentic pipeline consisted of two agents—a customer-facing manager and a data analyst. We gave each agent the tools relevant to its tasks. To save time, we used CrewAI’s built-in tools and LangChain’s tools, which are also compatible with CrewAI. These tools included the following -

  • SQL DDL schema tool to let the agent read the entire database’s schema,
  • SQL tool to allow the agent to run any SQL query on a database, and
  • RAG tool to query all the column descriptions in the database from a CSV file

The agent could flexibly choose when and how to use these tools while attempting to solve a task.

CrewAI made setting up the test pipeline straightforward, simplifying the creation of agentic flows. For more information, please refer to CrewAI’s documentation.
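For illustration, below is a rough sketch of this two-agent setup, assuming CrewAI’s Agent/Task/Crew interface and LangChain-style tool wrappers. The three tools are stubbed placeholders standing in for the built-in schema, SQL, and RAG tools mentioned above, and only the analyst’s task is shown.

```python
# A rough sketch of the two-agent setup, assuming CrewAI's Agent/Task/Crew
# interface and LangChain-style tool wrappers. All tool bodies are stubs.
from crewai import Agent, Task, Crew
from langchain_core.tools import Tool

schema_tool = Tool(
    name="read_schema",
    func=lambda _: "CREATE TABLE hospitals (city TEXT, beds INTEGER);",  # stubbed DDL
    description="Read the full DDL schema of the database",
)
sql_tool = Tool(
    name="run_sql",
    func=lambda query: "[(4200,)]",  # stub: would execute the query and return rows
    description="Run an arbitrary SQL query against the database",
)
rag_tool = Tool(
    name="column_descriptions",
    func=lambda question: "beds: number of hospital beds",  # stub: would search a CSV of descriptions
    description="Look up column descriptions relevant to the question",
)

manager = Agent(
    role="Customer-facing manager",
    goal="Understand the user's question and hand the analyst a clear task",
    backstory="You talk to non-technical users who ask questions over WhatsApp.",
)
analyst = Agent(
    role="Data analyst",
    goal="Write and run SQL that answers the task correctly",
    backstory="You know the database schema and column descriptions.",
    tools=[schema_tool, sql_tool, rag_tool],
)

answer_task = Task(
    description="Answer the user's question: {question}",
    expected_output="A short, plain-language answer backed by a SQL query",
    agent=analyst,
)

crew = Crew(agents=[manager, analyst], tasks=[answer_task])
result = crew.kickoff(inputs={"question": "How many beds are there in hospitals in Chennai?"})
print(result)
```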

In our tests, the CrewAI-based agentic pipeline performed excellently. It was able to answer all questions from our test database correctly. However, the response time and cost of this version of the pipeline were prohibitively high: answers took more than a minute to generate, and the cost per query was approximately USD 0.3. For our partners to be able to use Ask-a-Metric, we needed the query response time to be less than 30 seconds (two to three times faster) and the cost per query to be less than USD 0.03 (ten times cheaper).

Using the agentic approach to optimize our pipeline

The agentic pipeline gave us the desired accuracy, but it could not meet our targets for response time or cost. However, it helped us better understand which tools and prompts worked best to improve response accuracy.

We could thus combine the simple pipeline’s low cost and quick response time with the agentic pipeline’s better accuracy and improved prompts. This came about through two key observations we made while running the agentic experiments -

  • The agentic pipeline was predictable: The agent was very consistent in the sequence of actions it took to answer every question. This involved self-prompting, reflecting on past actions, and breaking down the task into smaller steps. We realized we could reuse the prompts the agent gave itself and hard-code the same sequence of steps into our own pipeline.
  • The pipeline was object-oriented: CrewAI’s object-oriented design modularized and decoupled the different parts of the pipeline. Implementing a similar structure in our own pipeline would help us build and modify our code more easily and quickly.

As a result, even though using agents did not prove useful in production, it was nevertheless useful for solving the issues with our pipeline.

We can thus re-frame this as an optimization problem that the agents were helping us solve. The following are the parameters we were optimizing over -

  • The number and complexity of the steps or actions required to complete a task: It is difficult to know a priori whether we should have the LLM solve the task in a single shot or execute the task over multiple smaller steps/prompts.
  • The types of tools/functions to use: In a production environment, we want to be parsimonious about using tools to achieve optimal cost and response time. Therefore, we need to build custom tools/functions that support specific actions (e.g., retrieving tables from the database schema, retrieving the most common values per column in the database, etc.; see the sketch after this list). Knowing what these specific actions need to be is difficult and requires time-consuming experimentation and research.
  • The exact language and wording of prompts: It is not possible to know in advance which specific prompts to give to an LLM to make it correctly execute multiple tasks in the desired manner.
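As an illustration of the second point, here is a hedged sketch of one such fixed-scope tool: a helper that fetches the most common values per column of a table so they can be passed to the LLM as context. It uses SQLAlchemy; the function name, connection string, and table are illustrative and not Ask-a-Metric’s actual implementation.

```python
# A sketch of a fixed-scope tool: the top-N most frequent values per column,
# fetched with SQLAlchemy. Names and connection details are illustrative.
from sqlalchemy import create_engine, inspect, text

def most_common_values(db_url: str, table: str, top_n: int = 5) -> dict[str, list]:
    """Return the top_n most frequent values for each column of `table`."""
    engine = create_engine(db_url)
    columns = [col["name"] for col in inspect(engine).get_columns(table)]
    result: dict[str, list] = {}
    with engine.connect() as conn:
        for col in columns:
            rows = conn.execute(
                text(
                    f'SELECT "{col}", COUNT(*) AS n FROM "{table}" '
                    f'GROUP BY "{col}" ORDER BY n DESC LIMIT :top_n'
                ),
                {"top_n": top_n},
            )
            result[col] = [value for value, _ in rows]
    return result

# Example: most_common_values("sqlite:///health.db", "hospitals")
```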

We can use agents to implicitly perform a “search” over the parameter space described above and find the best “minimum”, i.e., the set of actions, tools, and prompts that results in the highest response accuracy at the lowest cost.3 In our case, though the agentic pipeline did not give us the optimal solution, it was good enough to satisfy our project requirements.

Pseudo-agent pipeline

Using the best actions, tools, and prompts from the agentic experiments, we built an updated pseudo-agent version of the pipeline.

This version uses a flow similar to the one we observed the agentic pipeline use. Figure 4 illustrates the key technical aspects of the newer pipeline that helped us keep the same level of accuracy as the agentic flows while reducing cost and response time by an order of magnitude.

  • Break tasks into smaller steps: identify tables, find the best columns, extract sample column values, etc., each as a separate step.
  • Only incorporate the actions that the agentic flow actually executed; eliminate self-reflection and planning API calls.
  • Share only the required data with each API call; stop sharing flow logs with the LLM at each step.
  • Build tools with a fixed scope that quickly produce the required outputs.

Figure 4: Key aspects of the pseudo-agent pipeline

Further, we refactored our original code into an object-oriented paradigm, breaking the pipeline down into separate components - query processing, guardrails, and tools. This allows us to imitate an AI agent’s actions without bearing the high costs and slow speeds of agentic flows. It also simplifies building, managing, and modifying our pipeline: for example, we can now make changes to the guardrails without affecting query processing.
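Below is a simplified sketch of how such a structure could look. Class and method names are illustrative placeholders, the LLM calls and database access are stubbed, and each step mirrors one of the hard-coded actions described in Figure 4.

```python
# A simplified, illustrative sketch of the object-oriented, pseudo-agent
# structure: guardrails, tools, and query processing as separate components.
class Guardrails:
    def check(self, question: str) -> None:
        # Stub: reject unsafe or out-of-scope questions before any LLM call.
        if not question.strip():
            raise ValueError("Empty question")

class Tools:
    def relevant_tables(self, question: str) -> list[str]:
        return ["hospitals"]          # stub: schema lookup scoped to the question

    def best_columns(self, question: str, tables: list[str]) -> list[str]:
        return ["city", "beds"]       # stub: column-description retrieval

    def sample_values(self, columns: list[str]) -> dict[str, list]:
        return {"city": ["Chennai"]}  # stub: fixed-scope value lookup

    def run_sql(self, sql: str) -> list:
        return [(4200,)]              # stub: query execution

class QueryProcessor:
    """Hard-codes the sequence of steps observed in the agentic flow, one
    focused LLM or tool call per step, sharing only the data each step needs."""

    def __init__(self, guardrails: Guardrails, tools: Tools):
        self.guardrails = guardrails
        self.tools = tools

    def answer(self, question: str) -> str:
        self.guardrails.check(question)
        tables = self.tools.relevant_tables(question)
        columns = self.tools.best_columns(question, tables)
        values = self.tools.sample_values(columns)
        # Stub for the SQL-generation LLM call, which would see only
        # `tables`, `columns`, and `values`.
        sql = f"SELECT SUM(beds) FROM {tables[0]} WHERE city = 'Chennai';"
        rows = self.tools.run_sql(sql)
        return f"There are {rows[0][0]} beds in hospitals in Chennai."

print(QueryProcessor(Guardrails(), Tools()).answer("How many beds are there in hospitals in Chennai?"))
```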

The pseudo-agent flow, on average, has a response time of less than 15 seconds and costs less than USD 0.02 per query. Figure 5 illustrates the flow of this pipeline.


Figure 5: Pseudo-agent pipeline flow

What’s next?

The pseudo-agent pipeline now allows us to test and make changes quickly. Going forward, we are working on improving our current solution along all three of our key metrics (accuracy, speed, and cost) while also building more features like multi-turn chat, easier user onboarding, multi-language support, etc. We are also testing our solution in multiple contexts and using these pilots to get feedback and increase the value our solution brings to users.

Our objective is to build a product that reduces the barriers to data use for decision-making, particularly in the social impact sector. We hope to keep our solution cheap and efficient while ensuring accuracy so that it is both accessible and useful for governments, NGOs, and other stakeholders in the sector.

If you want to know more about Ask-a-Metric, reach out to us at DSEM Team. We would love to talk to you and see how we can help and collaborate with you.


  1. Cover Photo by Ana Municio on Unsplash

  2. All Icons in this blog are made by Freepik from www.flaticon.com

  3. This idea is also gaining popularity in the AI community with Python libraries like DSPy, which focus on prompt engineering optimization.