# LLM Proxy Server
## What is it?
AAQ uses the LiteLLM Proxy Server to manage LLM calls, allowing you to use any LiteLLM-supported model, including self-hosted ones.

The proxy server runs as a separate Docker container and reads its configuration from a `config.yaml` file, where you can set the appropriate model names and endpoints for each LLM task.
## Example config
You can see an example `litellm_proxy_config.yaml` file below. In our backend code, we refer to the models by their custom task `model_name` (e.g. `"generate-response"`); which actual LLM model each call is routed to is set here.
```yaml
model_list:
  - model_name: embeddings
    litellm_params:
      model: text-embedding-ada-002
      api_key: "os.environ/OPENAI_API_KEY"
  - model_name: default
    litellm_params:
      model: gpt-4-0125-preview
      api_key: "os.environ/OPENAI_API_KEY"
  - model_name: generate-response
    litellm_params:
      model: gpt-4-0125-preview
      api_key: "os.environ/OPENAI_API_KEY"
  - model_name: detect-language
    litellm_params:
      model: gpt-3.5-turbo-1106
      api_key: "os.environ/OPENAI_API_KEY"
  - model_name: translate
    litellm_params:
      model: gpt-3.5-turbo-1106
      api_key: "os.environ/OPENAI_API_KEY"
  - model_name: paraphrase
    litellm_params:
      model: gpt-3.5-turbo-1106
      api_key: "os.environ/OPENAI_API_KEY"
  - model_name: safety
    litellm_params:
      model: gpt-3.5-turbo-1106
      api_key: "os.environ/OPENAI_API_KEY"
  - model_name: alignscore
    litellm_params:
      model: gpt-3.5-turbo-1106
      api_key: "os.environ/OPENAI_API_KEY"

litellm_settings:
  num_retries: 3
  request_timeout: 100
  telemetry: False
```
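Because the LiteLLM Proxy Server exposes an OpenAI-compatible API, the backend can address models purely by task name. The sketch below is a minimal illustration, not AAQ's actual backend code; the `base_url`, port, and prompt are assumptions for this example.

```python
# Minimal sketch: calling the proxy by task model_name through its
# OpenAI-compatible endpoint. The base_url/port are assumptions; use
# whatever address the proxy container is exposed on in your stack.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:4000",  # the LiteLLM proxy, not api.openai.com
    api_key="dummy-key",               # real provider keys live in the proxy config
)

response = client.chat.completions.create(
    model="generate-response",  # routed to gpt-4-0125-preview by the config above
    messages=[{"role": "user", "content": "What services do you offer?"}],
)
print(response.choices[0].message.content)
```

Swapping which provider backs a task like `generate-response` then only requires a config change; no backend code needs to be touched.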
See the Contributing Setup and Docker Compose Setup for how this service is run in our stack.
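Once the service is up, one way to confirm the proxy is reachable is to hit one of its health endpoints. A small sketch, assuming the proxy is exposed on `localhost:4000`; LiteLLM provides liveness/readiness probes, but check the version you run for the exact paths:

```python
# Sketch: verify the proxy container is up before sending LLM traffic.
# The port and endpoint path are assumptions; adjust to your deployment.
import requests

resp = requests.get("http://localhost:4000/health/readiness", timeout=5)
resp.raise_for_status()
print(resp.json())  # reports whether the proxy is ready to serve requests
```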
## Monitoring with Langfuse
You can log all inputs and outputs of the LiteLLM Proxy Server via Langfuse.
- Add Langfuse to `litellm_proxy_config.yaml`:

    ```yaml
    litellm_settings:
      success_callback: ["langfuse"]
    ```
- Include Langfuse credentials as environment variables in your deployment environment. If you are using `docker compose`, add the following to your `deployment/docker-compose/.env` file:

    ```
    LANGFUSE_PUBLIC_KEY=pk-...
    LANGFUSE_SECRET_KEY=sk-...
    ```
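After these two steps, successful calls through the proxy are logged to Langfuse without any client-side changes. If you want to attach extra context to the traces, LiteLLM also accepts a `metadata` field in the request body; the sketch below shows one way to pass it via the OpenAI client's `extra_body`. The exact metadata keys forwarded to Langfuse depend on your LiteLLM version, so treat them as assumptions:

```python
# Sketch: tagging a request so it is easier to find in Langfuse.
# The metadata key below is an assumption; consult the LiteLLM docs
# for the keys your version forwards to Langfuse.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:4000", api_key="dummy-key")

response = client.chat.completions.create(
    model="generate-response",
    messages=[{"role": "user", "content": "How do I reset my password?"}],
    extra_body={"metadata": {"generation_name": "aaq-docs-example"}},
)
```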