
import Image from '@theme/IdealImage';
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

Proxy Config.yaml

Set model list, api_base, api_key, temperature & proxy server settings (master-key) on the config.yaml.

| Param Name | Description |
|------------|-------------|
| model_list | List of supported models on the server, with model-specific configs |
| router_settings | litellm Router settings, example routing_strategy="least-busy" (see all) |
| litellm_settings | litellm Module settings, example litellm.drop_params=True, litellm.set_verbose=True, litellm.api_base, litellm.cache (see all) |
| general_settings | Server settings, example setting master_key: sk-my_special_key |
| environment_variables | Environment Variables example, REDIS_HOST, REDIS_PORT |

Complete List: Check the Swagger UI docs on <your-proxy-url>/#/config.yaml (e.g. http://0.0.0.0:4000/#/config.yaml), for everything you can pass in the config.yaml.

Quick Start

Set a model alias for your deployments.

In the config.yaml the model_name parameter is the user-facing name to use for your deployment.

In the config below:

  • model_name: the name to pass TO litellm from the external client
  • litellm_params.model: the model string passed to the litellm.completion() function

E.g.:

  • model=vllm-models will route to openai/facebook/opt-125m.
  • model=gpt-3.5-turbo will load balance between azure/gpt-turbo-small-eu and azure/gpt-turbo-small-ca

model_list:
  - model_name: gpt-3.5-turbo ### RECEIVED MODEL NAME ###
    litellm_params: # all params accepted by litellm.completion() - https://docs.litellm.ai/docs/completion/input
      model: azure/gpt-turbo-small-eu ### MODEL NAME sent to `litellm.completion()` ###
      api_base: https://my-endpoint-europe-berri-992.openai.azure.com/
      api_key: "os.environ/AZURE_API_KEY_EU" # does os.getenv("AZURE_API_KEY_EU")
      rpm: 6      # [OPTIONAL] Rate limit for this deployment: in requests per minute (rpm)
  - model_name: bedrock-claude-v1 
    litellm_params:
      model: bedrock/anthropic.claude-instant-v1
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: azure/gpt-turbo-small-ca
      api_base: https://my-endpoint-canada-berri992.openai.azure.com/
      api_key: "os.environ/AZURE_API_KEY_CA"
      rpm: 6
  - model_name: anthropic-claude
    litellm_params: 
      model: bedrock/anthropic.claude-instant-v1
      ### [OPTIONAL] SET AWS REGION ###
      aws_region_name: us-east-1
  - model_name: vllm-models
    litellm_params:
      model: openai/facebook/opt-125m # the `openai/` prefix tells litellm it's openai compatible
      api_base: http://0.0.0.0:4000/v1
      api_key: none
      rpm: 1440
    model_info: 
      version: 2
  
  # Use this if you want to make requests to `claude-3-haiku-20240307`,`claude-3-opus-20240229`,`claude-2.1` without defining them on the config.yaml
  # Default models
  # Works for ALL Providers and needs the default provider credentials in .env
  - model_name: "*" 
    litellm_params:
      model: "*"

litellm_settings: # module level litellm settings - https://github.com/BerriAI/litellm/blob/main/litellm/__init__.py
  drop_params: True
  success_callback: ["langfuse"] # OPTIONAL - if you want to start sending LLM Logs to Langfuse. Make sure to set `LANGFUSE_PUBLIC_KEY` and `LANGFUSE_SECRET_KEY` in your env

general_settings: 
  master_key: sk-1234 # [OPTIONAL] Only use this if you want to require all calls to contain this key (Authorization: Bearer sk-1234)
  alerting: ["slack"] # [OPTIONAL] If you want Slack Alerts for Hanging LLM requests, Slow llm responses, Budget Alerts. Make sure to set `SLACK_WEBHOOK_URL` in your env

:::info

For more provider-specific info, go here

:::

Step 2: Start Proxy with config

$ litellm --config /path/to/config.yaml

:::tip

Run with --detailed_debug if you need detailed debug logs

$ litellm --config /path/to/config.yaml --detailed_debug

:::

Step 3: Test it

This sends the request to the deployment where model_name=gpt-3.5-turbo in the config.yaml.

If multiple deployments share model_name=gpt-3.5-turbo, the proxy load balances between them.

Langchain, OpenAI SDK Usage Examples

curl --location 'http://0.0.0.0:4000/chat/completions' \
--header 'Content-Type: application/json' \
--data ' {
      "model": "gpt-3.5-turbo",
      "messages": [
        {
          "role": "user",
          "content": "what llm are you"
        }
      ]
    }
'
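
The same request with the OpenAI Python SDK, as a minimal sketch - it assumes the proxy is running on http://0.0.0.0:4000 and that, if you configured a master_key, you pass it as the api_key:

```python
# A minimal sketch using the OpenAI Python SDK (v1+).
# Assumes the proxy is running locally on port 4000; if you set a master_key,
# pass it as the api_key below.
from openai import OpenAI

client = OpenAI(api_key="sk-1234", base_url="http://0.0.0.0:4000")

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # the `model_name` from config.yaml
    messages=[{"role": "user", "content": "what llm are you"}],
)
print(response.choices[0].message.content)
```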

LLM configs model_list

Model-specific params (API Base, Keys, Temperature, Max Tokens, Organization, Headers etc.)

You can use the config to save model-specific information like api_base, api_key, temperature, max_tokens, etc.

All input params

Step 1: Create a config.yaml file

model_list:
  - model_name: gpt-4-team1
    litellm_params: # params for litellm.completion() - https://docs.litellm.ai/docs/completion/input#input---request-body
      model: azure/chatgpt-v-2
      api_base: https://openai-gpt-4-test-v-1.openai.azure.com/
      api_version: "2023-05-15"
      azure_ad_token: eyJ0eXAiOiJ
      seed: 12
      max_tokens: 20
  - model_name: gpt-4-team2
    litellm_params:
      model: azure/gpt-4
      api_key: sk-123
      api_base: https://openai-gpt-4-test-v-2.openai.azure.com/
      temperature: 0.2
  - model_name: openai-gpt-3.5
    litellm_params:
      model: openai/gpt-3.5-turbo
      extra_headers: {"AI-Resource Group": "ishaan-resource"}
      api_key: sk-123
      organization: org-ikDc4ex8NB
      temperature: 0.2
  - model_name: mistral-7b
    litellm_params:
      model: ollama/mistral
      api_base: your_ollama_api_base

Step 2: Start server with config

$ litellm --config /path/to/config.yaml

Expected Logs:

Look for this line in your console logs to confirm the config.yaml was loaded correctly.

LiteLLM: Proxy initialized with Config, Set models:

Embedding Models - Use Sagemaker, Bedrock, Azure, OpenAI, XInference

See supported Embedding Providers & Models here

model_list:
  - model_name: bedrock-cohere
    litellm_params:
      model: "bedrock/cohere.command-text-v14"
      aws_region_name: "us-west-2"
  - model_name: bedrock-cohere
    litellm_params:
      model: "bedrock/cohere.command-text-v14"
      aws_region_name: "us-east-2"
  - model_name: bedrock-cohere
    litellm_params:
      model: "bedrock/cohere.command-text-v14"
      aws_region_name: "us-east-1"

Here's how to route between GPT-J embedding (sagemaker endpoint), Amazon Titan embedding (Bedrock) and Azure OpenAI embedding on the proxy server:

model_list:
  - model_name: sagemaker-embeddings
    litellm_params: 
      model: "sagemaker/berri-benchmarking-gpt-j-6b-fp16"
  - model_name: amazon-embeddings
    litellm_params:
      model: "bedrock/amazon.titan-embed-text-v1"
  - model_name: azure-embeddings
    litellm_params: 
      model: "azure/azure-embedding-model"
      api_base: "os.environ/AZURE_API_BASE" # os.getenv("AZURE_API_BASE")
      api_key: "os.environ/AZURE_API_KEY" # os.getenv("AZURE_API_KEY")
      api_version: "2023-07-01-preview"

general_settings:
  master_key: sk-1234 # [OPTIONAL] if set all calls to proxy will require either this key or a valid generated token

LiteLLM Proxy supports all Feature-Extraction Embedding models.

model_list:
  - model_name: deployed-codebert-base
    litellm_params: 
      # send request to deployed hugging face inference endpoint
      model: huggingface/microsoft/codebert-base # add huggingface prefix so it routes to hugging face
      api_key: hf_LdS                            # api key for hugging face inference endpoint
      api_base: https://uysneno1wv2wd4lw.us-east-1.aws.endpoints.huggingface.cloud # your hf inference endpoint 
  - model_name: codebert-base
    litellm_params: 
      # no api_base set, sends request to hugging face free inference api https://api-inference.huggingface.co/models/
      model: huggingface/microsoft/codebert-base # add huggingface prefix so it routes to hugging face
      api_key: hf_LdS                            # api key for hugging face                     

model_list:
  - model_name: azure-embedding-model # model group
    litellm_params:
      model: azure/azure-embedding-model # model name for litellm.embedding(model=azure/azure-embedding-model) call
      api_base: your-azure-api-base
      api_key: your-api-key
      api_version: 2023-07-01-preview

model_list:
- model_name: text-embedding-ada-002 # model group
  litellm_params:
    model: text-embedding-ada-002 # model name for litellm.embedding(model=text-embedding-ada-002) 
    api_key: your-api-key-1
- model_name: text-embedding-ada-002 
  litellm_params:
    model: text-embedding-ada-002
    api_key: your-api-key-2

https://docs.litellm.ai/docs/providers/xinference

Note: add the xinference/ prefix to litellm_params: model so litellm knows to route to Xinference

model_list:
- model_name: embedding-model  # model group
  litellm_params:
    model: xinference/bge-base-en   # model name for litellm.embedding(model=xinference/bge-base-en) 
    api_base: http://0.0.0.0:9997/v1

Use this for calling /embedding endpoints on OpenAI Compatible Servers.

Note: add the openai/ prefix to litellm_params: model so litellm routes the request as an OpenAI-compatible call

model_list:
- model_name: text-embedding-ada-002  # model group
  litellm_params:
    model: openai/<your-model-name>   # model name for litellm.embedding(model=text-embedding-ada-002) 
    api_base: <model-api-base>

Start Proxy

litellm --config config.yaml

Make Request

Sends Request to bedrock-cohere

curl --location 'http://0.0.0.0:4000/chat/completions' \
  --header 'Content-Type: application/json' \
  --data ' {
  "model": "bedrock-cohere",
  "messages": [
      {
      "role": "user",
      "content": "gm"
      }
  ]
}'
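
Here's what an embeddings request to one of the model groups above might look like with the OpenAI Python SDK - a sketch, assuming the proxy is running on http://0.0.0.0:4000 and text-embedding-ada-002 is defined as shown earlier:

```python
# Sketch: embeddings request to the proxy with the OpenAI Python SDK (v1+).
# Assumes the proxy is running on http://0.0.0.0:4000 and `text-embedding-ada-002`
# is defined as a model group in config.yaml (as shown above).
from openai import OpenAI

client = OpenAI(api_key="sk-1234", base_url="http://0.0.0.0:4000")

response = client.embeddings.create(
    model="text-embedding-ada-002",  # the `model_name` (model group) from config.yaml
    input=["good morning from litellm"],
)
print(len(response.data[0].embedding))
```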

Multiple OpenAI Organizations

Add all openai models across all OpenAI organizations with just 1 model definition

  - model_name: *
    litellm_params:
      model: openai/*
      api_key: os.environ/OPENAI_API_KEY
      organization:
       - org-1 
       - org-2 
       - org-3

LiteLLM will automatically create separate deployments for each org.

Confirm this via

curl --location 'http://0.0.0.0:4000/v1/model/info' \
--header 'Authorization: Bearer ${LITELLM_KEY}' \
--data ''
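
The same check in Python - a sketch that assumes the proxy is on http://0.0.0.0:4000, LITELLM_KEY holds a valid key, and the /v1/model/info response is shaped like {"data": [...]} (the exact shape may vary by version):

```python
# Sketch: list deployments via /v1/model/info to confirm one deployment per org.
# Assumes the proxy is on http://0.0.0.0:4000, LITELLM_KEY holds a valid key,
# and the response is shaped like {"data": [...]} (may vary by version).
import os

import requests

resp = requests.get(
    "http://0.0.0.0:4000/v1/model/info",
    headers={"Authorization": f"Bearer {os.environ['LITELLM_KEY']}"},
    timeout=10,
)
resp.raise_for_status()
for deployment in resp.json().get("data", []):
    params = deployment.get("litellm_params", {})
    print(deployment.get("model_name"), "->", params.get("organization"))
```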

Provider specific wildcard routing

Proxy all models from a provider

Use this if you want to proxy all models from a specific provider without defining them on the config.yaml

Step 1 - define provider specific routing on config.yaml

model_list:
  # provider specific wildcard routing
  - model_name: "anthropic/*"
    litellm_params:
      model: "anthropic/*"
      api_key: os.environ/ANTHROPIC_API_KEY
  - model_name: "groq/*"
    litellm_params:
      model: "groq/*"
      api_key: os.environ/GROQ_API_KEY
  - model_name: "fo::*:static::*" # all requests matching this pattern will be routed to this deployment, example: model="fo::hi::static::hi" will be routed to deployment: "openai/fo::*:static::*"
    litellm_params:
      model: "openai/fo::*:static::*"
      api_key: os.environ/OPENAI_API_KEY

Step 2 - Run litellm proxy

$ litellm --config /path/to/config.yaml

Step 3 - Test it

Test with anthropic/ - all models with anthropic/ prefix will get routed to anthropic/*

curl http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-1234" \
  -d '{
    "model": "anthropic/claude-3-sonnet-20240229",
    "messages": [
      {"role": "user", "content": "Hello, Claude!"}
    ]
  }'

Test with groq/ - all models with groq/ prefix will get routed to groq/*

curl http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-1234" \
  -d '{
    "model": "groq/llama3-8b-8192",
    "messages": [
      {"role": "user", "content": "Hello, Claude!"}
    ]
  }'

Test with fo::*:static::* - all requests matching this pattern will be routed to openai/fo::*:static::*

curl http://localhost:4000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-1234" \
  -d '{
    "model": "fo::hi::static::hi",
    "messages": [
      {"role": "user", "content": "Hello, Claude!"}
    ]
  }'

Load Balancing

:::info

For more on this, go to this page

:::

Use this to call multiple instances of the same model and configure things like routing strategy.

For optimal performance:

  • Set tpm/rpm per model deployment. Weighted picks are then based on the established tpm/rpm.
  • Select your optimal routing strategy in router_settings:routing_strategy.

LiteLLM supports ["simple-shuffle", "least-busy", "usage-based-routing", "latency-based-routing"], default="simple-shuffle".

When tpm/rpm is set and routing_strategy=="simple-shuffle", litellm uses a weighted pick based on the configured tpm/rpm (see the sketch after the router_settings example below). In our load tests, setting tpm/rpm for all deployments with routing_strategy=="simple-shuffle" maximized throughput.

  • When using multiple LiteLLM Servers / Kubernetes, set the redis settings (router_settings:redis_host etc.) so load balancing state is shared across instances.

model_list:
  - model_name: zephyr-beta
    litellm_params:
        model: huggingface/HuggingFaceH4/zephyr-7b-beta
        api_base: http://0.0.0.0:8001
        rpm: 60      # Optional[int]: When rpm/tpm set - litellm uses weighted pick for load balancing. rpm = Rate limit for this deployment: in requests per minute (rpm).
        tpm: 1000   # Optional[int]: tpm = Tokens Per Minute 
  - model_name: zephyr-beta
    litellm_params:
        model: huggingface/HuggingFaceH4/zephyr-7b-beta
        api_base: http://0.0.0.0:8002
        rpm: 600      
  - model_name: zephyr-beta
    litellm_params:
        model: huggingface/HuggingFaceH4/zephyr-7b-beta
        api_base: http://0.0.0.0:8003
        rpm: 60000      
  - model_name: gpt-3.5-turbo
    litellm_params:
        model: gpt-3.5-turbo
        api_key: <my-openai-key>
        rpm: 200      
  - model_name: gpt-3.5-turbo-16k
    litellm_params:
        model: gpt-3.5-turbo-16k
        api_key: <my-openai-key>
        rpm: 100      

litellm_settings:
  num_retries: 3 # retry call 3 times on each model_name (e.g. zephyr-beta)
  request_timeout: 10 # raise Timeout error if call takes longer than 10s. Sets litellm.request_timeout 
  fallbacks: [{"zephyr-beta": ["gpt-3.5-turbo"]}] # fallback to gpt-3.5-turbo if call fails num_retries 
  context_window_fallbacks: [{"zephyr-beta": ["gpt-3.5-turbo-16k"]}, {"gpt-3.5-turbo": ["gpt-3.5-turbo-16k"]}] # fallback to gpt-3.5-turbo-16k if context window error
  allowed_fails: 3 # cooldown model if it fails > 3 calls in a minute. 

router_settings: # router_settings are optional
  routing_strategy: simple-shuffle # Literal["simple-shuffle", "least-busy", "usage-based-routing","latency-based-routing"], default="simple-shuffle"
  model_group_alias: {"gpt-4": "gpt-3.5-turbo"} # all requests with `gpt-4` will be routed to models with `gpt-3.5-turbo`
  num_retries: 2
  timeout: 30                                  # 30 seconds
  redis_host: <your redis host>                # set this when using multiple litellm proxy deployments, load balancing state stored in redis
  redis_password: <your redis password>
  redis_port: 1992
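
For intuition, here's a rough sketch of the weighted pick mentioned above - illustrative only, not LiteLLM's actual router code: deployments are picked at random with probability proportional to their configured rpm.

```python
# Illustrative only: a weighted random pick across deployments, proportional to rpm.
# LiteLLM's real router also accounts for live usage, cooldowns, retries, etc.
import random

deployments = [
    {"api_base": "http://0.0.0.0:8001", "rpm": 60},
    {"api_base": "http://0.0.0.0:8002", "rpm": 600},
    {"api_base": "http://0.0.0.0:8003", "rpm": 60000},
]

def pick_deployment(deployments):
    weights = [d["rpm"] for d in deployments]
    return random.choices(deployments, weights=weights, k=1)[0]

print(pick_deployment(deployments)["api_base"])  # the 8003 deployment wins ~99% of the time
```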

You can view your cost once you set up Virtual keys or custom_callbacks

Load API Keys / config values from Environment

If you have secrets saved in your environment, and don't want to expose them in the config.yaml, here's how to load model-specific keys from the environment. This works for ANY value on the config.yaml

os.environ/<YOUR-ENV-VAR> # runs os.getenv("YOUR-ENV-VAR")
model_list:
  - model_name: gpt-4-team1
    litellm_params: # params for litellm.completion() - https://docs.litellm.ai/docs/completion/input#input---request-body
      model: azure/chatgpt-v-2
      api_base: https://openai-gpt-4-test-v-1.openai.azure.com/
      api_version: "2023-05-15"
      api_key: os.environ/AZURE_NORTH_AMERICA_API_KEY # 👈 KEY CHANGE
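
The idea behind the os.environ/ prefix: any config value written as os.environ/<VAR> is resolved to the corresponding environment variable when the config is loaded. A rough sketch of that resolution (illustrative only, not LiteLLM's exact code):

```python
# Illustrative only: resolve "os.environ/<VAR>" style config values at load time.
import os

def resolve_secret(value):
    prefix = "os.environ/"
    if isinstance(value, str) and value.startswith(prefix):
        return os.environ[value[len(prefix):]]  # e.g. os.environ["AZURE_NORTH_AMERICA_API_KEY"]
    return value

print(resolve_secret("os.environ/AZURE_NORTH_AMERICA_API_KEY"))
```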

See Code

s/o to @David Manouchehri for helping with this.

Load API Keys from Secret Managers (Azure Vault, etc)

Using Secret Managers with LiteLLM Proxy

Set Supported Environments for a model - production, staging, development

Use this if you want to control which model is exposed on a specific litellm environment

Supported Environments:

  • production
  • staging
  • development

  1. Set LITELLM_ENVIRONMENT="<environment>" in your environment. Can be one of production, staging or development.

  2. For each model set the list of supported environments in model_info.supported_environments

model_list:
 - model_name: gpt-3.5-turbo
   litellm_params:
     model: openai/gpt-3.5-turbo
     api_key: os.environ/OPENAI_API_KEY
   model_info:
     supported_environments: ["development", "production", "staging"]
 - model_name: gpt-4
   litellm_params:
     model: openai/gpt-4
     api_key: os.environ/OPENAI_API_KEY
   model_info:
     supported_environments: ["production", "staging"]
 - model_name: gpt-4o
   litellm_params:
     model: openai/gpt-4o
     api_key: os.environ/OPENAI_API_KEY
   model_info:
     supported_environments: ["production"]
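
Conceptually, the proxy then only exposes models whose model_info.supported_environments contains the current LITELLM_ENVIRONMENT. A rough sketch of that filtering (illustrative only, not LiteLLM's actual code):

```python
# Illustrative only: expose a model only if the current environment is listed
# in its model_info.supported_environments.
import os

LITELLM_ENVIRONMENT = os.getenv("LITELLM_ENVIRONMENT", "development")

model_list = [
    {"model_name": "gpt-3.5-turbo", "supported_environments": ["development", "production", "staging"]},
    {"model_name": "gpt-4", "supported_environments": ["production", "staging"]},
    {"model_name": "gpt-4o", "supported_environments": ["production"]},
]

exposed = [m["model_name"] for m in model_list if LITELLM_ENVIRONMENT in m["supported_environments"]]
print(exposed)  # with LITELLM_ENVIRONMENT="staging" -> ['gpt-3.5-turbo', 'gpt-4']
```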

Set Custom Prompt Templates

LiteLLM by default checks if a model has a prompt template and applies it (e.g. if a huggingface model has a saved chat template in its tokenizer_config.json). However, you can also set a custom prompt template on your proxy in the config.yaml:

Step 1: Save your prompt template in a config.yaml

# Model-specific parameters
model_list:
  - model_name: mistral-7b # model alias
    litellm_params: # actual params for litellm.completion()
      model: "huggingface/mistralai/Mistral-7B-Instruct-v0.1" 
      api_base: "<your-api-base>"
      api_key: "<your-api-key>" # [OPTIONAL] for hf inference endpoints
      initial_prompt_value: "\n"
      roles: {"system":{"pre_message":"<|im_start|>system\n", "post_message":"<|im_end|>"}, "assistant":{"pre_message":"<|im_start|>assistant\n","post_message":"<|im_end|>"}, "user":{"pre_message":"<|im_start|>user\n","post_message":"<|im_end|>"}}
      final_prompt_value: "\n"
      bos_token: " "
      eos_token: " "
      max_tokens: 4096
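
To see what roles, initial_prompt_value and final_prompt_value do, here's a rough sketch of how such a template turns an OpenAI-style messages list into a single prompt string (illustrative only):

```python
# Illustrative only: render an OpenAI-style messages list with the role templates
# from the config above (ChatML-style). bos/eos token handling is omitted for brevity.
roles = {
    "system": {"pre_message": "<|im_start|>system\n", "post_message": "<|im_end|>"},
    "user": {"pre_message": "<|im_start|>user\n", "post_message": "<|im_end|>"},
    "assistant": {"pre_message": "<|im_start|>assistant\n", "post_message": "<|im_end|>"},
}

def render_prompt(messages, roles, initial_prompt_value="\n", final_prompt_value="\n"):
    prompt = initial_prompt_value
    for m in messages:
        role = roles[m["role"]]
        prompt += role["pre_message"] + m["content"] + role["post_message"]
    return prompt + final_prompt_value

print(render_prompt([{"role": "user", "content": "Hey, how's it going?"}], roles))
```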

Step 2: Start server with config

$ litellm --config /path/to/config.yaml

General Settings general_settings (DB Connection, etc)

Configure DB Pool Limits + Connection Timeouts

general_settings: 
  database_connection_pool_limit: 100 # sets connection pool for prisma client to postgres db at 100
  database_connection_timeout: 60 # sets a 60s timeout for any connection call to the db 

All settings

environment_variables: {}

model_list:
  - model_name: string
    litellm_params: {}
    model_info:
      id: string
      mode: embedding
      input_cost_per_token: 0
      output_cost_per_token: 0
      max_tokens: 2048
      base_model: gpt-4-1106-preview
      additionalProp1: {}

litellm_settings:
  # Logging/Callback settings
  success_callback: ["langfuse"]  # list of success callbacks
  failure_callback: ["sentry"]  # list of failure callbacks
  callbacks: ["otel"]  # list of callbacks - runs on success and failure
  service_callbacks: ["datadog", "prometheus"]  # logs redis, postgres failures on datadog, prometheus
  turn_off_message_logging: boolean  # prevent the messages and responses from being logged to your callbacks, but request metadata will still be logged.
  redact_user_api_key_info: boolean  # Redact information about the user api key (hashed token, user_id, team id, etc.), from logs. Currently supported for Langfuse, OpenTelemetry, Logfire, ArizeAI logging.
  langfuse_default_tags: ["cache_hit", "cache_key", "proxy_base_url", "user_api_key_alias", "user_api_key_user_id", "user_api_key_user_email", "user_api_key_team_alias", "semantic-similarity", "proxy_base_url"] # default tags for Langfuse Logging
  
  # Networking settings
  request_timeout: 10 # (int) llm request timeout in seconds. Raise Timeout error if call takes longer than 10s. Sets litellm.request_timeout 
  force_ipv4: boolean # If true, litellm will force ipv4 for all LLM requests. Some users have seen httpx ConnectionError when using ipv6 + Anthropic API
  
  set_verbose: boolean # sets litellm.set_verbose=True to view verbose debug logs. DO NOT LEAVE THIS ON IN PRODUCTION
  json_logs: boolean # if true, logs will be in json format

  # Fallbacks, reliability
  default_fallbacks: ["claude-opus"] # set default_fallbacks, in case a specific model group is misconfigured / bad.
  content_policy_fallbacks: [{"gpt-3.5-turbo-small": ["claude-opus"]}] # fallbacks for ContentPolicyErrors
  context_window_fallbacks: [{"gpt-3.5-turbo-small": ["gpt-3.5-turbo-large", "claude-opus"]}] # fallbacks for ContextWindowExceededErrors



  # Caching settings
  cache: true 
  cache_params:        # set cache params for redis
    type: redis        # type of cache to initialize

    # Optional - Redis Settings
    host: "localhost"  # The host address for the Redis cache. Required if type is "redis".
    port: 6379  # The port number for the Redis cache. Required if type is "redis".
    password: "your_password"  # The password for the Redis cache. Required if type is "redis".
    namespace: "litellm.caching.caching" # namespace for redis cache
  
    # Optional - Redis Cluster Settings
    redis_startup_nodes: [{"host": "127.0.0.1", "port": "7001"}] 

    # Optional - Redis Sentinel Settings
    service_name: "mymaster"
    sentinel_nodes: [["localhost", 26379]]

    # Optional - Qdrant Semantic Cache Settings
    qdrant_semantic_cache_embedding_model: openai-embedding # the model should be defined on the model_list
    qdrant_collection_name: test_collection
    qdrant_quantization_config: binary
    similarity_threshold: 0.8   # similarity threshold for semantic cache

    # Optional - S3 Cache Settings
    s3_bucket_name: cache-bucket-litellm   # AWS Bucket Name for S3
    s3_region_name: us-west-2              # AWS Region Name for S3
    s3_aws_access_key_id: os.environ/AWS_ACCESS_KEY_ID  # use os.environ/<variable name> to pass environment variables. This is AWS Access Key ID for S3
    s3_aws_secret_access_key: os.environ/AWS_SECRET_ACCESS_KEY  # AWS Secret Access Key for S3
    s3_endpoint_url: https://s3.amazonaws.com  # [OPTIONAL] S3 endpoint URL, if you want to use Backblaze/cloudflare s3 bucket

    # Common Cache settings
    # Optional - Supported call types for caching
    supported_call_types: ["acompletion", "atext_completion", "aembedding", "atranscription"]
                          # /chat/completions, /completions, /embeddings, /audio/transcriptions
    mode: default_off # if default_off, you need to opt in to caching on a per call basis
    ttl: 600 # ttl for caching


callback_settings:
  otel:
    message_logging: boolean  # OTEL logging callback specific settings

general_settings:
  completion_model: string
  disable_spend_logs: boolean  # turn off writing each transaction to the db
  disable_master_key_return: boolean  # turn off returning master key on UI (checked on '/user/info' endpoint)
  disable_retry_on_max_parallel_request_limit_error: boolean  # turn off retries when max parallel request limit is reached
  disable_reset_budget: boolean  # turn off reset budget scheduled task
  disable_adding_master_key_hash_to_db: boolean  # turn off storing master key hash in db, for spend tracking
  enable_jwt_auth: boolean  # allow proxy admin to auth in via jwt tokens with 'litellm_proxy_admin' in claims
  enforce_user_param: boolean  # requires all openai endpoint requests to have a 'user' param
  allowed_routes: ["route1", "route2"]  # list of allowed proxy API routes - a user can access. (currently JWT-Auth only)
  key_management_system: google_kms  # either google_kms or azure_kms
  master_key: string

  # Database Settings
  database_url: string
  database_connection_pool_limit: 0  # default 100
  database_connection_timeout: 0  # default 60s
  allow_requests_on_db_unavailable: boolean  # if true, allows requests to still succeed when the DB is unavailable and the Virtual Key cannot be verified 

  custom_auth: string
  max_parallel_requests: 0  # the max parallel requests allowed per deployment 
  global_max_parallel_requests: 0  # the max parallel requests allowed on the proxy all up 
  infer_model_from_keys: true
  background_health_checks: true
  health_check_interval: 300
  alerting: ["slack", "email"]
  alerting_threshold: 0
  use_client_credentials_pass_through_routes: boolean  # use client credentials for all pass through routes like "/vertex-ai", /bedrock/. When this is True Virtual Key auth will not be applied on these endpoints

litellm_settings - Reference

Name Type Description
success_callback array of strings List of success callbacks. Doc Proxy logging callbacks, Doc Metrics
failure_callback array of strings List of failure callbacks Doc Proxy logging callbacks, Doc Metrics
callbacks array of strings List of callbacks - runs on success and failure Doc Proxy logging callbacks, Doc Metrics
service_callbacks array of strings System health monitoring - Logs redis, postgres failures on specified services (e.g. datadog, prometheus) Doc Metrics
turn_off_message_logging boolean If true, prevents messages and responses from being logged to callbacks, but request metadata will still be logged Proxy Logging
modify_params boolean If true, allows modifying the parameters of the request before it is sent to the LLM provider
enable_preview_features boolean If true, enables preview features - e.g. Azure O1 Models with streaming support.
redact_user_api_key_info boolean If true, redacts information about the user api key from logs Proxy Logging
langfuse_default_tags array of strings Default tags for Langfuse Logging. Use this if you want to control which LiteLLM-specific fields are logged as tags by the LiteLLM proxy. By default LiteLLM Proxy logs no LiteLLM-specific fields as tags. Further docs
set_verbose boolean If true, sets litellm.set_verbose=True to view verbose debug logs. DO NOT LEAVE THIS ON IN PRODUCTION
json_logs boolean If true, logs will be in json format. If you need to store the logs as JSON, just set the litellm.json_logs = True. We currently just log the raw POST request from litellm as a JSON Further docs
default_fallbacks array of strings List of fallback models to use if a specific model group is misconfigured / bad. Further docs
request_timeout integer The timeout for requests in seconds. If not set, the default value is 6000 seconds. For reference OpenAI Python SDK defaults to 600 seconds.
force_ipv4 boolean If true, litellm will force ipv4 for all LLM requests. Some users have seen httpx ConnectionError when using ipv6 + Anthropic API
content_policy_fallbacks array of objects Fallbacks to use when a ContentPolicyViolationError is encountered. Further docs
context_window_fallbacks array of objects Fallbacks to use when a ContextWindowExceededError is encountered. Further docs
cache boolean If true, enables caching. Further docs
cache_params object Parameters for the cache. Further docs
cache_params.type string The type of cache to initialize. Can be one of ["local", "redis", "redis-semantic", "s3", "disk", "qdrant-semantic"]. Defaults to "redis". Further docs
cache_params.host string The host address for the Redis cache. Required if type is "redis".
cache_params.port integer The port number for the Redis cache. Required if type is "redis".
cache_params.password string The password for the Redis cache. Required if type is "redis".
cache_params.namespace string The namespace for the Redis cache.
cache_params.redis_startup_nodes array of objects Redis Cluster Settings. Further docs
cache_params.service_name string Redis Sentinel Settings. Further docs
cache_params.sentinel_nodes array of arrays Redis Sentinel Settings. Further docs
cache_params.ttl integer The time (in seconds) to store entries in cache.
cache_params.qdrant_semantic_cache_embedding_model string The embedding model to use for qdrant semantic cache.
cache_params.qdrant_collection_name string The name of the collection to use for qdrant semantic cache.
cache_params.qdrant_quantization_config string The quantization configuration for the qdrant semantic cache.
cache_params.similarity_threshold float The similarity threshold for the semantic cache.
cache_params.s3_bucket_name string The name of the S3 bucket to use for the semantic cache.
cache_params.s3_region_name string The region name for the S3 bucket.
cache_params.s3_aws_access_key_id string The AWS access key ID for the S3 bucket.
cache_params.s3_aws_secret_access_key string The AWS secret access key for the S3 bucket.
cache_params.s3_endpoint_url string Optional - The endpoint URL for the S3 bucket.
cache_params.supported_call_types array of strings The types of calls to cache. Further docs
cache_params.mode string The mode of the cache. Further docs

general_settings - Reference

Name Type Description
completion_model string The default model to use for completions when model is not specified in the request
disable_spend_logs boolean If true, turns off writing each transaction to the database
disable_master_key_return boolean If true, turns off returning master key on UI. (checked on '/user/info' endpoint)
disable_retry_on_max_parallel_request_limit_error boolean If true, turns off retries when max parallel request limit is reached
disable_reset_budget boolean If true, turns off reset budget scheduled task
disable_adding_master_key_hash_to_db boolean If true, turns off storing master key hash in db
enable_jwt_auth boolean allow proxy admin to auth in via jwt tokens with 'litellm_proxy_admin' in claims. Doc on JWT Tokens
enforce_user_param boolean If true, requires all OpenAI endpoint requests to have a 'user' param. Doc on call hooks
allowed_routes array of strings List of allowed proxy API routes a user can access Doc on controlling allowed routes
key_management_system string Specifies the key management system. Doc Secret Managers
master_key string The master key for the proxy Set up Virtual Keys
database_url string The URL for the database connection Set up Virtual Keys
database_connection_pool_limit integer The limit for database connection pool Setting DB Connection Pool limit
database_connection_timeout integer The timeout for database connections in seconds Setting DB Connection Pool limit, timeout
allow_requests_on_db_unavailable boolean If true, allows requests to succeed even if DB is unreachable. Only use this if running LiteLLM in your VPC. This will allow requests to work even when LiteLLM cannot connect to the DB to verify a Virtual Key
custom_auth string Write your own custom authentication logic Doc Custom Auth
max_parallel_requests integer The max parallel requests allowed per deployment
global_max_parallel_requests integer The max parallel requests allowed on the proxy overall
infer_model_from_keys boolean If true, infers the model from the provided keys
background_health_checks boolean If true, enables background health checks. Doc on health checks
health_check_interval integer The interval for health checks in seconds Doc on health checks
alerting array of strings List of alerting methods Doc on Slack Alerting
alerting_threshold integer The threshold for triggering alerts Doc on Slack Alerting
use_client_credentials_pass_through_routes boolean If true, uses client credentials for all pass-through routes. Doc on pass through routes
health_check_details boolean If false, hides health check details (e.g. remaining rate limit). Doc on health checks
public_routes List[str] (Enterprise Feature) Control list of public routes
alert_types List[str] Control list of alert types to send to slack [Doc on alert types](./alerting.md)
enforced_params List[str] (Enterprise Feature) List of params that must be included in all requests to the proxy
enable_oauth2_auth boolean (Enterprise Feature) If true, enables oauth2.0 authentication
use_x_forwarded_for str If true, uses the X-Forwarded-For header to get the client IP address
service_account_settings List[Dict[str, Any]] Set service_account_settings if you want to create settings that only apply to service account keys [Doc on service accounts](./service_accounts.md)
image_generation_model str The default model to use for image generation - ignores model set in request
store_model_in_db boolean If true, allows /model/new endpoint to store model information in db. Endpoint disabled by default. Doc on /model/new endpoint
max_request_size_mb int The maximum size for requests in MB. Requests above this size will be rejected.
max_response_size_mb int The maximum size for responses in MB. LLM Responses above this size will not be sent.
proxy_budget_rescheduler_min_time int The minimum time (in seconds) to wait before checking db for budget resets. Default is 597 seconds
proxy_budget_rescheduler_max_time int The maximum time (in seconds) to wait before checking db for budget resets. Default is 605 seconds
proxy_batch_write_at int Time (in seconds) to wait before batch writing spend logs to the db. Default is 10 seconds
alerting_args dict Args for Slack Alerting Doc on Slack Alerting
custom_key_generate str Custom function for key generation Doc on custom key generation
allowed_ips List[str] List of IPs allowed to access the proxy. If not set, all IPs are allowed.
embedding_model str The default model to use for embeddings - ignores model set in request
default_team_disabled boolean If true, users cannot create 'personal' keys (keys with no team_id).
alert_to_webhook_url Dict[str] Specify a webhook url for each alert type.
key_management_settings List[Dict[str, Any]] Settings for key management system (e.g. AWS KMS, Azure Key Vault) Doc on key management
allow_user_auth boolean (Deprecated) old approach for user authentication.
user_api_key_cache_ttl int The time (in seconds) to cache user api keys in memory.
disable_prisma_schema_update boolean If true, turns off automatic schema updates to DB
litellm_key_header_name str If set, allows passing LiteLLM keys as a custom header. Doc on custom headers
moderation_model str The default model to use for moderation.
custom_sso str Path to a python file that implements custom SSO logic. Doc on custom SSO
allow_client_side_credentials boolean If true, allows passing client side credentials to the proxy. (Useful when testing finetuning models) Doc on client side credentials
admin_only_routes List[str] (Enterprise Feature) List of routes that are only accessible to admin users. Doc on admin only routes
use_azure_key_vault boolean If true, load keys from azure key vault
use_google_kms boolean If true, load keys from google kms
spend_report_frequency str Specify how often you want a Spend Report to be sent (e.g. "1d", "2d", "30d") More on this
ui_access_mode Literal["admin_only"] If set, restricts access to the UI to admin users only. Docs
litellm_jwtauth Dict[str, Any] Settings for JWT authentication. Docs
litellm_license str The license key for the proxy. Docs
oauth2_config_mappings Dict[str, str] Define the OAuth2 config mappings
pass_through_endpoints List[Dict[str, Any]] Define the pass through endpoints. Docs
enable_oauth2_proxy_auth boolean (Enterprise Feature) If true, enables oauth2.0 authentication
forward_openai_org_id boolean If true, forwards the OpenAI Organization ID to the backend LLM call (if it's OpenAI).
forward_client_headers_to_llm_api boolean If true, forwards the client headers (any x- headers) to the backend LLM call

router_settings - Reference

router_settings:
  routing_strategy: usage-based-routing-v2 # Literal["simple-shuffle", "least-busy", "usage-based-routing","latency-based-routing"], default="simple-shuffle"
  redis_host: <your-redis-host>           # string
  redis_password: <your-redis-password>   # string
  redis_port: <your-redis-port>           # string
  enable_pre_call_check: true             # bool - Before call is made check if a call is within model context window 
  allowed_fails: 3 # cooldown model if it fails > 3 calls in a minute. 
  cooldown_time: 30 # (in seconds) how long to cooldown model if fails/min > allowed_fails
  disable_cooldowns: True                  # bool - Disable cooldowns for all models 
  enable_tag_filtering: True                # bool - Use tag based routing for requests
  retry_policy: {                          # Dict[str, int]: retry policy for different types of exceptions
    "AuthenticationErrorRetries": 3,
    "TimeoutErrorRetries": 3,
    "RateLimitErrorRetries": 3,
    "ContentPolicyViolationErrorRetries": 4,
    "InternalServerErrorRetries": 4
  }
  allowed_fails_policy: {
    "BadRequestErrorAllowedFails": 1000, # Allow 1000 BadRequestErrors before cooling down a deployment
    "AuthenticationErrorAllowedFails": 10, # int 
    "TimeoutErrorAllowedFails": 12, # int 
    "RateLimitErrorAllowedFails": 10000, # int 
    "ContentPolicyViolationErrorAllowedFails": 15, # int 
    "InternalServerErrorAllowedFails": 20, # int 
  }
  content_policy_fallbacks: [{"claude-2": ["my-fallback-model"]}] # List[Dict[str, List[str]]]: Fallback model for content policy violations
  fallbacks: [{"claude-2": ["my-fallback-model"]}] # List[Dict[str, List[str]]]: Fallback model for all errors

Name Type Description
routing_strategy string The strategy used for routing requests. Options: "simple-shuffle", "least-busy", "usage-based-routing", "latency-based-routing". Default is "simple-shuffle". More information here
redis_host string The host address for the Redis server. Only set this if you have multiple instances of LiteLLM Proxy and want current tpm/rpm tracking to be shared across them
redis_password string The password for the Redis server. Only set this if you have multiple instances of LiteLLM Proxy and want current tpm/rpm tracking to be shared across them
redis_port string The port number for the Redis server. Only set this if you have multiple instances of LiteLLM Proxy and want current tpm/rpm tracking to be shared across them
enable_pre_call_check boolean If true, checks if a call is within the model's context window before making the call. More information here
content_policy_fallbacks array of objects Specifies fallback models for content policy violations. More information here
fallbacks array of objects Specifies fallback models for all types of errors. More information here
enable_tag_filtering boolean If true, uses tag based routing for requests Tag Based Routing
cooldown_time integer The duration (in seconds) to cooldown a model if it exceeds the allowed failures.
disable_cooldowns boolean If true, disables cooldowns for all models. More information here
retry_policy object Specifies the number of retries for different types of exceptions. More information here
allowed_fails integer The number of failures allowed before cooling down a model. More information here
allowed_fails_policy object Specifies the number of allowed failures for different error types before cooling down a deployment. More information here

environment variables - Reference

Name Description
ACTIONS_ID_TOKEN_REQUEST_TOKEN Token for requesting ID in GitHub Actions
ACTIONS_ID_TOKEN_REQUEST_URL URL for requesting ID token in GitHub Actions
AISPEND_ACCOUNT_ID Account ID for AI Spend
AISPEND_API_KEY API Key for AI Spend
ALLOWED_EMAIL_DOMAINS List of email domains allowed for access
ARIZE_API_KEY API key for Arize platform integration
ARIZE_SPACE_KEY Space key for Arize platform
ARGILLA_BATCH_SIZE Batch size for Argilla logging
ARGILLA_API_KEY API key for Argilla platform
ARGILLA_SAMPLING_RATE Sampling rate for Argilla logging
ARGILLA_DATASET_NAME Dataset name for Argilla logging
ARGILLA_BASE_URL Base URL for Argilla service
ATHINA_API_KEY API key for Athina service
AUTH_STRATEGY Strategy used for authentication (e.g., OAuth, API key)
AWS_ACCESS_KEY_ID Access Key ID for AWS services
AWS_PROFILE_NAME AWS CLI profile name to be used
AWS_REGION_NAME Default AWS region for service interactions
AWS_ROLE_NAME Role name for AWS IAM usage
AWS_SECRET_ACCESS_KEY Secret Access Key for AWS services
AWS_SESSION_NAME Name for AWS session
AWS_WEB_IDENTITY_TOKEN Web identity token for AWS
AZURE_API_VERSION Version of the Azure API being used
AZURE_AUTHORITY_HOST Azure authority host URL
AZURE_CLIENT_ID Client ID for Azure services
AZURE_CLIENT_SECRET Client secret for Azure services
AZURE_FEDERATED_TOKEN_FILE File path to Azure federated token
AZURE_KEY_VAULT_URI URI for Azure Key Vault
AZURE_TENANT_ID Tenant ID for Azure Active Directory
BERRISPEND_ACCOUNT_ID Account ID for BerriSpend service
BRAINTRUST_API_KEY API key for Braintrust integration
CIRCLE_OIDC_TOKEN OpenID Connect token for CircleCI
CIRCLE_OIDC_TOKEN_V2 Version 2 of the OpenID Connect token for CircleCI
CONFIG_FILE_PATH File path for configuration file
CUSTOM_TIKTOKEN_CACHE_DIR Custom directory for Tiktoken cache
DATABASE_HOST Hostname for the database server
DATABASE_NAME Name of the database
DATABASE_PASSWORD Password for the database user
DATABASE_PORT Port number for database connection
DATABASE_SCHEMA Schema name used in the database
DATABASE_URL Connection URL for the database
DATABASE_USER Username for database connection
DATABASE_USERNAME Alias for database user
DATABRICKS_API_BASE Base URL for Databricks API
DD_BASE_URL Base URL for Datadog integration
DATADOG_BASE_URL (Alternative to DD_BASE_URL) Base URL for Datadog integration
_DATADOG_BASE_URL (Alternative to DD_BASE_URL) Base URL for Datadog integration
DD_API_KEY API key for Datadog integration
DD_SITE Site URL for Datadog (e.g., datadoghq.com)
DD_SOURCE Source identifier for Datadog logs
DD_ENV Environment identifier for Datadog logs. Only supported for datadog_llm_observability callback
DEBUG_OTEL Enable debug mode for OpenTelemetry
DIRECT_URL Direct URL for service endpoint
DISABLE_ADMIN_UI Toggle to disable the admin UI
DISABLE_SCHEMA_UPDATE Toggle to disable schema updates
DOCS_DESCRIPTION Description text for documentation pages
DOCS_FILTERED Flag indicating filtered documentation
DOCS_TITLE Title of the documentation pages
DOCS_URL The path to the Swagger API documentation. By default this is "/"
EMAIL_SUPPORT_CONTACT Support contact email address
GCS_BUCKET_NAME Name of the Google Cloud Storage bucket
GCS_PATH_SERVICE_ACCOUNT Path to the Google Cloud service account JSON file
GCS_FLUSH_INTERVAL Flush interval for GCS logging (in seconds). Specify how often you want a log to be sent to GCS. Default is 20 seconds
GCS_BATCH_SIZE Batch size for GCS logging. Specify after how many logs you want to flush to GCS. If BATCH_SIZE is set to 10, logs are flushed every 10 logs. Default is 2048
GENERIC_AUTHORIZATION_ENDPOINT Authorization endpoint for generic OAuth providers
GENERIC_CLIENT_ID Client ID for generic OAuth providers
GENERIC_CLIENT_SECRET Client secret for generic OAuth providers
GENERIC_CLIENT_STATE State parameter for generic client authentication
GENERIC_INCLUDE_CLIENT_ID Include client ID in requests for OAuth
GENERIC_SCOPE Scope settings for generic OAuth providers
GENERIC_TOKEN_ENDPOINT Token endpoint for generic OAuth providers
GENERIC_USER_DISPLAY_NAME_ATTRIBUTE Attribute for user's display name in generic auth
GENERIC_USER_EMAIL_ATTRIBUTE Attribute for user's email in generic auth
GENERIC_USER_FIRST_NAME_ATTRIBUTE Attribute for user's first name in generic auth
GENERIC_USER_ID_ATTRIBUTE Attribute for user ID in generic auth
GENERIC_USER_LAST_NAME_ATTRIBUTE Attribute for user's last name in generic auth
GENERIC_USER_PROVIDER_ATTRIBUTE Attribute specifying the user's provider
GENERIC_USER_ROLE_ATTRIBUTE Attribute specifying the user's role
GENERIC_USERINFO_ENDPOINT Endpoint to fetch user information in generic OAuth
GALILEO_BASE_URL Base URL for Galileo platform
GALILEO_PASSWORD Password for Galileo authentication
GALILEO_PROJECT_ID Project ID for Galileo usage
GALILEO_USERNAME Username for Galileo authentication
GREENSCALE_API_KEY API key for Greenscale service
GREENSCALE_ENDPOINT Endpoint URL for Greenscale service
GOOGLE_APPLICATION_CREDENTIALS Path to Google Cloud credentials JSON file
GOOGLE_CLIENT_ID Client ID for Google OAuth
GOOGLE_CLIENT_SECRET Client secret for Google OAuth
GOOGLE_KMS_RESOURCE_NAME Name of the resource in Google KMS
HF_API_BASE Base URL for Hugging Face API
HELICONE_API_KEY API key for Helicone service
HUGGINGFACE_API_BASE Base URL for Hugging Face API
IAM_TOKEN_DB_AUTH IAM token for database authentication
JSON_LOGS Enable JSON formatted logging
JWT_AUDIENCE Expected audience for JWT tokens
JWT_PUBLIC_KEY_URL URL to fetch public key for JWT verification
LAGO_API_BASE Base URL for Lago API
LAGO_API_CHARGE_BY Parameter to determine charge basis in Lago
LAGO_API_EVENT_CODE Event code for Lago API events
LAGO_API_KEY API key for accessing Lago services
LANGFUSE_DEBUG Toggle debug mode for Langfuse
LANGFUSE_FLUSH_INTERVAL Interval for flushing Langfuse logs
LANGFUSE_HOST Host URL for Langfuse service
LANGFUSE_PUBLIC_KEY Public key for Langfuse authentication
LANGFUSE_RELEASE Release version of Langfuse integration
LANGFUSE_SECRET_KEY Secret key for Langfuse authentication
LANGSMITH_API_KEY API key for Langsmith platform
LANGSMITH_BASE_URL Base URL for Langsmith service
LANGSMITH_BATCH_SIZE Batch size for operations in Langsmith
LANGSMITH_DEFAULT_RUN_NAME Default name for Langsmith run
LANGSMITH_PROJECT Project name for Langsmith integration
LANGSMITH_SAMPLING_RATE Sampling rate for Langsmith logging
LANGTRACE_API_KEY API key for Langtrace service
LITERAL_API_KEY API key for Literal integration
LITERAL_API_URL API URL for Literal service
LITERAL_BATCH_SIZE Batch size for Literal operations
LITELLM_DONT_SHOW_FEEDBACK_BOX Flag to hide feedback box in LiteLLM UI
LITELLM_DROP_PARAMS Parameters to drop in LiteLLM requests
LITELLM_EMAIL Email associated with LiteLLM account
LITELLM_GLOBAL_MAX_PARALLEL_REQUEST_RETRIES Maximum retries for parallel requests in LiteLLM
LITELLM_GLOBAL_MAX_PARALLEL_REQUEST_RETRY_TIMEOUT Timeout for retries of parallel requests in LiteLLM
LITELLM_HOSTED_UI URL of the hosted UI for LiteLLM
LITELLM_LICENSE License key for LiteLLM usage
LITELLM_LOCAL_MODEL_COST_MAP Local configuration for model cost mapping in LiteLLM
LITELLM_LOG Enable detailed logging for LiteLLM
LITELLM_MODE Operating mode for LiteLLM (e.g., production, development)
LITELLM_SALT_KEY Salt key for encryption in LiteLLM
LITELLM_SECRET_AWS_KMS_LITELLM_LICENSE AWS KMS encrypted license for LiteLLM
LITELLM_TOKEN Access token for LiteLLM integration
LOGFIRE_TOKEN Token for Logfire logging service
MICROSOFT_CLIENT_ID Client ID for Microsoft services
MICROSOFT_CLIENT_SECRET Client secret for Microsoft services
MICROSOFT_TENANT Tenant ID for Microsoft Azure
NO_DOCS Flag to disable documentation generation
NO_PROXY List of addresses to bypass proxy
OAUTH_TOKEN_INFO_ENDPOINT Endpoint for OAuth token info retrieval
OPENAI_API_BASE Base URL for OpenAI API
OPENAI_API_KEY API key for OpenAI services
OPENAI_ORGANIZATION Organization identifier for OpenAI
OPENID_BASE_URL Base URL for OpenID Connect services
OPENID_CLIENT_ID Client ID for OpenID Connect authentication
OPENID_CLIENT_SECRET Client secret for OpenID Connect authentication
OPENMETER_API_ENDPOINT API endpoint for OpenMeter integration
OPENMETER_API_KEY API key for OpenMeter services
OPENMETER_EVENT_TYPE Type of events sent to OpenMeter
OTEL_ENDPOINT OpenTelemetry endpoint for traces
OTEL_ENVIRONMENT_NAME Environment name for OpenTelemetry
OTEL_EXPORTER Exporter type for OpenTelemetry
OTEL_HEADERS Headers for OpenTelemetry requests
OTEL_SERVICE_NAME Service name identifier for OpenTelemetry
OTEL_TRACER_NAME Tracer name for OpenTelemetry tracing
PREDIBASE_API_BASE Base URL for Predibase API
PRESIDIO_ANALYZER_API_BASE Base URL for Presidio Analyzer service
PRESIDIO_ANONYMIZER_API_BASE Base URL for Presidio Anonymizer service
PROMETHEUS_URL URL for Prometheus service
PROMPTLAYER_API_KEY API key for PromptLayer integration
PROXY_ADMIN_ID Admin identifier for proxy server
PROXY_BASE_URL Base URL for proxy service
PROXY_LOGOUT_URL URL for logging out of the proxy service
PROXY_MASTER_KEY Master key for proxy authentication
QDRANT_API_BASE Base URL for Qdrant API
QDRANT_API_KEY API key for Qdrant service
QDRANT_URL Connection URL for Qdrant database
REDIS_HOST Hostname for Redis server
REDIS_PASSWORD Password for Redis service
REDIS_PORT Port number for Redis server
REDOC_URL The path to the Redoc Fast API documentation. By default this is "/redoc"
SERVER_ROOT_PATH Root path for the server application
SET_VERBOSE Flag to enable verbose logging
SLACK_DAILY_REPORT_FREQUENCY Frequency of daily Slack reports (e.g., daily, weekly)
SLACK_WEBHOOK_URL Webhook URL for Slack integration
SMTP_HOST Hostname for the SMTP server
SMTP_PASSWORD Password for SMTP authentication
SMTP_PORT Port number for SMTP server
SMTP_SENDER_EMAIL Email address used as the sender in SMTP transactions
SMTP_SENDER_LOGO Logo used in emails sent via SMTP
SMTP_TLS Flag to enable or disable TLS for SMTP connections
SMTP_USERNAME Username for SMTP authentication
SPEND_LOGS_URL URL for retrieving spend logs
SSL_CERTIFICATE Path to the SSL certificate file
SSL_VERIFY Flag to enable or disable SSL certificate verification
SUPABASE_KEY API key for Supabase service
SUPABASE_URL Base URL for Supabase instance
TEST_EMAIL_ADDRESS Email address used for testing purposes
UI_LOGO_PATH Path to the logo image used in the UI
UI_PASSWORD Password for accessing the UI
UI_USERNAME Username for accessing the UI
UPSTREAM_LANGFUSE_DEBUG Flag to enable debugging for upstream Langfuse
UPSTREAM_LANGFUSE_HOST Host URL for upstream Langfuse service
UPSTREAM_LANGFUSE_PUBLIC_KEY Public key for upstream Langfuse authentication
UPSTREAM_LANGFUSE_RELEASE Release version identifier for upstream Langfuse
UPSTREAM_LANGFUSE_SECRET_KEY Secret key for upstream Langfuse authentication
USE_AWS_KMS Flag to enable AWS Key Management Service for encryption
WEBHOOK_URL URL for receiving webhooks from external services

Extras

Disable Swagger UI

To disable the Swagger docs from the base url, set

NO_DOCS="True"

in your environment, and restart the proxy.

Use CONFIG_FILE_PATH for proxy (Easier Azure container deployment)

  1. Setup config.yaml

model_list:
  - model_name: gpt-3.5-turbo
    litellm_params:
      model: gpt-3.5-turbo
      api_key: os.environ/OPENAI_API_KEY

  2. Store filepath as env var

CONFIG_FILE_PATH="/path/to/config.yaml"

  3. Start Proxy

$ litellm 

# RUNNING on http://0.0.0.0:4000

Providing the LiteLLM config.yaml file as an s3 or GCS Bucket Object/URL

Use this if you cannot mount a config file on your deployment service (example - AWS Fargate, Railway etc)

LiteLLM Proxy will read your config.yaml from an s3 Bucket or GCS Bucket

Set the following .env vars

LITELLM_CONFIG_BUCKET_TYPE = "gcs"                              # set this to "gcs"         
LITELLM_CONFIG_BUCKET_NAME = "litellm-proxy"                    # your bucket name on GCS
LITELLM_CONFIG_BUCKET_OBJECT_KEY = "proxy_config.yaml"         # object key on GCS

Start litellm proxy with these env vars - litellm will read your config from GCS

docker run --name litellm-proxy \
   -e DATABASE_URL=<database_url> \
   -e LITELLM_CONFIG_BUCKET_NAME=<bucket_name> \
   -e LITELLM_CONFIG_BUCKET_OBJECT_KEY="<object_key>" \
   -e LITELLM_CONFIG_BUCKET_TYPE="gcs" \
   -p 4000:4000 \
   ghcr.io/berriai/litellm-database:main-latest --detailed_debug

Set the following .env vars

LITELLM_CONFIG_BUCKET_NAME = "litellm-proxy"                    # your bucket name on s3 
LITELLM_CONFIG_BUCKET_OBJECT_KEY = "litellm_proxy_config.yaml"  # object key on s3

Start litellm proxy with these env vars - litellm will read your config from s3

docker run --name litellm-proxy \
   -e DATABASE_URL=<database_url> \
   -e LITELLM_CONFIG_BUCKET_NAME=<bucket_name> \
   -e LITELLM_CONFIG_BUCKET_OBJECT_KEY="<object_key>" \
   -p 4000:4000 \
   ghcr.io/berriai/litellm-database:main-latest