litellm-mirror/litellm/proxy
Krish Dholakia 695f48a8f1
fix(lowest_tpm_rpm_routing.py): fix parallel rate limit check (#6577)
* fix(lowest_tpm_rpm_routing.py): fix parallel rate limit check

* fix(lowest_tpm_rpm_v2.py): return headers in correct format

* test: update test

* build(deps): bump cookie and express in /docs/my-website (#6566)

Bumps [cookie](https://github.com/jshttp/cookie) and [express](https://github.com/expressjs/express). These dependencies needed to be updated together.

Updates `cookie` from 0.6.0 to 0.7.1
- [Release notes](https://github.com/jshttp/cookie/releases)
- [Commits](https://github.com/jshttp/cookie/compare/v0.6.0...v0.7.1)

Updates `express` from 4.20.0 to 4.21.1
- [Release notes](https://github.com/expressjs/express/releases)
- [Changelog](https://github.com/expressjs/express/blob/4.21.1/History.md)
- [Commits](https://github.com/expressjs/express/compare/4.20.0...4.21.1)

---
updated-dependencies:
- dependency-name: cookie
  dependency-type: indirect
- dependency-name: express
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* docs(virtual_keys.md): update Dockerfile reference (#6554)

Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>

* (proxy fix) - call connect on prisma client when running setup (#6534)

* critical fix - call connect on prisma client when running setup

* fix test_proxy_server_prisma_setup

* fix test_proxy_server_prisma_setup

* Add 3.5 haiku (#6588)

* feat: add claude-3-5-haiku-20241022 entries

* feat: add claude-3-5-haiku-20241022 and vertex_ai/claude-3-5-haiku@20241022 models

* add missing entries, remove vision

* remove image token costs

* Litellm perf improvements 3 (#6573)

* perf: move writing key to cache, to background task

* perf(litellm_pre_call_utils.py): add otel tracing for pre-call utils

adds 200ms on calls with pgdb connected

* fix(litellm_pre_call_utils.py'): rename call_type to actual call used

* perf(proxy_server.py): remove db logic from _get_config_from_file

was causing db calls to occur on every llm request, if team_id was set on key

* fix(auth_checks.py): add check for reducing db calls if user/team id does not exist in db

reduces latency/call by ~100ms

* fix(proxy_server.py): minor fix on existing_settings not incl alerting

* fix(exception_mapping_utils.py): map databricks exception string

* fix(auth_checks.py): fix auth check logic

* test: correctly mark flaky test

* fix(utils.py): handle auth token error for tokenizers.from_pretrained

* build: fix map

* build: fix map

* build: fix json for model map

* test: remove eol model

* fix(proxy_server.py): fix db config loading logic

* fix(proxy_server.py): fix order of config / db updates, to ensure fields not overwritten

* test: skip test if required env var is missing

* test: fix test

---------

Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Emmanuel Ferdman <emmanuelferdman@gmail.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: paul-gauthier <69695708+paul-gauthier@users.noreply.github.com>
2024-11-05 22:03:44 +05:30
..
_experimental Litellm dev 11 02 2024 (#6561) 2024-11-04 07:48:20 +05:30
analytics_endpoints Litellm ruff linting enforcement (#5992) 2024-10-01 19:44:20 -04:00
auth fix allow using 15 seconds for premium license check 2024-11-04 16:09:15 -08:00
common_utils (code quality) add ruff check PLR0915 for too-many-statements (#6309) 2024-10-18 15:36:49 +05:30
config_management_endpoints feat(ui): for adding pass-through endpoints 2024-08-15 21:58:11 -07:00
db (code quality) add ruff check PLR0915 for too-many-statements (#6309) 2024-10-18 15:36:49 +05:30
example_config_yaml Litellm router max depth (#6501) 2024-10-29 22:05:41 -07:00
fine_tuning_endpoints Add pyright to ci/cd + Fix remaining type-checking errors (#6082) 2024-10-05 17:04:00 -04:00
guardrails (code quality) add ruff check PLR0915 for too-many-statements (#6309) 2024-10-18 15:36:49 +05:30
health_endpoints (fix) Langfuse key based logging (#6372) 2024-10-23 18:24:22 +05:30
hooks Litellm dev 11 02 2024 (#6561) 2024-11-04 07:48:20 +05:30
management_endpoints (UI) Fix viewing members, keys in a team + added testing (#6514) 2024-10-30 23:51:13 +05:30
management_helpers fix create_audit_log_for_update 2024-10-25 16:48:25 +04:00
openai_files_endpoints Litellm ruff linting enforcement (#5992) 2024-10-01 19:44:20 -04:00
pass_through_endpoints LiteLLM Minor Fixes & Improvements (10/30/2024) (#6519) 2024-11-02 00:44:32 +05:30
proxy_load_test Litellm ruff linting enforcement (#5992) 2024-10-01 19:44:20 -04:00
rerank_endpoints LiteLLM Minor Fixes & Improvements (09/26/2024) (#5925) (#5937) 2024-09-27 17:54:13 -07:00
spend_tracking (code quality) add ruff check PLR0915 for too-many-statements (#6309) 2024-10-18 15:36:49 +05:30
ui_crud_endpoints ui - add Create, get, delete endpoints for IP Addresses 2024-07-09 15:12:08 -07:00
vertex_ai_endpoints feat(custom_logger.py): expose new async_dataset_hook for modifying… (#6331) 2024-10-20 09:00:04 -07:00
.gitignore
__init__.py
_logging.py fix(_logging.py): fix timestamp format for json logs 2024-06-20 15:20:21 -07:00
_new_secret_config.yaml fix(lowest_tpm_rpm_routing.py): fix parallel rate limit check (#6577) 2024-11-05 22:03:44 +05:30
_super_secret_config.yaml docs(enterprise.md): cleanup docs 2024-07-15 14:52:08 -07:00
_types.py Litellm dev 11 02 2024 (#6561) 2024-11-04 07:48:20 +05:30
cached_logo.jpg (feat) use hosted images for custom branding 2024-02-22 14:51:40 -08:00
caching_routes.py (refactor) caching use LLMCachingHandler for async_get_cache and set_cache (#6208) 2024-10-14 16:34:01 +05:30
custom_sso.py Litellm ruff linting enforcement (#5992) 2024-10-01 19:44:20 -04:00
enterprise feat(llama_guard.py): add llama guard support for content moderation + new async_moderation_hook endpoint 2024-02-17 19:13:04 -08:00
health_check.py LiteLLM Minor Fixes and Improvements (09/14/2024) (#5697) 2024-09-14 10:32:39 -07:00
lambda.py
litellm_pre_call_utils.py Litellm perf improvements 3 (#6573) 2024-11-05 03:51:26 +05:30
llamaguard_prompt.txt feat(llama_guard.py): allow user to define custom unsafe content categories 2024-02-17 17:42:47 -08:00
logo.jpg (feat) admin ui custom branding 2024-02-21 17:34:42 -08:00
openapi.json
post_call_rules.py
prisma_migration.py Litellm expose disable schema update flag (#6085) 2024-10-05 21:26:51 -04:00
proxy_cli.py (docs + testing) Correctly document the timeout value used by litellm proxy is 6000 seconds + add to best practices for prod (#6339) 2024-10-23 14:09:35 +05:30
proxy_config.yaml (fix) slack alerting - don't spam the failed cost tracking alert for the same model (#6543) 2024-11-01 18:36:17 +05:30
proxy_server.py fix(lowest_tpm_rpm_routing.py): fix parallel rate limit check (#6577) 2024-11-05 22:03:44 +05:30
README.md [Feat-Proxy] Allow using custom sso handler (#5809) 2024-09-20 19:14:33 -07:00
route_llm_request.py (feat) use regex pattern matching for wildcard routing (#6150) 2024-10-10 18:24:16 +05:30
schema.prisma track created, updated at virtual keys 2024-10-25 07:19:29 +04:00
start.sh
utils.py Litellm perf improvements 3 (#6573) 2024-11-05 03:51:26 +05:30

litellm-proxy

A local, fast, and lightweight OpenAI-compatible server to call 100+ LLM APIs.

usage

$ pip install litellm
$ litellm --model ollama/codellama 

#INFO: Ollama running on http://0.0.0.0:8000

replace openai base

import openai # openai v1.0.0+
client = openai.OpenAI(api_key="anything",base_url="http://0.0.0.0:8000") # set proxy to base_url
# request sent to model set on litellm proxy, `litellm --model`
response = client.chat.completions.create(model="gpt-3.5-turbo", messages = [
    {
        "role": "user",
        "content": "this is a test request, write a short poem"
    }
])

print(response)

See how to call Huggingface,Bedrock,TogetherAI,Anthropic, etc.


Folder Structure

Routes

  • proxy_server.py - all openai-compatible routes - /v1/chat/completion, /v1/embedding + model info routes - /v1/models, /v1/model/info, /v1/model_group_info routes.
  • health_endpoints/ - /health, /health/liveliness, /health/readiness
  • management_endpoints/key_management_endpoints.py - all /key/* routes
  • management_endpoints/team_endpoints.py - all /team/* routes
  • management_endpoints/internal_user_endpoints.py - all /user/* routes
  • management_endpoints/ui_sso.py - all /sso/* routes