mirror of
https://github.com/BerriAI/litellm.git
synced 2025-04-26 03:04:13 +00:00
litellm-proxy
A local, fast, and lightweight OpenAI-compatible server to call 100+ LLM APIs.
usage
$ pip install litellm
$ litellm --model ollama/codellama
#INFO: Ollama running on http://0.0.0.0:8000
replace openai base
import openai  # openai v1.0.0+

client = openai.OpenAI(api_key="anything", base_url="http://0.0.0.0:8000")  # set proxy to base_url
# request sent to model set on litellm proxy, `litellm --model`
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {
            "role": "user",
            "content": "this is a test request, write a short poem",
        }
    ],
)
print(response)
See how to call Huggingface, Bedrock, TogetherAI, Anthropic, etc.
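Because the proxy is OpenAI-compatible, the same request can in principle be sent as plain HTTP without the openai package. The following is a minimal sketch using only the Python standard library. It assumes the proxy is listening on http://0.0.0.0:8000 as in the usage section and that it accepts the standard OpenAI path /v1/chat/completions; the helper names `build_chat_request` and `send_chat_request` are hypothetical and not part of litellm.

```python
import json
import urllib.request

# Assumption: the address printed by `litellm --model ...` in the usage section.
PROXY_BASE_URL = "http://0.0.0.0:8000"


def build_chat_request(prompt: str, model: str = "gpt-3.5-turbo") -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{PROXY_BASE_URL}/v1/chat/completions",  # assumed standard OpenAI path
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Same placeholder key as api_key="anything" in the client example.
            "Authorization": "Bearer anything",
        },
        method="POST",
    )


def send_chat_request(prompt: str, timeout: float = 30.0) -> dict:
    """Send the request to the proxy and return the decoded JSON body."""
    with urllib.request.urlopen(build_chat_request(prompt), timeout=timeout) as resp:
        return json.load(resp)
```

With the proxy running, `send_chat_request("this is a test request, write a short poem")` should return a dictionary shaped like an OpenAI chat completion response, provided the assumptions above hold.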
Folder Structure
Routes
- proxy_server.py - all OpenAI-compatible routes (/v1/chat/completions, /v1/embeddings) plus the model info routes (/v1/models, /v1/model/info, /v1/model_group_info)
- health_endpoints/ - /health, /health/liveliness, /health/readiness
- management_endpoints/key_management_endpoints.py - all /key/* routes
- management_endpoints/team_endpoints.py - all /team/* routes
- management_endpoints/internal_user_endpoints.py - all /user/* routes
- management_endpoints/ui_sso.py - all /sso/* routes
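The health routes listed above can be polled from a script, for example before sending traffic to a freshly started proxy. This is a rough sketch using only the standard library. It assumes the proxy address from the usage section and that a healthy proxy answers these paths with HTTP 200 without authentication; `health_urls` and `probe` are hypothetical helper names, not litellm functions.

```python
import urllib.error
import urllib.request

# Paths taken from the route list above.
HEALTH_PATHS = ("/health/liveliness", "/health/readiness")


def health_urls(base_url: str) -> list[str]:
    """Join the proxy base URL with each health path."""
    base = base_url.rstrip("/")
    return [f"{base}{path}" for path in HEALTH_PATHS]


def probe(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with HTTP 200, False on any HTTP or connection error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def check_proxy(base_url: str = "http://0.0.0.0:8000") -> dict[str, bool]:
    """Probe every health URL and report the result per URL."""
    return {url: probe(url) for url in health_urls(base_url)}
```

Calling `check_proxy()` returns a mapping from each health URL to whether it responded successfully, which is enough for a simple startup wait loop.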