# What endpoints does the LiteLLM proxy have?

## 💥 LiteLLM Proxy Server

LiteLLM Server manages:

- Calling 100+ LLMs (Huggingface/Bedrock/TogetherAI/etc.) in the OpenAI ChatCompletions & Completions format
- Setting custom prompt templates + model-specific configs (temperature, max_tokens, etc.)

## Quick Start

View all the supported args for the Proxy CLI here.

```shell
$ litellm --model huggingface/bigcode/starcoder

#INFO: Proxy running on http://0.0.0.0:8000
```

### Test

In a new shell, run the following; it makes an openai.ChatCompletion request:

```shell
litellm --test
```

This automatically routes any request for gpt-3.5-turbo to bigcode/starcoder, hosted on Huggingface Inference Endpoints.

### Replace openai base

```python
import openai

# Point the OpenAI client at the local proxy; the proxy handles auth with
# the underlying provider, so any placeholder API key works here.
client = openai.OpenAI(api_key="anything", base_url="http://0.0.0.0:8000")

print(client.chat.completions.create(
    model="test",
    messages=[{"role": "user", "content": "Hey!"}]
))
```

## Supported LLMs

- Bedrock
- Huggingface (TGI)
- Anthropic
- VLLM
- OpenAI Compatible Server
- TogetherAI
- Replicate
- Petals
- PaLM
- Azure OpenAI
- AI21
- Cohere

For example, to run the proxy against Bedrock:

```shell
$ export AWS_ACCESS_KEY_ID=""
$ export AWS_REGION_NAME="" # e.g. us-west-2
$ export AWS_SECRET_ACCESS_KEY=""

$ litellm --model bedrock/anthropic.claude-v2
```

## Server Endpoints

- POST /chat/completions - chat completions endpoint to call 100+ LLMs
- POST /completions - completions endpoint
- POST /embeddings - embedding endpoint for Azure, OpenAI, Huggingface endpoints
- GET /models - available models on the server
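Because the endpoints accept standard OpenAI-style request bodies, they can also be hit with plain HTTP, without the OpenAI SDK. Below is a minimal sketch, assuming the Quick Start proxy is still running on http://0.0.0.0:8000; the payload fields mirror the OpenAI chat completions schema.

```python
# Sketch: calling the proxy's endpoints over plain HTTP with `requests`.
# Assumes the proxy from the Quick Start is running on http://0.0.0.0:8000.
import requests

BASE_URL = "http://0.0.0.0:8000"

# GET /models - list the models the server exposes
models = requests.get(f"{BASE_URL}/models").json()
print(models)

# POST /chat/completions - OpenAI-style chat completion payload
resp = requests.post(
    f"{BASE_URL}/chat/completions",
    json={
        # Routed by the proxy to whatever model it was started with
        "model": "gpt-3.5-turbo",
        "messages": [{"role": "user", "content": "Hey!"}],
    },
)
print(resp.json())
```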
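For POST /embeddings, the same OpenAI client from the "Replace openai base" step works once it points at the proxy. A sketch follows, assuming the proxy is serving an embeddings-capable (Azure, OpenAI, or Huggingface) model; the model name below is illustrative, not something the proxy guarantees.

```python
# Sketch: calling POST /embeddings through the OpenAI SDK pointed at the proxy.
# The model name is a placeholder; use whatever embedding model your proxy serves.
import openai

client = openai.OpenAI(api_key="anything", base_url="http://0.0.0.0:8000")

embedding = client.embeddings.create(
    model="text-embedding-ada-002",  # illustrative; depends on your proxy config
    input=["Hello world"],
)
print(embedding.data[0].embedding[:5])  # first few dimensions of the vector
```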