diff --git a/README.md b/README.md index a7ce7b5ba..6beb53790 100644 --- a/README.md +++ b/README.md @@ -25,27 +25,26 @@ LiteLLM manages: + - Translate inputs to provider's `completion`, `embedding`, and `image_generation` endpoints - [Consistent output](https://docs.litellm.ai/docs/completion/output), text responses will always be available at `['choices'][0]['message']['content']` - Retry/fallback logic across multiple deployments (e.g. Azure/OpenAI) - [Router](https://docs.litellm.ai/docs/routing) - Set Budgets & Rate limits per project, api key, model [OpenAI Proxy Server](https://docs.litellm.ai/docs/simple_proxy) - [**Jump to OpenAI Proxy Docs**](https://github.com/BerriAI/litellm?tab=readme-ov-file#openai-proxy---docs)
[**Jump to Supported LLM Providers**](https://github.com/BerriAI/litellm?tab=readme-ov-file#supported-provider-docs) Support for more providers. Missing a provider or LLM Platform, raise a [feature request](https://github.com/BerriAI/litellm/issues/new?assignees=&labels=enhancement&projects=&template=feature_request.yml&title=%5BFeature%5D%3A+). # Usage ([**Docs**](https://docs.litellm.ai/docs/)) + > [!IMPORTANT] > LiteLLM v1.0.0 now requires `openai>=1.0.0`. Migration guide [here](https://docs.litellm.ai/docs/migration) - Open In Colab - ```shell pip install litellm ``` @@ -54,9 +53,9 @@ pip install litellm from litellm import completion import os -## set ENV variables -os.environ["OPENAI_API_KEY"] = "your-openai-key" -os.environ["COHERE_API_KEY"] = "your-cohere-key" +## set ENV variables +os.environ["OPENAI_API_KEY"] = "your-openai-key" +os.environ["COHERE_API_KEY"] = "your-cohere-key" messages = [{ "content": "Hello, how are you?","role": "user"}] @@ -87,8 +86,10 @@ print(response) ``` ## Streaming ([Docs](https://docs.litellm.ai/docs/completion/stream)) + liteLLM supports streaming the model response back, pass `stream=True` to get a streaming iterator in response. Streaming is supported for all models (Bedrock, Huggingface, TogetherAI, Azure, OpenAI, etc.) + ```python from litellm import completion response = completion(model="gpt-3.5-turbo", messages=messages, stream=True) @@ -102,20 +103,22 @@ for part in response: ``` ## Logging Observability ([Docs](https://docs.litellm.ai/docs/observability/callbacks)) -LiteLLM exposes pre defined callbacks to send data to Langfuse, DynamoDB, s3 Buckets, LLMonitor, Helicone, Promptlayer, Traceloop, Athina, Slack + +LiteLLM exposes pre defined callbacks to send data to Lunary, Langfuse, DynamoDB, s3 Buckets, Helicone, Promptlayer, Traceloop, Athina, Slack + ```python from litellm import completion ## set env variables for logging tools os.environ["LANGFUSE_PUBLIC_KEY"] = "" os.environ["LANGFUSE_SECRET_KEY"] = "" -os.environ["LLMONITOR_APP_ID"] = "your-llmonitor-app-id" +os.environ["LUNARY_PUBLIC_KEY"] = "your-lunary-public-key" os.environ["ATHINA_API_KEY"] = "your-athina-api-key" os.environ["OPENAI_API_KEY"] # set callbacks -litellm.success_callback = ["langfuse", "llmonitor", "athina"] # log input/output to langfuse, llmonitor, supabase, athina etc +litellm.success_callback = ["langfuse", "lunary", "athina"] # log input/output to langfuse, lunary, supabase, athina etc #openai call response = completion(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Hi 👋 - i'm openai"}]) @@ -125,7 +128,8 @@ response = completion(model="gpt-3.5-turbo", messages=[{"role": "user", "content Set Budgets & Rate limits across multiple projects -The proxy provides: +The proxy provides: + 1. [Hooks for auth](https://docs.litellm.ai/docs/proxy/virtual_keys#custom-auth) 2. [Hooks for logging](https://docs.litellm.ai/docs/proxy/logging#step-1---create-your-custom-litellm-callback-class) 3. 
[Cost tracking](https://docs.litellm.ai/docs/proxy/virtual_keys#tracking-spend) @@ -133,13 +137,14 @@ The proxy provides: ## 📖 Proxy Endpoints - [Swagger Docs](https://litellm-api.up.railway.app/) -## Quick Start Proxy - CLI +## Quick Start Proxy - CLI ```shell pip install 'litellm[proxy]' ``` ### Step 1: Start litellm proxy + ```shell $ litellm --model huggingface/bigcode/starcoder @@ -147,6 +152,7 @@ $ litellm --model huggingface/bigcode/starcoder ``` ### Step 2: Make ChatCompletions Request to Proxy + ```python import openai # openai v1.0.0+ client = openai.OpenAI(api_key="anything",base_url="http://0.0.0.0:8000") # set proxy to base_url @@ -162,13 +168,15 @@ print(response) ``` ## Proxy Key Management ([Docs](https://docs.litellm.ai/docs/proxy/virtual_keys)) -UI on `/ui` on your proxy server + +UI on `/ui` on your proxy server ![ui_3](https://github.com/BerriAI/litellm/assets/29436595/47c97d5e-b9be-4839-b28c-43d7f4f10033) Set budgets and rate limits across multiple projects `POST /key/generate` ### Request + ```shell curl 'http://0.0.0.0:8000/key/generate' \ --header 'Authorization: Bearer sk-1234' \ @@ -177,6 +185,7 @@ curl 'http://0.0.0.0:8000/key/generate' \ ``` ### Expected Response + ```shell { "key": "sk-kdEXbIqZRwEeEiHwdg7sFA", # Bearer token @@ -185,56 +194,60 @@ curl 'http://0.0.0.0:8000/key/generate' \ ``` ## Supported Providers ([Docs](https://docs.litellm.ai/docs/providers)) -| Provider | [Completion](https://docs.litellm.ai/docs/#basic-usage) | [Streaming](https://docs.litellm.ai/docs/completion/stream#streaming-responses) | [Async Completion](https://docs.litellm.ai/docs/completion/stream#async-completion) | [Async Streaming](https://docs.litellm.ai/docs/completion/stream#async-streaming) | [Async Embedding](https://docs.litellm.ai/docs/embedding/supported_embedding) | [Async Image Generation](https://docs.litellm.ai/docs/image_generation) | -| ------------- | ------------- | ------------- | ------------- | ------------- | ------------- | ------------- | -| [openai](https://docs.litellm.ai/docs/providers/openai) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | -| [azure](https://docs.litellm.ai/docs/providers/azure) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | -| [aws - sagemaker](https://docs.litellm.ai/docs/providers/aws_sagemaker) | ✅ | ✅ | ✅ | ✅ | ✅ | -| [aws - bedrock](https://docs.litellm.ai/docs/providers/bedrock) | ✅ | ✅ | ✅ | ✅ |✅ | -| [google - vertex_ai [Gemini]](https://docs.litellm.ai/docs/providers/vertex) | ✅ | ✅ | ✅ | ✅ | -| [google - palm](https://docs.litellm.ai/docs/providers/palm) | ✅ | ✅ | ✅ | ✅ | -| [google AI Studio - gemini](https://docs.litellm.ai/docs/providers/gemini) | ✅ | | ✅ | | | -| [mistral ai api](https://docs.litellm.ai/docs/providers/mistral) | ✅ | ✅ | ✅ | ✅ | ✅ | -| [cloudflare AI Workers](https://docs.litellm.ai/docs/providers/cloudflare_workers) | ✅ | ✅ | ✅ | ✅ | -| [cohere](https://docs.litellm.ai/docs/providers/cohere) | ✅ | ✅ | ✅ | ✅ | ✅ | -| [anthropic](https://docs.litellm.ai/docs/providers/anthropic) | ✅ | ✅ | ✅ | ✅ | -| [huggingface](https://docs.litellm.ai/docs/providers/huggingface) | ✅ | ✅ | ✅ | ✅ | ✅ | -| [replicate](https://docs.litellm.ai/docs/providers/replicate) | ✅ | ✅ | ✅ | ✅ | -| [together_ai](https://docs.litellm.ai/docs/providers/togetherai) | ✅ | ✅ | ✅ | ✅ | -| [openrouter](https://docs.litellm.ai/docs/providers/openrouter) | ✅ | ✅ | ✅ | ✅ | -| [ai21](https://docs.litellm.ai/docs/providers/ai21) | ✅ | ✅ | ✅ | ✅ | -| [baseten](https://docs.litellm.ai/docs/providers/baseten) | ✅ | ✅ | ✅ | ✅ | -| [vllm](https://docs.litellm.ai/docs/providers/vllm) | ✅ | ✅ | 
✅ | ✅ | -| [nlp_cloud](https://docs.litellm.ai/docs/providers/nlp_cloud) | ✅ | ✅ | ✅ | ✅ | -| [aleph alpha](https://docs.litellm.ai/docs/providers/aleph_alpha) | ✅ | ✅ | ✅ | ✅ | -| [petals](https://docs.litellm.ai/docs/providers/petals) | ✅ | ✅ | ✅ | ✅ | -| [ollama](https://docs.litellm.ai/docs/providers/ollama) | ✅ | ✅ | ✅ | ✅ | -| [deepinfra](https://docs.litellm.ai/docs/providers/deepinfra) | ✅ | ✅ | ✅ | ✅ | -| [perplexity-ai](https://docs.litellm.ai/docs/providers/perplexity) | ✅ | ✅ | ✅ | ✅ | -| [Groq AI](https://docs.litellm.ai/docs/providers/groq) | ✅ | ✅ | ✅ | ✅ | -| [anyscale](https://docs.litellm.ai/docs/providers/anyscale) | ✅ | ✅ | ✅ | ✅ | -| [voyage ai](https://docs.litellm.ai/docs/providers/voyage) | | | | | ✅ | -| [xinference [Xorbits Inference]](https://docs.litellm.ai/docs/providers/xinference) | | | | | ✅ | +| Provider | [Completion](https://docs.litellm.ai/docs/#basic-usage) | [Streaming](https://docs.litellm.ai/docs/completion/stream#streaming-responses) | [Async Completion](https://docs.litellm.ai/docs/completion/stream#async-completion) | [Async Streaming](https://docs.litellm.ai/docs/completion/stream#async-streaming) | [Async Embedding](https://docs.litellm.ai/docs/embedding/supported_embedding) | [Async Image Generation](https://docs.litellm.ai/docs/image_generation) | +| ----------------------------------------------------------------------------------- | ------------------------------------------------------- | ------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------- | --------------------------------------------------------------------------------- | ----------------------------------------------------------------------------- | ----------------------------------------------------------------------- | +| [openai](https://docs.litellm.ai/docs/providers/openai) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | +| [azure](https://docs.litellm.ai/docs/providers/azure) | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | +| [aws - sagemaker](https://docs.litellm.ai/docs/providers/aws_sagemaker) | ✅ | ✅ | ✅ | ✅ | ✅ | +| [aws - bedrock](https://docs.litellm.ai/docs/providers/bedrock) | ✅ | ✅ | ✅ | ✅ | ✅ | +| [google - vertex_ai [Gemini]](https://docs.litellm.ai/docs/providers/vertex) | ✅ | ✅ | ✅ | ✅ | +| [google - palm](https://docs.litellm.ai/docs/providers/palm) | ✅ | ✅ | ✅ | ✅ | +| [google AI Studio - gemini](https://docs.litellm.ai/docs/providers/gemini) | ✅ | | ✅ | | | +| [mistral ai api](https://docs.litellm.ai/docs/providers/mistral) | ✅ | ✅ | ✅ | ✅ | ✅ | +| [cloudflare AI Workers](https://docs.litellm.ai/docs/providers/cloudflare_workers) | ✅ | ✅ | ✅ | ✅ | +| [cohere](https://docs.litellm.ai/docs/providers/cohere) | ✅ | ✅ | ✅ | ✅ | ✅ | +| [anthropic](https://docs.litellm.ai/docs/providers/anthropic) | ✅ | ✅ | ✅ | ✅ | +| [huggingface](https://docs.litellm.ai/docs/providers/huggingface) | ✅ | ✅ | ✅ | ✅ | ✅ | +| [replicate](https://docs.litellm.ai/docs/providers/replicate) | ✅ | ✅ | ✅ | ✅ | +| [together_ai](https://docs.litellm.ai/docs/providers/togetherai) | ✅ | ✅ | ✅ | ✅ | +| [openrouter](https://docs.litellm.ai/docs/providers/openrouter) | ✅ | ✅ | ✅ | ✅ | +| [ai21](https://docs.litellm.ai/docs/providers/ai21) | ✅ | ✅ | ✅ | ✅ | +| [baseten](https://docs.litellm.ai/docs/providers/baseten) | ✅ | ✅ | ✅ | ✅ | +| [vllm](https://docs.litellm.ai/docs/providers/vllm) | ✅ | ✅ | ✅ | ✅ | +| [nlp_cloud](https://docs.litellm.ai/docs/providers/nlp_cloud) | ✅ | ✅ | ✅ | ✅ | +| [aleph 
alpha](https://docs.litellm.ai/docs/providers/aleph_alpha) | ✅ | ✅ | ✅ | ✅ | +| [petals](https://docs.litellm.ai/docs/providers/petals) | ✅ | ✅ | ✅ | ✅ | +| [ollama](https://docs.litellm.ai/docs/providers/ollama) | ✅ | ✅ | ✅ | ✅ | +| [deepinfra](https://docs.litellm.ai/docs/providers/deepinfra) | ✅ | ✅ | ✅ | ✅ | +| [perplexity-ai](https://docs.litellm.ai/docs/providers/perplexity) | ✅ | ✅ | ✅ | ✅ | +| [Groq AI](https://docs.litellm.ai/docs/providers/groq) | ✅ | ✅ | ✅ | ✅ | +| [anyscale](https://docs.litellm.ai/docs/providers/anyscale) | ✅ | ✅ | ✅ | ✅ | +| [voyage ai](https://docs.litellm.ai/docs/providers/voyage) | | | | | ✅ | +| [xinference [Xorbits Inference]](https://docs.litellm.ai/docs/providers/xinference) | | | | | ✅ | [**Read the Docs**](https://docs.litellm.ai/docs/) ## Contributing -To contribute: Clone the repo locally -> Make a change -> Submit a PR with the change. -Here's how to modify the repo locally: -Step 1: Clone the repo +To contribute: Clone the repo locally -> Make a change -> Submit a PR with the change. + +Here's how to modify the repo locally: +Step 1: Clone the repo + ``` git clone https://github.com/BerriAI/litellm.git ``` -Step 2: Navigate into the project, and install dependencies: +Step 2: Navigate into the project, and install dependencies: + ``` cd litellm poetry install ``` Step 3: Test your change: + ``` cd litellm/tests # pwd: Documents/litellm/litellm/tests poetry run flake8 @@ -242,16 +255,19 @@ poetry run pytest . ``` Step 4: Submit a PR with your changes! 🚀 -- push your fork to your GitHub repo -- submit a PR from there + +- push your fork to your GitHub repo +- submit a PR from there # Support / talk with founders + - [Schedule Demo 👋](https://calendly.com/d/4mp-gd3-k5k/berriai-1-1-onboarding-litellm-hosted-version) - [Community Discord 💭](https://discord.gg/wuPM9dRgDw) - Our numbers 📞 +1 (770) 8783-106 / ‭+1 (412) 618-6238‬ - Our emails ✉️ ishaan@berri.ai / krrish@berri.ai -# Why did we build this +# Why did we build this + - **Need for simplicity**: Our code started to get extremely complicated managing & translating calls between Azure, OpenAI and Cohere. # Contributors @@ -268,4 +284,3 @@ Step 4: Submit a PR with your changes! 
🚀 - diff --git a/cookbook/proxy-server/readme.md b/cookbook/proxy-server/readme.md index 4b296831b..d0b0592c4 100644 --- a/cookbook/proxy-server/readme.md +++ b/cookbook/proxy-server/readme.md @@ -33,7 +33,7 @@ - Call all models using the OpenAI format - `completion(model, messages)` - Text responses will always be available at `['choices'][0]['message']['content']` - **Error Handling** Using Model Fallbacks (if `GPT-4` fails, try `llama2`) -- **Logging** - Log Requests, Responses and Errors to `Supabase`, `Posthog`, `Mixpanel`, `Sentry`, `LLMonitor`,`Athina`, `Helicone` (Any of the supported providers here: https://litellm.readthedocs.io/en/latest/advanced/ +- **Logging** - Log Requests, Responses and Errors to `Supabase`, `Posthog`, `Mixpanel`, `Sentry`, `Lunary`,`Athina`, `Helicone` (Any of the supported providers here: https://litellm.readthedocs.io/en/latest/advanced/ **Example: Logs sent to Supabase** Screenshot 2023-08-11 at 4 02 46 PM diff --git a/docs/my-website/docs/getting_started.md b/docs/my-website/docs/getting_started.md index 607b86943..edbdf3c00 100644 --- a/docs/my-website/docs/getting_started.md +++ b/docs/my-website/docs/getting_started.md @@ -2,11 +2,11 @@ import QuickStart from '../src/components/QuickStart.js' -LiteLLM simplifies LLM API calls by mapping them all to the [OpenAI ChatCompletion format](https://platform.openai.com/docs/api-reference/chat). +LiteLLM simplifies LLM API calls by mapping them all to the [OpenAI ChatCompletion format](https://platform.openai.com/docs/api-reference/chat). -## basic usage +## basic usage -By default we provide a free $10 community-key to try all providers supported on LiteLLM. +By default we provide a free $10 community-key to try all providers supported on LiteLLM. ```python from litellm import completion @@ -29,14 +29,16 @@ Email us @ krrish@berri.ai Next Steps 👉 [Call all supported models - e.g. Claude-2, Llama2-70b, etc.](./proxy_api.md#supported-models) -More details 👉 -* [Completion() function details](./completion/) -* [All supported models / providers on LiteLLM](./providers/) -* [Build your own OpenAI proxy](https://github.com/BerriAI/liteLLM-proxy/tree/main) +More details 👉 + +- [Completion() function details](./completion/) +- [All supported models / providers on LiteLLM](./providers/) +- [Build your own OpenAI proxy](https://github.com/BerriAI/liteLLM-proxy/tree/main) ## streaming -Same example from before. Just pass in `stream=True` in the completion args. +Same example from before. Just pass in `stream=True` in the completion args. + ```python from litellm import completion @@ -55,46 +57,50 @@ response = completion("command-nightly", messages, stream=True) print(response) ``` -More details 👉 -* [streaming + async](./completion/stream.md) -* [tutorial for streaming Llama2 on TogetherAI](./tutorials/TogetherAI_liteLLM.md) +More details 👉 -## exception handling +- [streaming + async](./completion/stream.md) +- [tutorial for streaming Llama2 on TogetherAI](./tutorials/TogetherAI_liteLLM.md) -LiteLLM maps exceptions across all supported providers to the OpenAI exceptions. All our exceptions inherit from OpenAI's exception types, so any error-handling you have for that, should work out of the box with LiteLLM. +## exception handling -```python +LiteLLM maps exceptions across all supported providers to the OpenAI exceptions. All our exceptions inherit from OpenAI's exception types, so any error-handling you have for that, should work out of the box with LiteLLM. 
+ +```python from openai.error import OpenAIError from litellm import completion os.environ["ANTHROPIC_API_KEY"] = "bad-key" -try: - # some code +try: + # some code completion(model="claude-instant-1", messages=[{"role": "user", "content": "Hey, how's it going?"}]) except OpenAIError as e: print(e) ``` ## Logging Observability - Log LLM Input/Output ([Docs](https://docs.litellm.ai/docs/observability/callbacks)) -LiteLLM exposes pre defined callbacks to send data to Langfuse, LLMonitor, Helicone, Promptlayer, Traceloop, Slack + +LiteLLM exposes pre defined callbacks to send data to Lunary, Langfuse, Helicone, Promptlayer, Traceloop, Slack + ```python from litellm import completion ## set env variables for logging tools +os.environ["LUNARY_PUBLIC_KEY"] = "your-lunary-public-key" os.environ["LANGFUSE_PUBLIC_KEY"] = "" os.environ["LANGFUSE_SECRET_KEY"] = "" -os.environ["LLMONITOR_APP_ID"] = "your-llmonitor-app-id" os.environ["OPENAI_API_KEY"] # set callbacks -litellm.success_callback = ["langfuse", "llmonitor"] # log input/output to langfuse, llmonitor, supabase +litellm.success_callback = ["lunary", "langfuse"] # log input/output to langfuse, lunary, supabase #openai call response = completion(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Hi 👋 - i'm openai"}]) ``` -More details 👉 -* [exception mapping](./exception_mapping.md) -* [retries + model fallbacks for completion()](./completion/reliable_completions.md) -* [tutorial for model fallbacks with completion()](./tutorials/fallbacks.md) \ No newline at end of file +More details 👉 + +- [exception mapping](./exception_mapping.md) +- [retries + model fallbacks for completion()](./completion/reliable_completions.md) +- [tutorial for model fallbacks with completion()](./tutorials/fallbacks.md) diff --git a/docs/my-website/docs/index.md b/docs/my-website/docs/index.md index d7ed14019..bfce80b4e 100644 --- a/docs/my-website/docs/index.md +++ b/docs/my-website/docs/index.md @@ -5,7 +5,6 @@ import TabItem from '@theme/TabItem'; https://github.com/BerriAI/litellm - ## **Call 100+ LLMs using the same Input/Output Format** - Translate inputs to provider's `completion`, `embedding`, and `image_generation` endpoints @@ -13,7 +12,8 @@ https://github.com/BerriAI/litellm - Retry/fallback logic across multiple deployments (e.g. Azure/OpenAI) - [Router](https://docs.litellm.ai/docs/routing) - Track spend & set budgets per project [OpenAI Proxy Server](https://docs.litellm.ai/docs/simple_proxy) -## Basic usage +## Basic usage + Open In Colab @@ -21,6 +21,7 @@ https://github.com/BerriAI/litellm ```shell pip install litellm ``` + @@ -32,7 +33,7 @@ import os os.environ["OPENAI_API_KEY"] = "your-api-key" response = completion( - model="gpt-3.5-turbo", + model="gpt-3.5-turbo", messages=[{ "content": "Hello, how are you?","role": "user"}] ) ``` @@ -48,7 +49,7 @@ import os os.environ["ANTHROPIC_API_KEY"] = "your-api-key" response = completion( - model="claude-2", + model="claude-2", messages=[{ "content": "Hello, how are you?","role": "user"}] ) ``` @@ -66,7 +67,7 @@ os.environ["VERTEX_PROJECT"] = "hardy-device-386718" os.environ["VERTEX_LOCATION"] = "us-central1" response = completion( - model="chat-bison", + model="chat-bison", messages=[{ "content": "Hello, how are you?","role": "user"}] ) ``` @@ -76,15 +77,15 @@ response = completion( ```python -from litellm import completion +from litellm import completion import os -os.environ["HUGGINGFACE_API_KEY"] = "huggingface_api_key" +os.environ["HUGGINGFACE_API_KEY"] = "huggingface_api_key" # e.g. 
Call 'WizardLM/WizardCoder-Python-34B-V1.0' hosted on HF Inference endpoints response = completion( model="huggingface/WizardLM/WizardCoder-Python-34B-V1.0", - messages=[{ "content": "Hello, how are you?","role": "user"}], + messages=[{ "content": "Hello, how are you?","role": "user"}], api_base="https://my-endpoint.huggingface.cloud" ) @@ -106,25 +107,25 @@ os.environ["AZURE_API_VERSION"] = "" # azure call response = completion( - "azure/", + "azure/", messages = [{ "content": "Hello, how are you?","role": "user"}] ) ``` - ```python from litellm import completion response = completion( - model="ollama/llama2", - messages = [{ "content": "Hello, how are you?","role": "user"}], + model="ollama/llama2", + messages = [{ "content": "Hello, how are you?","role": "user"}], api_base="http://localhost:11434" ) ``` + @@ -133,19 +134,21 @@ from litellm import completion import os ## set ENV variables -os.environ["OPENROUTER_API_KEY"] = "openrouter_api_key" +os.environ["OPENROUTER_API_KEY"] = "openrouter_api_key" response = completion( - model="openrouter/google/palm-2-chat-bison", + model="openrouter/google/palm-2-chat-bison", messages = [{ "content": "Hello, how are you?","role": "user"}], ) ``` + ## Streaming -Set `stream=True` in the `completion` args. + +Set `stream=True` in the `completion` args. @@ -157,7 +160,7 @@ import os os.environ["OPENAI_API_KEY"] = "your-api-key" response = completion( - model="gpt-3.5-turbo", + model="gpt-3.5-turbo", messages=[{ "content": "Hello, how are you?","role": "user"}], stream=True, ) @@ -174,7 +177,7 @@ import os os.environ["ANTHROPIC_API_KEY"] = "your-api-key" response = completion( - model="claude-2", + model="claude-2", messages=[{ "content": "Hello, how are you?","role": "user"}], stream=True, ) @@ -193,7 +196,7 @@ os.environ["VERTEX_PROJECT"] = "hardy-device-386718" os.environ["VERTEX_LOCATION"] = "us-central1" response = completion( - model="chat-bison", + model="chat-bison", messages=[{ "content": "Hello, how are you?","role": "user"}], stream=True, ) @@ -204,15 +207,15 @@ response = completion( ```python -from litellm import completion +from litellm import completion import os -os.environ["HUGGINGFACE_API_KEY"] = "huggingface_api_key" +os.environ["HUGGINGFACE_API_KEY"] = "huggingface_api_key" # e.g. 
Call 'WizardLM/WizardCoder-Python-34B-V1.0' hosted on HF Inference endpoints response = completion( model="huggingface/WizardLM/WizardCoder-Python-34B-V1.0", - messages=[{ "content": "Hello, how are you?","role": "user"}], + messages=[{ "content": "Hello, how are you?","role": "user"}], api_base="https://my-endpoint.huggingface.cloud", stream=True, ) @@ -235,7 +238,7 @@ os.environ["AZURE_API_VERSION"] = "" # azure call response = completion( - "azure/", + "azure/", messages = [{ "content": "Hello, how are you?","role": "user"}], stream=True, ) @@ -243,19 +246,19 @@ response = completion( - ```python from litellm import completion response = completion( - model="ollama/llama2", - messages = [{ "content": "Hello, how are you?","role": "user"}], + model="ollama/llama2", + messages = [{ "content": "Hello, how are you?","role": "user"}], api_base="http://localhost:11434", stream=True, ) ``` + @@ -264,60 +267,64 @@ from litellm import completion import os ## set ENV variables -os.environ["OPENROUTER_API_KEY"] = "openrouter_api_key" +os.environ["OPENROUTER_API_KEY"] = "openrouter_api_key" response = completion( - model="openrouter/google/palm-2-chat-bison", + model="openrouter/google/palm-2-chat-bison", messages = [{ "content": "Hello, how are you?","role": "user"}], stream=True, ) ``` + -## Exception handling +## Exception handling -LiteLLM maps exceptions across all supported providers to the OpenAI exceptions. All our exceptions inherit from OpenAI's exception types, so any error-handling you have for that, should work out of the box with LiteLLM. +LiteLLM maps exceptions across all supported providers to the OpenAI exceptions. All our exceptions inherit from OpenAI's exception types, so any error-handling you have for that, should work out of the box with LiteLLM. 
-```python +```python from openai.error import OpenAIError from litellm import completion os.environ["ANTHROPIC_API_KEY"] = "bad-key" -try: - # some code +try: + # some code completion(model="claude-instant-1", messages=[{"role": "user", "content": "Hey, how's it going?"}]) except OpenAIError as e: print(e) ``` ## Logging Observability - Log LLM Input/Output ([Docs](https://docs.litellm.ai/docs/observability/callbacks)) -LiteLLM exposes pre defined callbacks to send data to Langfuse, LLMonitor, Helicone, Promptlayer, Traceloop, Slack + +LiteLLM exposes pre defined callbacks to send data to Lunary, Langfuse, Helicone, Promptlayer, Traceloop, Slack + ```python from litellm import completion ## set env variables for logging tools os.environ["LANGFUSE_PUBLIC_KEY"] = "" os.environ["LANGFUSE_SECRET_KEY"] = "" -os.environ["LLMONITOR_APP_ID"] = "your-llmonitor-app-id" +os.environ["LUNARY_PUBLIC_KEY"] = "your-lunary-public-key" os.environ["OPENAI_API_KEY"] # set callbacks -litellm.success_callback = ["langfuse", "llmonitor"] # log input/output to langfuse, llmonitor, supabase +litellm.success_callback = ["lunary", "langfuse"] # log input/output to lunary, langfuse, supabase #openai call response = completion(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Hi 👋 - i'm openai"}]) ``` ## Track Costs, Usage, Latency for streaming + Use a callback function for this - more info on custom callbacks: https://docs.litellm.ai/docs/observability/custom_callback ```python import litellm -# track_cost_callback +# track_cost_callback def track_cost_callback( kwargs, # kwargs to completion completion_response, # response from completion @@ -328,7 +335,7 @@ def track_cost_callback( print("streaming response_cost", response_cost) except: pass -# set callback +# set callback litellm.success_callback = [track_cost_callback] # set custom callback function # litellm.completion() call @@ -346,11 +353,12 @@ response = completion( ## OpenAI Proxy -Track spend across multiple projects/people +Track spend across multiple projects/people ![ui_3](https://github.com/BerriAI/litellm/assets/29436595/47c97d5e-b9be-4839-b28c-43d7f4f10033) -The proxy provides: +The proxy provides: + 1. [Hooks for auth](https://docs.litellm.ai/docs/proxy/virtual_keys#custom-auth) 2. [Hooks for logging](https://docs.litellm.ai/docs/proxy/logging#step-1---create-your-custom-litellm-callback-class) 3. 
[Cost tracking](https://docs.litellm.ai/docs/proxy/virtual_keys#tracking-spend) @@ -358,13 +366,14 @@ The proxy provides: ### 📖 Proxy Endpoints - [Swagger Docs](https://litellm-api.up.railway.app/) -### Quick Start Proxy - CLI +### Quick Start Proxy - CLI ```shell pip install 'litellm[proxy]' ``` #### Step 1: Start litellm proxy + ```shell $ litellm --model huggingface/bigcode/starcoder @@ -372,6 +381,7 @@ $ litellm --model huggingface/bigcode/starcoder ``` #### Step 2: Make ChatCompletions Request to Proxy + ```python import openai # openai v1.0.0+ client = openai.OpenAI(api_key="anything",base_url="http://0.0.0.0:8000") # set proxy to base_url @@ -387,6 +397,7 @@ print(response) ``` ## More details -* [exception mapping](./exception_mapping.md) -* [retries + model fallbacks for completion()](./completion/reliable_completions.md) -* [proxy virtual keys & spend management](./tutorials/fallbacks.md) \ No newline at end of file + +- [exception mapping](./exception_mapping.md) +- [retries + model fallbacks for completion()](./completion/reliable_completions.md) +- [proxy virtual keys & spend management](./tutorials/fallbacks.md) diff --git a/docs/my-website/docs/observability/callbacks.md b/docs/my-website/docs/observability/callbacks.md index 3b3d4eef3..fbc0733e5 100644 --- a/docs/my-website/docs/observability/callbacks.md +++ b/docs/my-website/docs/observability/callbacks.md @@ -7,7 +7,7 @@ liteLLM provides `input_callbacks`, `success_callbacks` and `failure_callbacks`, liteLLM supports: - [Custom Callback Functions](https://docs.litellm.ai/docs/observability/custom_callback) -- [LLMonitor](https://llmonitor.com/docs) +- [Lunary](https://lunary.ai/docs) - [Helicone](https://docs.helicone.ai/introduction) - [Traceloop](https://traceloop.com/docs) - [Athina](https://docs.athina.ai/) @@ -22,16 +22,16 @@ from litellm import completion # set callbacks litellm.input_callback=["sentry"] # for sentry breadcrumbing - logs the input being sent to the api -litellm.success_callback=["posthog", "helicone", "llmonitor", "athina"] -litellm.failure_callback=["sentry", "llmonitor"] +litellm.success_callback=["posthog", "helicone", "lunary", "athina"] +litellm.failure_callback=["sentry", "lunary"] ## set env variables os.environ['SENTRY_DSN'], os.environ['SENTRY_API_TRACE_RATE']= "" os.environ['POSTHOG_API_KEY'], os.environ['POSTHOG_API_URL'] = "api-key", "api-url" os.environ["HELICONE_API_KEY"] = "" os.environ["TRACELOOP_API_KEY"] = "" -os.environ["LLMONITOR_APP_ID"] = "" +os.environ["LUNARY_PUBLIC_KEY"] = "" os.environ["ATHINA_API_KEY"] = "" response = completion(model="gpt-3.5-turbo", messages=messages) -``` \ No newline at end of file +``` diff --git a/docs/my-website/docs/observability/llmonitor_integration.md b/docs/my-website/docs/observability/llmonitor_integration.md deleted file mode 100644 index 06ac44a84..000000000 --- a/docs/my-website/docs/observability/llmonitor_integration.md +++ /dev/null @@ -1,65 +0,0 @@ -# LLMonitor Tutorial - -[LLMonitor](https://llmonitor.com/) is an open-source observability platform that provides cost tracking, user tracking and powerful agent tracing. - - - -## Use LLMonitor to log requests across all LLM Providers (OpenAI, Azure, Anthropic, Cohere, Replicate, PaLM) - -liteLLM provides `callbacks`, making it easy for you to log data depending on the status of your responses. - -:::info -We want to learn how we can make the callbacks better! 
Meet the [founders](https://calendly.com/d/4mp-gd3-k5k/berriai-1-1-onboarding-litellm-hosted-version) or -join our [discord](https://discord.gg/wuPM9dRgDw) -::: - -### Using Callbacks - -First, sign up to get an app ID on the [LLMonitor dashboard](https://llmonitor.com). - -Use just 2 lines of code, to instantly log your responses **across all providers** with llmonitor: - -```python -litellm.success_callback = ["llmonitor"] -litellm.failure_callback = ["llmonitor"] -``` - -Complete code - -```python -from litellm import completion - -## set env variables -os.environ["LLMONITOR_APP_ID"] = "your-llmonitor-app-id" -# Optional: os.environ["LLMONITOR_API_URL"] = "self-hosting-url" - -os.environ["OPENAI_API_KEY"], os.environ["COHERE_API_KEY"] = "", "" - -# set callbacks -litellm.success_callback = ["llmonitor"] -litellm.failure_callback = ["llmonitor"] - -#openai call -response = completion( - model="gpt-3.5-turbo", - messages=[{"role": "user", "content": "Hi 👋 - i'm openai"}], - user="ishaan_litellm" -) - -#cohere call -response = completion( - model="command-nightly", - messages=[{"role": "user", "content": "Hi 👋 - i'm cohere"}], - user="ishaan_litellm" -) -``` - -## Support & Talk to Founders - -- [Schedule Demo 👋](https://calendly.com/d/4mp-gd3-k5k/berriai-1-1-onboarding-litellm-hosted-version) -- [Community Discord 💭](https://discord.gg/wuPM9dRgDw) -- Our numbers 📞 +1 (770) 8783-106 / ‭+1 (412) 618-6238‬ -- Our emails ✉️ ishaan@berri.ai / krrish@berri.ai -- Meet the LLMonitor team on [Discord](http://discord.com/invite/8PafSG58kK) or via [email](mailto:vince@llmonitor.com). \ No newline at end of file diff --git a/docs/my-website/docs/observability/lunary_integration.md b/docs/my-website/docs/observability/lunary_integration.md new file mode 100644 index 000000000..9b8e90df7 --- /dev/null +++ b/docs/my-website/docs/observability/lunary_integration.md @@ -0,0 +1,82 @@ +# Lunary - Logging and tracing LLM input/output + +[Lunary](https://lunary.ai/) is an open-source AI developer platform providing observability, prompt management, and evaluation tools for AI developers. + + + +## Use Lunary to log requests across all LLM Providers (OpenAI, Azure, Anthropic, Cohere, Replicate, PaLM) + +liteLLM provides `callbacks`, making it easy for you to log data depending on the status of your responses. + +:::info +We want to learn how we can make the callbacks better! Meet the [founders](https://calendly.com/d/4mp-gd3-k5k/berriai-1-1-onboarding-litellm-hosted-version) or +join our [discord](https://discord.gg/wuPM9dRgDw) +::: + +### Using Callbacks + +First, sign up to get a public key on the [Lunary dashboard](https://lunary.ai). + +Use just 2 lines of code, to instantly log your responses **across all providers** with lunary: + +```python +litellm.success_callback = ["lunary"] +litellm.failure_callback = ["lunary"] +``` + +Complete code + +```python +from litellm import completion + +## set env variables +os.environ["LUNARY_PUBLIC_KEY"] = "your-lunary-public-key" + +os.environ["OPENAI_API_KEY"] = "" + +# set callbacks +litellm.success_callback = ["lunary"] +litellm.failure_callback = ["lunary"] + +#openai call +response = completion( + model="gpt-3.5-turbo", + messages=[{"role": "user", "content": "Hi 👋 - i'm openai"}], + user="ishaan_litellm" +) +``` + +## Templates + +You can use Lunary to manage prompt templates and use them across all your LLM providers. 
+ +Make sure to have `lunary` installed: + +```bash +pip install lunary +``` + +Then, use the following code to pull templates into Lunary: + +```python +from litellm import completion +from lunary + +template = lunary.render_template("template-slug", { + "name": "John", # Inject variables +}) + +litellm.success_callback = ["lunary"] + +result = completion(**template) +``` + +## Support & Talk to Founders + +- Meet the Lunary team via [email](mailto:hello@lunary.ai). +- [Schedule Demo 👋](https://calendly.com/d/4mp-gd3-k5k/berriai-1-1-onboarding-litellm-hosted-version) +- [Community Discord 💭](https://discord.gg/wuPM9dRgDw) +- Our numbers 📞 +1 (770) 8783-106 / ‭+1 (412) 618-6238‬ +- Our emails ✉️ ishaan@berri.ai / krrish@berri.ai diff --git a/docs/my-website/sidebars.js b/docs/my-website/sidebars.js index 7a24723af..555461ec3 100644 --- a/docs/my-website/sidebars.js +++ b/docs/my-website/sidebars.js @@ -22,18 +22,18 @@ const sidebars = { type: "category", label: "💥 OpenAI Proxy Server", link: { - type: 'generated-index', - title: '💥 OpenAI Proxy Server', + type: "generated-index", + title: "💥 OpenAI Proxy Server", description: `Proxy Server to call 100+ LLMs in a unified interface & track spend, set budgets per virtual key/user`, - slug: '/simple_proxy', + slug: "/simple_proxy", }, items: [ - "proxy/quick_start", + "proxy/quick_start", "proxy/configs", { - type: 'link', - label: '📖 All Endpoints', - href: 'https://litellm-api.up.railway.app/', + type: "link", + label: "📖 All Endpoints", + href: "https://litellm-api.up.railway.app/", }, "proxy/enterprise", "proxy/user_keys", @@ -45,53 +45,43 @@ const sidebars = { "proxy/debugging", "proxy/pii_masking", { - "type": "category", - "label": "🔥 Load Balancing", - "items": [ - "proxy/load_balancing", - "proxy/reliability", - ] + type: "category", + label: "🔥 Load Balancing", + items: ["proxy/load_balancing", "proxy/reliability"], }, "proxy/caching", { - "type": "category", - "label": "Logging, Alerting", - "items": [ - "proxy/logging", - "proxy/alerting", - "proxy/streaming_logging", - ] + type: "category", + label: "Logging, Alerting", + items: ["proxy/logging", "proxy/alerting", "proxy/streaming_logging"], }, { - "type": "category", - "label": "Content Moderation", - "items": [ - "proxy/call_hooks", - "proxy/rules", - ] + type: "category", + label: "Content Moderation", + items: ["proxy/call_hooks", "proxy/rules"], }, - "proxy/deploy", - "proxy/cli", - ] + "proxy/deploy", + "proxy/cli", + ], }, { type: "category", label: "Completion()", link: { - type: 'generated-index', - title: 'Completion()', - description: 'Details on the completion() function', - slug: '/completion', + type: "generated-index", + title: "Completion()", + description: "Details on the completion() function", + slug: "/completion", }, items: [ - "completion/input", + "completion/input", "completion/prompt_formatting", - "completion/output", + "completion/output", "exception_mapping", - "completion/stream", + "completion/stream", "completion/message_trimming", "completion/function_call", - "completion/model_alias", + "completion/model_alias", "completion/batching", "completion/mock_requests", "completion/reliable_completions", @@ -101,54 +91,55 @@ const sidebars = { type: "category", label: "Embedding(), Moderation(), Image Generation()", items: [ - "embedding/supported_embedding", + "embedding/supported_embedding", "embedding/async_embedding", "embedding/moderation", - "image_generation" + "image_generation", ], }, { type: "category", label: "Supported Models & Providers", 
link: { - type: 'generated-index', - title: 'Providers', - description: 'Learn how to deploy + call models from different providers on LiteLLM', - slug: '/providers', + type: "generated-index", + title: "Providers", + description: + "Learn how to deploy + call models from different providers on LiteLLM", + slug: "/providers", }, items: [ - "providers/openai", + "providers/openai", "providers/openai_compatible", - "providers/azure", - "providers/azure_ai", - "providers/huggingface", - "providers/ollama", - "providers/vertex", - "providers/palm", - "providers/gemini", - "providers/mistral", - "providers/anthropic", + "providers/azure", + "providers/azure_ai", + "providers/huggingface", + "providers/ollama", + "providers/vertex", + "providers/palm", + "providers/gemini", + "providers/mistral", + "providers/anthropic", "providers/aws_sagemaker", - "providers/bedrock", + "providers/bedrock", "providers/anyscale", - "providers/perplexity", - "providers/groq", - "providers/vllm", - "providers/xinference", - "providers/cloudflare_workers", + "providers/perplexity", + "providers/groq", + "providers/vllm", + "providers/xinference", + "providers/cloudflare_workers", "providers/deepinfra", - "providers/ai21", + "providers/ai21", "providers/nlp_cloud", - "providers/replicate", - "providers/cohere", - "providers/togetherai", - "providers/voyage", - "providers/aleph_alpha", - "providers/baseten", - "providers/openrouter", + "providers/replicate", + "providers/cohere", + "providers/togetherai", + "providers/voyage", + "providers/aleph_alpha", + "providers/baseten", + "providers/openrouter", "providers/custom_openai_proxy", "providers/petals", - ] + ], }, "proxy/custom_pricing", "routing", @@ -163,9 +154,10 @@ const sidebars = { type: "category", label: "Logging & Observability", items: [ - 'debugging/local_debugging', + "debugging/local_debugging", "observability/callbacks", "observability/custom_callback", + "observability/lunary_integration", "observability/langfuse_integration", "observability/sentry", "observability/promptlayer_integration", @@ -174,7 +166,6 @@ const sidebars = { "observability/slack_integration", "observability/traceloop_integration", "observability/athina_integration", - "observability/llmonitor_integration", "observability/helicone_integration", "observability/supabase_integration", `observability/telemetry`, @@ -182,18 +173,18 @@ const sidebars = { }, "caching/redis_cache", { - type: 'category', - label: 'Tutorials', + type: "category", + label: "Tutorials", items: [ - 'tutorials/azure_openai', - 'tutorials/oobabooga', + "tutorials/azure_openai", + "tutorials/oobabooga", "tutorials/gradio_integration", - 'tutorials/huggingface_codellama', - 'tutorials/huggingface_tutorial', - 'tutorials/TogetherAI_liteLLM', - 'tutorials/finetuned_chat_gpt', - 'tutorials/sagemaker_llms', - 'tutorials/text_completion', + "tutorials/huggingface_codellama", + "tutorials/huggingface_tutorial", + "tutorials/TogetherAI_liteLLM", + "tutorials/finetuned_chat_gpt", + "tutorials/sagemaker_llms", + "tutorials/text_completion", "tutorials/first_playground", "tutorials/model_fallbacks", ], @@ -201,40 +192,39 @@ const sidebars = { { type: "category", label: "LangChain, LlamaIndex Integration", - items: [ - "langchain/langchain" - ], + items: ["langchain/langchain"], }, { - type: 'category', - label: 'Extras', + type: "category", + label: "Extras", items: [ - 'extras/contributing', + "extras/contributing", "proxy_server", { type: "category", label: "❤️ 🚅 Projects built on LiteLLM", link: { - type: 
'generated-index', - title: 'Projects built on LiteLLM', - description: 'Learn how to deploy + call models from different providers on LiteLLM', - slug: '/project', + type: "generated-index", + title: "Projects built on LiteLLM", + description: + "Learn how to deploy + call models from different providers on LiteLLM", + slug: "/project", }, items: [ "projects/Docq.AI", "projects/OpenInterpreter", "projects/FastREPL", "projects/PROMPTMETHEUS", - "projects/Codium PR Agent", + "projects/Codium PR Agent", "projects/Prompt2Model", "projects/SalesGPT", - "projects/Quivr", - "projects/Langstream", - "projects/Otter", - "projects/GPT Migrate", - "projects/YiVal", - "projects/LiteLLM Proxy", - ] + "projects/Quivr", + "projects/Langstream", + "projects/Otter", + "projects/GPT Migrate", + "projects/YiVal", + "projects/LiteLLM Proxy", + ], }, ], }, diff --git a/docs/my-website/src/pages/index.md b/docs/my-website/src/pages/index.md index d7ed14019..126e83688 100644 --- a/docs/my-website/src/pages/index.md +++ b/docs/my-website/src/pages/index.md @@ -5,7 +5,6 @@ import TabItem from '@theme/TabItem'; https://github.com/BerriAI/litellm - ## **Call 100+ LLMs using the same Input/Output Format** - Translate inputs to provider's `completion`, `embedding`, and `image_generation` endpoints @@ -13,7 +12,8 @@ https://github.com/BerriAI/litellm - Retry/fallback logic across multiple deployments (e.g. Azure/OpenAI) - [Router](https://docs.litellm.ai/docs/routing) - Track spend & set budgets per project [OpenAI Proxy Server](https://docs.litellm.ai/docs/simple_proxy) -## Basic usage +## Basic usage + Open In Colab @@ -21,6 +21,7 @@ https://github.com/BerriAI/litellm ```shell pip install litellm ``` + @@ -32,7 +33,7 @@ import os os.environ["OPENAI_API_KEY"] = "your-api-key" response = completion( - model="gpt-3.5-turbo", + model="gpt-3.5-turbo", messages=[{ "content": "Hello, how are you?","role": "user"}] ) ``` @@ -48,7 +49,7 @@ import os os.environ["ANTHROPIC_API_KEY"] = "your-api-key" response = completion( - model="claude-2", + model="claude-2", messages=[{ "content": "Hello, how are you?","role": "user"}] ) ``` @@ -66,7 +67,7 @@ os.environ["VERTEX_PROJECT"] = "hardy-device-386718" os.environ["VERTEX_LOCATION"] = "us-central1" response = completion( - model="chat-bison", + model="chat-bison", messages=[{ "content": "Hello, how are you?","role": "user"}] ) ``` @@ -76,15 +77,15 @@ response = completion( ```python -from litellm import completion +from litellm import completion import os -os.environ["HUGGINGFACE_API_KEY"] = "huggingface_api_key" +os.environ["HUGGINGFACE_API_KEY"] = "huggingface_api_key" # e.g. 
Call 'WizardLM/WizardCoder-Python-34B-V1.0' hosted on HF Inference endpoints response = completion( model="huggingface/WizardLM/WizardCoder-Python-34B-V1.0", - messages=[{ "content": "Hello, how are you?","role": "user"}], + messages=[{ "content": "Hello, how are you?","role": "user"}], api_base="https://my-endpoint.huggingface.cloud" ) @@ -106,25 +107,25 @@ os.environ["AZURE_API_VERSION"] = "" # azure call response = completion( - "azure/", + "azure/", messages = [{ "content": "Hello, how are you?","role": "user"}] ) ``` - ```python from litellm import completion response = completion( - model="ollama/llama2", - messages = [{ "content": "Hello, how are you?","role": "user"}], + model="ollama/llama2", + messages = [{ "content": "Hello, how are you?","role": "user"}], api_base="http://localhost:11434" ) ``` + @@ -133,19 +134,21 @@ from litellm import completion import os ## set ENV variables -os.environ["OPENROUTER_API_KEY"] = "openrouter_api_key" +os.environ["OPENROUTER_API_KEY"] = "openrouter_api_key" response = completion( - model="openrouter/google/palm-2-chat-bison", + model="openrouter/google/palm-2-chat-bison", messages = [{ "content": "Hello, how are you?","role": "user"}], ) ``` + ## Streaming -Set `stream=True` in the `completion` args. + +Set `stream=True` in the `completion` args. @@ -157,7 +160,7 @@ import os os.environ["OPENAI_API_KEY"] = "your-api-key" response = completion( - model="gpt-3.5-turbo", + model="gpt-3.5-turbo", messages=[{ "content": "Hello, how are you?","role": "user"}], stream=True, ) @@ -174,7 +177,7 @@ import os os.environ["ANTHROPIC_API_KEY"] = "your-api-key" response = completion( - model="claude-2", + model="claude-2", messages=[{ "content": "Hello, how are you?","role": "user"}], stream=True, ) @@ -193,7 +196,7 @@ os.environ["VERTEX_PROJECT"] = "hardy-device-386718" os.environ["VERTEX_LOCATION"] = "us-central1" response = completion( - model="chat-bison", + model="chat-bison", messages=[{ "content": "Hello, how are you?","role": "user"}], stream=True, ) @@ -204,15 +207,15 @@ response = completion( ```python -from litellm import completion +from litellm import completion import os -os.environ["HUGGINGFACE_API_KEY"] = "huggingface_api_key" +os.environ["HUGGINGFACE_API_KEY"] = "huggingface_api_key" # e.g. 
Call 'WizardLM/WizardCoder-Python-34B-V1.0' hosted on HF Inference endpoints response = completion( model="huggingface/WizardLM/WizardCoder-Python-34B-V1.0", - messages=[{ "content": "Hello, how are you?","role": "user"}], + messages=[{ "content": "Hello, how are you?","role": "user"}], api_base="https://my-endpoint.huggingface.cloud", stream=True, ) @@ -235,7 +238,7 @@ os.environ["AZURE_API_VERSION"] = "" # azure call response = completion( - "azure/", + "azure/", messages = [{ "content": "Hello, how are you?","role": "user"}], stream=True, ) @@ -243,19 +246,19 @@ response = completion( - ```python from litellm import completion response = completion( - model="ollama/llama2", - messages = [{ "content": "Hello, how are you?","role": "user"}], + model="ollama/llama2", + messages = [{ "content": "Hello, how are you?","role": "user"}], api_base="http://localhost:11434", stream=True, ) ``` + @@ -264,60 +267,64 @@ from litellm import completion import os ## set ENV variables -os.environ["OPENROUTER_API_KEY"] = "openrouter_api_key" +os.environ["OPENROUTER_API_KEY"] = "openrouter_api_key" response = completion( - model="openrouter/google/palm-2-chat-bison", + model="openrouter/google/palm-2-chat-bison", messages = [{ "content": "Hello, how are you?","role": "user"}], stream=True, ) ``` + -## Exception handling +## Exception handling -LiteLLM maps exceptions across all supported providers to the OpenAI exceptions. All our exceptions inherit from OpenAI's exception types, so any error-handling you have for that, should work out of the box with LiteLLM. +LiteLLM maps exceptions across all supported providers to the OpenAI exceptions. All our exceptions inherit from OpenAI's exception types, so any error-handling you have for that, should work out of the box with LiteLLM. 
-```python +```python from openai.error import OpenAIError from litellm import completion os.environ["ANTHROPIC_API_KEY"] = "bad-key" -try: - # some code +try: + # some code completion(model="claude-instant-1", messages=[{"role": "user", "content": "Hey, how's it going?"}]) except OpenAIError as e: print(e) ``` ## Logging Observability - Log LLM Input/Output ([Docs](https://docs.litellm.ai/docs/observability/callbacks)) -LiteLLM exposes pre defined callbacks to send data to Langfuse, LLMonitor, Helicone, Promptlayer, Traceloop, Slack + +LiteLLM exposes pre defined callbacks to send data to Lunary, Langfuse, Helicone, Promptlayer, Traceloop, Slack + ```python from litellm import completion ## set env variables for logging tools os.environ["LANGFUSE_PUBLIC_KEY"] = "" os.environ["LANGFUSE_SECRET_KEY"] = "" -os.environ["LLMONITOR_APP_ID"] = "your-llmonitor-app-id" +os.environ["LUNARY_PUBLIC_KEY"] = "your-lunary-public-key" os.environ["OPENAI_API_KEY"] # set callbacks -litellm.success_callback = ["langfuse", "llmonitor"] # log input/output to langfuse, llmonitor, supabase +litellm.success_callback = ["langfuse", "lunary"] # log input/output to lunary, langfuse, supabase #openai call response = completion(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Hi 👋 - i'm openai"}]) ``` ## Track Costs, Usage, Latency for streaming + Use a callback function for this - more info on custom callbacks: https://docs.litellm.ai/docs/observability/custom_callback ```python import litellm -# track_cost_callback +# track_cost_callback def track_cost_callback( kwargs, # kwargs to completion completion_response, # response from completion @@ -328,7 +335,7 @@ def track_cost_callback( print("streaming response_cost", response_cost) except: pass -# set callback +# set callback litellm.success_callback = [track_cost_callback] # set custom callback function # litellm.completion() call @@ -346,11 +353,12 @@ response = completion( ## OpenAI Proxy -Track spend across multiple projects/people +Track spend across multiple projects/people ![ui_3](https://github.com/BerriAI/litellm/assets/29436595/47c97d5e-b9be-4839-b28c-43d7f4f10033) -The proxy provides: +The proxy provides: + 1. [Hooks for auth](https://docs.litellm.ai/docs/proxy/virtual_keys#custom-auth) 2. [Hooks for logging](https://docs.litellm.ai/docs/proxy/logging#step-1---create-your-custom-litellm-callback-class) 3. 
[Cost tracking](https://docs.litellm.ai/docs/proxy/virtual_keys#tracking-spend) @@ -358,13 +366,14 @@ The proxy provides: ### 📖 Proxy Endpoints - [Swagger Docs](https://litellm-api.up.railway.app/) -### Quick Start Proxy - CLI +### Quick Start Proxy - CLI ```shell pip install 'litellm[proxy]' ``` #### Step 1: Start litellm proxy + ```shell $ litellm --model huggingface/bigcode/starcoder @@ -372,6 +381,7 @@ $ litellm --model huggingface/bigcode/starcoder ``` #### Step 2: Make ChatCompletions Request to Proxy + ```python import openai # openai v1.0.0+ client = openai.OpenAI(api_key="anything",base_url="http://0.0.0.0:8000") # set proxy to base_url @@ -387,6 +397,7 @@ print(response) ``` ## More details -* [exception mapping](./exception_mapping.md) -* [retries + model fallbacks for completion()](./completion/reliable_completions.md) -* [proxy virtual keys & spend management](./tutorials/fallbacks.md) \ No newline at end of file + +- [exception mapping](./exception_mapping.md) +- [retries + model fallbacks for completion()](./completion/reliable_completions.md) +- [proxy virtual keys & spend management](./tutorials/fallbacks.md) diff --git a/docs/my-website/src/pages/observability/callbacks.md b/docs/my-website/src/pages/observability/callbacks.md index be27d76da..2ec288d5e 100644 --- a/docs/my-website/src/pages/observability/callbacks.md +++ b/docs/my-website/src/pages/observability/callbacks.md @@ -6,7 +6,7 @@ liteLLM provides `success_callbacks` and `failure_callbacks`, making it easy for liteLLM supports: -- [LLMonitor](https://llmonitor.com/docs) +- [Lunary](https://lunary.ai/docs) - [Helicone](https://docs.helicone.ai/introduction) - [Sentry](https://docs.sentry.io/platforms/python/) - [PostHog](https://posthog.com/docs/libraries/python) @@ -18,8 +18,8 @@ liteLLM supports: from litellm import completion # set callbacks -litellm.success_callback=["posthog", "helicone", "llmonitor"] -litellm.failure_callback=["sentry", "llmonitor"] +litellm.success_callback=["posthog", "helicone", "lunary"] +litellm.failure_callback=["sentry", "lunary"] ## set env variables os.environ['SENTRY_DSN'], os.environ['SENTRY_API_TRACE_RATE']= "" diff --git a/litellm/integrations/llmonitor.py b/litellm/integrations/llmonitor.py deleted file mode 100644 index ff4c3990f..000000000 --- a/litellm/integrations/llmonitor.py +++ /dev/null @@ -1,127 +0,0 @@ -#### What this does #### -# On success + failure, log events to aispend.io -import datetime -import traceback -import dotenv -import os -import requests - -dotenv.load_dotenv() # Loading env variables using dotenv - - -# convert to {completion: xx, tokens: xx} -def parse_usage(usage): - return { - "completion": usage["completion_tokens"] if "completion_tokens" in usage else 0, - "prompt": usage["prompt_tokens"] if "prompt_tokens" in usage else 0, - } - - -def parse_messages(input): - if input is None: - return None - - def clean_message(message): - # if is strin, return as is - if isinstance(message, str): - return message - - if "message" in message: - return clean_message(message["message"]) - text = message["content"] - if text == None: - text = message.get("function_call", None) - - return { - "role": message["role"], - "text": text, - } - - if isinstance(input, list): - if len(input) == 1: - return clean_message(input[0]) - else: - return [clean_message(msg) for msg in input] - else: - return clean_message(input) - - -class LLMonitorLogger: - # Class variables or attributes - def __init__(self): - # Instance variables - self.api_url = 
os.getenv("LLMONITOR_API_URL") or "https://app.llmonitor.com" - self.app_id = os.getenv("LLMONITOR_APP_ID") - - def log_event( - self, - type, - event, - run_id, - model, - print_verbose, - input=None, - user_id=None, - response_obj=None, - start_time=datetime.datetime.now(), - end_time=datetime.datetime.now(), - error=None, - ): - # Method definition - try: - print_verbose(f"LLMonitor Logging - Logging request for model {model}") - - if response_obj: - usage = ( - parse_usage(response_obj["usage"]) - if "usage" in response_obj - else None - ) - output = response_obj["choices"] if "choices" in response_obj else None - else: - usage = None - output = None - - if error: - error_obj = {"stack": error} - - else: - error_obj = None - - data = [ - { - "type": type, - "name": model, - "runId": run_id, - "app": self.app_id, - "event": "start", - "timestamp": start_time.isoformat(), - "userId": user_id, - "input": parse_messages(input), - }, - { - "type": type, - "runId": run_id, - "app": self.app_id, - "event": event, - "error": error_obj, - "timestamp": end_time.isoformat(), - "userId": user_id, - "output": parse_messages(output), - "tokensUsage": usage, - }, - ] - - print_verbose(f"LLMonitor Logging - final data object: {data}") - - response = requests.post( - self.api_url + "/api/report", - headers={"Content-Type": "application/json"}, - json={"events": data}, - ) - - print_verbose(f"LLMonitor Logging - response: {response}") - except: - # traceback.print_exc() - print_verbose(f"LLMonitor Logging Error - {traceback.format_exc()}") - pass diff --git a/litellm/integrations/lunary.py b/litellm/integrations/lunary.py new file mode 100644 index 000000000..fca8927d6 --- /dev/null +++ b/litellm/integrations/lunary.py @@ -0,0 +1,167 @@ +#### What this does #### +# On success + failure, log events to aispend.io +import datetime +import traceback +import dotenv +import os +import requests + +dotenv.load_dotenv() # Loading env variables using dotenv +import traceback +import datetime, subprocess, sys +import litellm + +# convert to {completion: xx, tokens: xx} +def parse_usage(usage): + return { + "completion": usage["completion_tokens"] if "completion_tokens" in usage else 0, + "prompt": usage["prompt_tokens"] if "prompt_tokens" in usage else 0, + } + + +def parse_messages(input): + if input is None: + return None + + def clean_message(message): + # if is strin, return as is + if isinstance(message, str): + return message + + if "message" in message: + return clean_message(message["message"]) + + + serialized = { + "role": message.get("role"), + "content": message.get("content"), + } + + # Only add tool_calls and function_call to res if they are set + if message.get("tool_calls"): + serialized["tool_calls"] = message.get("tool_calls") + if message.get("function_call"): + serialized["function_call"] = message.get("function_call") + + return serialized + + if isinstance(input, list): + if len(input) == 1: + return clean_message(input[0]) + else: + return [clean_message(msg) for msg in input] + else: + return clean_message(input) + + +class LunaryLogger: + # Class variables or attributes + def __init__(self): + try: + import lunary + # lunary.__version__ doesn't exist throws if lunary is not installed + if not hasattr(lunary, "track_event"): + raise ImportError + + self.lunary_client = lunary + except ImportError: + print("Lunary not installed. 
Installing now...") + subprocess.check_call([sys.executable, "-m", "pip", "install", "lunary", "--upgrade"]) + import importlib + import lunary + importlib.reload(lunary) + + self.lunary_client = lunary + + + def log_event( + self, + kwargs, + type, + event, + run_id, + model, + print_verbose, + extra=None, + input=None, + user_id=None, + response_obj=None, + start_time=datetime.datetime.now(), + end_time=datetime.datetime.now(), + error=None, + ): + # Method definition + try: + + print_verbose(f"Lunary Logging - Logging request for model {model}") + + litellm_params = kwargs.get("litellm_params", {}) + metadata = ( + litellm_params.get("metadata", {}) or {} + ) + + tags = litellm_params.pop("tags", None) or [] + + template_id = extra.pop("templateId", None), + + for param, value in extra.items(): + if not isinstance(value, (str, int, bool, float)): + try: + extra[param] = str(value) + except: + # if casting value to str fails don't block logging + pass + + if response_obj: + usage = ( + parse_usage(response_obj["usage"]) + if "usage" in response_obj + else None + ) + output = response_obj["choices"] if "choices" in response_obj else None + else: + usage = None + output = None + + if error: + error_obj = {"stack": error} + else: + error_obj = None + + print(start_time.isoformat()) + + self.lunary_client.track_event( + type, + "start", + run_id, + user_id=user_id, + name=model, + input=parse_messages(input), + timestamp=start_time.isoformat(), + # template_id=template_id, + metadata=metadata, + runtime="litellm", + tags=tags, + extra=extra, + # user_props=user_props, + ) + + self.lunary_client.track_event( + type, + event, + run_id, + timestamp=end_time.isoformat(), + runtime="litellm", + error=error_obj, + output=parse_messages(output), + token_usage={ + "prompt": usage.get("prompt_tokens"), + "completion": usage.get("completion_tokens"), + } + ) + + + except: + # traceback.print_exc() + print_verbose(f"Lunary Logging Error - {traceback.format_exc()}") + pass diff --git a/litellm/tests/test_llmonitor_integration.py b/litellm/tests/test_llmonitor_integration.py deleted file mode 100644 index e88995f3b..000000000 --- a/litellm/tests/test_llmonitor_integration.py +++ /dev/null @@ -1,76 +0,0 @@ -# #### What this tests #### -# # This tests if logging to the llmonitor integration actually works -# # Adds the parent directory to the system path -# import sys -# import os - -# sys.path.insert(0, os.path.abspath("../..")) - -# from litellm import completion, embedding -# import litellm - -# litellm.success_callback = ["llmonitor"] -# litellm.failure_callback = ["llmonitor"] - -# litellm.set_verbose = True - - -# def test_chat_openai(): -# try: -# response = completion( -# model="gpt-3.5-turbo", -# messages=[{"role": "user", "content": "Hi 👋 - i'm openai"}], -# user="ishaan_from_litellm" -# ) - -# print(response) - -# except Exception as e: -# print(e) - - -# def test_embedding_openai(): -# try: -# response = embedding(model="text-embedding-ada-002", input=["test"]) -# # Add any assertions here to check the response -# print(f"response: {str(response)[:50]}") -# except Exception as e: -# print(e) - - -# test_chat_openai() -# # test_embedding_openai() - - -# def test_llmonitor_logging_function_calling(): -# function1 = [ -# { -# "name": "get_current_weather", -# "description": "Get the current weather in a given location", -# "parameters": { -# "type": "object", -# "properties": { -# "location": { -# "type": "string", -# "description": "The city and state, e.g. 
diff --git a/litellm/tests/test_llmonitor_integration.py b/litellm/tests/test_llmonitor_integration.py
deleted file mode 100644
index e88995f3b..000000000
--- a/litellm/tests/test_llmonitor_integration.py
+++ /dev/null
@@ -1,76 +0,0 @@
-# #### What this tests ####
-# # This tests if logging to the llmonitor integration actually works
-# # Adds the parent directory to the system path
-# import sys
-# import os
-
-# sys.path.insert(0, os.path.abspath("../.."))
-
-# from litellm import completion, embedding
-# import litellm
-
-# litellm.success_callback = ["llmonitor"]
-# litellm.failure_callback = ["llmonitor"]
-
-# litellm.set_verbose = True
-
-
-# def test_chat_openai():
-#     try:
-#         response = completion(
-#             model="gpt-3.5-turbo",
-#             messages=[{"role": "user", "content": "Hi 👋 - i'm openai"}],
-#             user="ishaan_from_litellm"
-#         )
-
-#         print(response)
-
-#     except Exception as e:
-#         print(e)
-
-
-# def test_embedding_openai():
-#     try:
-#         response = embedding(model="text-embedding-ada-002", input=["test"])
-#         # Add any assertions here to check the response
-#         print(f"response: {str(response)[:50]}")
-#     except Exception as e:
-#         print(e)
-
-
-# test_chat_openai()
-# # test_embedding_openai()
-
-
-# def test_llmonitor_logging_function_calling():
-#     function1 = [
-#         {
-#             "name": "get_current_weather",
-#             "description": "Get the current weather in a given location",
-#             "parameters": {
-#                 "type": "object",
-#                 "properties": {
-#                     "location": {
-#                         "type": "string",
-#                         "description": "The city and state, e.g. San Francisco, CA",
-#                     },
-#                     "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
-#                 },
-#                 "required": ["location"],
-#             },
-#         }
-#     ]
-#     try:
-#         response = completion(model="gpt-3.5-turbo",
-#                               messages=[{
-#                                   "role": "user",
-#                                   "content": "what's the weather in boston"
-#                               }],
-#                               temperature=0.1,
-#                               functions=function1,
-#         )
-#         print(response)
-#     except Exception as e:
-#         print(e)
-
-# # test_llmonitor_logging_function_calling()
diff --git a/litellm/tests/test_lunary.py b/litellm/tests/test_lunary.py
new file mode 100644
index 000000000..34b4ef8df
--- /dev/null
+++ b/litellm/tests/test_lunary.py
@@ -0,0 +1,70 @@
+import sys
+import os
+import io
+
+sys.path.insert(0, os.path.abspath("../.."))
+
+from litellm import completion
+import litellm
+
+litellm.success_callback = ["lunary"]
+litellm.set_verbose = True
+import time
+
+
+def test_lunary_logging():
+    try:
+        response = completion(
+            model="gpt-3.5-turbo",
+            messages=[{"role": "user", "content": "what llm are u"}],
+            max_tokens=10,
+            temperature=0.2,
+        )
+        print(response)
+    except Exception as e:
+        print(e)
+
+
+test_lunary_logging()
+
+
+def test_lunary_logging_with_metadata():
+    try:
+        response = completion(
+            model="gpt-3.5-turbo",
+            messages=[{"role": "user", "content": "what llm are u"}],
+            max_tokens=10,
+            temperature=0.2,
+            metadata={
+                "run_name": "litellmRUN",
+                "project_name": "litellm-completion",
+            },
+        )
+        print(response)
+    except Exception as e:
+        print(e)
+
+
+# test_lunary_logging_with_metadata()
+
+
+def test_lunary_logging_with_streaming_and_metadata():
+    try:
+        response = completion(
+            model="gpt-3.5-turbo",
+            messages=[{"role": "user", "content": "what llm are u"}],
+            max_tokens=10,
+            temperature=0.2,
+            metadata={
+                "run_name": "litellmRUN",
+                "project_name": "litellm-completion",
+            },
+            stream=True,
+        )
+        for chunk in response:
+            continue
+    except Exception as e:
+        print(e)
+
+
+# test_lunary_logging_with_streaming_and_metadata()
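Tying the new test to the logger: when `test_lunary_logging_with_metadata` runs, the success path ends up emitting roughly the "start" event below. This snippet just mirrors the `track_event` call made in `lunary.py` with the test's values filled in; the `run_id`, timestamp, and exact `extra` contents are approximate:

```python
import datetime
import lunary

# "start" event: model name, cleaned-up input, and the metadata passed to completion()
lunary.track_event(
    "llm",
    "start",
    "run-123",  # hypothetical run_id
    user_id="default",
    name="gpt-3.5-turbo",
    input={"role": "user", "content": "what llm are u"},
    timestamp=datetime.datetime.now().isoformat(),
    metadata={"run_name": "litellmRUN", "project_name": "litellm-completion"},
    runtime="litellm",
    tags=[],
    extra={"max_tokens": 10, "temperature": 0.2},
)
```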
diff --git a/litellm/utils.py b/litellm/utils.py
index 0ef50f20c..8c731d065 100644
--- a/litellm/utils.py
+++ b/litellm/utils.py
@@ -60,7 +60,7 @@ from .integrations.helicone import HeliconeLogger
 from .integrations.aispend import AISpendLogger
 from .integrations.berrispend import BerriSpendLogger
 from .integrations.supabase import Supabase
-from .integrations.llmonitor import LLMonitorLogger
+from .integrations.lunary import LunaryLogger
 from .integrations.prompt_layer import PromptLayerLogger
 from .integrations.langsmith import LangsmithLogger
 from .integrations.weights_biases import WeightsBiasesLogger
@@ -126,7 +126,7 @@ dynamoLogger = None
 s3Logger = None
 genericAPILogger = None
 clickHouseLogger = None
-llmonitorLogger = None
+lunaryLogger = None
 aispendLogger = None
 berrispendLogger = None
 supabaseClient = None
@@ -788,7 +788,7 @@ class CallTypes(Enum):
 
 # Logging function -> log the exact model details + what's being sent | Non-BlockingP
 class Logging:
-    global supabaseClient, liteDebuggerClient, promptLayerLogger, weightsBiasesLogger, langsmithLogger, capture_exception, add_breadcrumb, llmonitorLogger
+    global supabaseClient, liteDebuggerClient, promptLayerLogger, weightsBiasesLogger, langsmithLogger, capture_exception, add_breadcrumb, lunaryLogger
 
     def __init__(
         self,
@@ -1327,27 +1327,28 @@ class Logging:
                         end_time=end_time,
                         print_verbose=print_verbose,
                     )
-                if callback == "llmonitor":
-                    print_verbose("reaches llmonitor for logging!")
+                if callback == "lunary":
+                    print_verbose("reaches lunary for logging!")
                     model = self.model
 
                     input = self.model_call_details.get(
                         "messages", self.model_call_details.get("input", None)
                     )
 
-                    # if contains input, it's 'embedding', otherwise 'llm'
                     type = (
                         "embed"
                         if self.call_type == CallTypes.embedding.value
                         else "llm"
                     )
 
-                    llmonitorLogger.log_event(
+                    lunaryLogger.log_event(
                         type=type,
+                        kwargs=self.model_call_details,
                         event="end",
                         model=model,
                         input=input,
                         user_id=self.model_call_details.get("user", "default"),
+                        extra=self.model_call_details.get("optional_params", {}),
                         response_obj=result,
                         start_time=start_time,
                         end_time=end_time,
@@ -1842,8 +1843,8 @@ class Logging:
                         call_type=self.call_type,
                         stream=self.stream,
                     )
-                elif callback == "llmonitor":
-                    print_verbose("reaches llmonitor for logging error!")
+                elif callback == "lunary":
+                    print_verbose("reaches lunary for logging error!")
 
                     model = self.model
 
@@ -1855,7 +1856,7 @@ class Logging:
                         else "llm"
                     )
 
-                    llmonitorLogger.log_event(
+                    lunaryLogger.log_event(
                         type=_type,
                         event="error",
                         user_id=self.model_call_details.get("user", "default"),
@@ -5593,7 +5594,7 @@ def validate_environment(model: Optional[str] = None) -> dict:
 
 
 def set_callbacks(callback_list, function_id=None):
-    global sentry_sdk_instance, capture_exception, add_breadcrumb, posthog, slack_app, alerts_channel, traceloopLogger, athinaLogger, heliconeLogger, aispendLogger, berrispendLogger, supabaseClient, liteDebuggerClient, llmonitorLogger, promptLayerLogger, langFuseLogger, customLogger, weightsBiasesLogger, langsmithLogger, dynamoLogger, s3Logger
+    global sentry_sdk_instance, capture_exception, add_breadcrumb, posthog, slack_app, alerts_channel, traceloopLogger, athinaLogger, heliconeLogger, aispendLogger, berrispendLogger, supabaseClient, liteDebuggerClient, lunaryLogger, promptLayerLogger, langFuseLogger, customLogger, weightsBiasesLogger, langsmithLogger, dynamoLogger, s3Logger
     try:
         for callback in callback_list:
             print_verbose(f"callback: {callback}")
@@ -5653,8 +5654,8 @@ def set_callbacks(callback_list, function_id=None):
             print_verbose("Initialized Athina Logger")
         elif callback == "helicone":
             heliconeLogger = HeliconeLogger()
-        elif callback == "llmonitor":
-            llmonitorLogger = LLMonitorLogger()
+        elif callback == "lunary":
+            lunaryLogger = LunaryLogger()
         elif callback == "promptlayer":
             promptLayerLogger = PromptLayerLogger()
         elif callback == "langfuse":
@@ -5692,7 +5693,7 @@ def set_callbacks(callback_list, function_id=None):
 
 # NOTE: DEPRECATING this in favor of using failure_handler() in Logging:
 def handle_failure(exception, traceback_exception, start_time, end_time, args, kwargs):
-    global sentry_sdk_instance, capture_exception, add_breadcrumb, posthog, slack_app, alerts_channel, aispendLogger, berrispendLogger, supabaseClient, liteDebuggerClient, llmonitorLogger
+    global sentry_sdk_instance, capture_exception, add_breadcrumb, posthog, slack_app, alerts_channel, aispendLogger, berrispendLogger, supabaseClient, liteDebuggerClient, lunaryLogger
     try:
         # print_verbose(f"handle_failure args: {args}")
        # print_verbose(f"handle_failure kwargs: {kwargs}")
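The `utils.py` hunks above also reroute the failure handler to `lunaryLogger`. Below is a minimal sketch of driving the new logger's error path directly; all values are hypothetical stand-ins for what litellm's `Logging` failure handler assembles from the request, and it is not part of the patch:

```python
import datetime
from litellm.integrations.lunary import LunaryLogger

logger = LunaryLogger()
now = datetime.datetime.now()

# an "error" event: no response_obj, and the traceback string ends up as {"stack": ...}
logger.log_event(
    kwargs={"litellm_params": {"metadata": {}}},
    type="llm",
    event="error",
    run_id="run-123",  # hypothetical run_id
    model="gpt-3.5-turbo",
    print_verbose=print,
    extra={},
    user_id="default",
    error="Traceback (most recent call last): ...",
    start_time=now,
    end_time=now,
)
```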