forked from phoenix/litellm-mirror
almost working llmonitor
parent 22c7e38de5
commit 3675d3e029
5 changed files with 425 additions and 326 deletions
@@ -1,6 +1,7 @@
# liteLLM Proxy Server: 50+ LLM Models, Error Handling, Caching

### Azure, Llama2, OpenAI, Claude, Hugging Face, Replicate Models

[](https://pypi.org/project/litellm/)
[](https://pypi.org/project/litellm/0.1.1/)

@@ -11,34 +12,36 @@
## What does liteLLM proxy do

- Make `/chat/completions` requests for 50+ LLM models: **Azure, OpenAI, Replicate, Anthropic, Hugging Face**

Example: for `model`, use `claude-2`, `gpt-3.5`, `gpt-4`, `command-nightly`, `stabilityai/stablecode-completion-alpha-3b-4k`

```json
{
  "model": "replicate/llama-2-70b-chat:2c1608e18606fad2812020dc541930f2d0495ce32eee50074220b87300bc16e1",
  "messages": [
    {
      "content": "Hello, whats the weather in San Francisco??",
      "role": "user"
    }
  ]
}
```

- **Consistent Input/Output** Format
  - Call all models using the OpenAI format - `completion(model, messages)`
  - Text responses will always be available at `['choices'][0]['message']['content']`
- **Error Handling** Using Model Fallbacks (if `GPT-4` fails, try `llama2`)
- **Logging** - Log Requests, Responses and Errors to `Supabase`, `Posthog`, `Mixpanel`, `Sentry`, `LLMonitor`, `Helicone` (any of the supported providers here: https://litellm.readthedocs.io/en/latest/advanced/)

**Example: Logs sent to Supabase**

<img width="1015" alt="Screenshot 2023-08-11 at 4 02 46 PM" src="https://github.com/ishaan-jaff/proxy-server/assets/29436595/237557b8-ba09-4917-982c-8f3e1b2c8d08">

- **Token Usage & Spend** - Track Input + Completion tokens used + Spend/model
- **Caching** - Implementation of Semantic Caching
- **Streaming & Async Support** - Return generators to stream text responses

## API Endpoints

### `/chat/completions` (POST)
@@ -46,34 +49,37 @@
This endpoint is used to generate chat completions for 50+ supported LLM API models. Use llama2, GPT-4, Claude2, etc.

#### Input

This API endpoint accepts all inputs in raw JSON and expects the following inputs:
- `model` (string, required): ID of the model to use for chat completions. See all supported models [here](https://litellm.readthedocs.io/en/latest/supported/):
  eg `gpt-3.5-turbo`, `gpt-4`, `claude-2`, `command-nightly`, `stabilityai/stablecode-completion-alpha-3b-4k`
- `messages` (array, required): A list of messages representing the conversation context. Each message should have a `role` (system, user, assistant, or function), `content` (message text), and `name` (for the function role).
- Additional optional parameters: `temperature`, `functions`, `function_call`, `top_p`, `n`, `stream`. See the full list of supported inputs here: https://litellm.readthedocs.io/en/latest/input/

#### Example JSON body

For claude-2

```json
{
  "model": "claude-2",
  "messages": [
    {
      "content": "Hello, whats the weather in San Francisco??",
      "role": "user"
    }
  ]
}
```

### Making an API request to the Proxy Server

```python
import requests
import json

# TODO: use your URL
url = "http://localhost:5000/chat/completions"

payload = json.dumps({
@@ -94,34 +100,38 @@ print(response.text)
```

### Output [Response Format]

All responses from the server are returned in the following format (for all LLM models). More info on output here: https://litellm.readthedocs.io/en/latest/output/

```json
{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "I'm sorry, but I don't have the capability to provide real-time weather information. However, you can easily check the weather in San Francisco by searching online or using a weather app on your phone.",
        "role": "assistant"
      }
    }
  ],
  "created": 1691790381,
  "id": "chatcmpl-7mUFZlOEgdohHRDx2UpYPRTejirzb",
  "model": "gpt-3.5-turbo-0613",
  "object": "chat.completion",
  "usage": {
    "completion_tokens": 41,
    "prompt_tokens": 16,
    "total_tokens": 57
  }
}
```
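Because the proxy normalizes every provider to this response shape, client code can read replies uniformly. A minimal, illustrative sketch (assuming the proxy is running locally on port 5000, as in the request example above; the model and message are placeholders):

```python
import json
import requests

# Assumes the proxy from this README is running locally; model/messages are illustrative.
url = "http://localhost:5000/chat/completions"
payload = {
    "model": "claude-2",
    "messages": [{"role": "user", "content": "Hello, whats the weather in San Francisco??"}],
}
response = requests.post(url, headers={"Content-Type": "application/json"}, data=json.dumps(payload))

# The text is always at ['choices'][0]['message']['content'], per the format above.
reply = response.json()["choices"][0]["message"]["content"]
print(reply)
```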
## Installation & Usage

### Running Locally

1. Clone the liteLLM repository to your local machine:

```
git clone https://github.com/BerriAI/liteLLM-proxy
@@ -141,24 +151,24 @@ All responses from the server are returned in the following format (for all LLM
python main.py
```

## Deploying

1. Quick Start: Deploy on Railway

   [](https://railway.app/template/DYqQAW?referralCode=t3ukrU)

2. `GCP`, `AWS`, `Azure`
   This project includes a `Dockerfile`, allowing you to build and deploy a Docker project on your cloud provider.

# Support / Talk with founders

- [Our calendar 👋](https://calendly.com/d/4mp-gd3-k5k/berriai-1-1-onboarding-litellm-hosted-version)
- [Community Discord 💭](https://discord.gg/wuPM9dRgDw)
- Our numbers 📞 +1 (770) 8783-106 / +1 (412) 618-6238
- Our emails ✉️ ishaan@berri.ai / krrish@berri.ai

## Roadmap

- [ ] Support hosted db (e.g. Supabase)
- [ ] Easily send data to places like PostHog and Sentry.
- [ ] Add a hot-cache for project spend logs - enables fast checks for user + project limits
@@ -5,6 +5,7 @@ import traceback
import dotenv
import os
import requests

dotenv.load_dotenv()  # Loading env variables using dotenv

@@ -14,45 +15,34 @@ class LLMonitorLogger:
        # Instance variables
        self.api_url = os.getenv(
            "LLMONITOR_API_URL") or "https://app.llmonitor.com"
-        self.account_id = os.getenv("LLMONITOR_APP_ID")
+        self.app_id = os.getenv("LLMONITOR_APP_ID")

-    def log_event(self, model, messages, response_obj, start_time, end_time, print_verbose):
+    def log_event(self, type, run_id, error, usage, model, messages,
+                  response_obj, user_id, time, print_verbose):
        # Method definition
        try:
            print_verbose(
                f"LLMonitor Logging - Enters logging function for model {model}"
            )

-            print(model, messages, response_obj, start_time, end_time)
+            print(type, model, messages, response_obj, time, end_user)

-            # headers = {
-            #     'Content-Type': 'application/json'
-            # }
+            headers = {'Content-Type': 'application/json'}

-            # prompt_tokens_cost_usd_dollar, completion_tokens_cost_usd_dollar = self.price_calculator(
-            #     model, response_obj, start_time, end_time)
-            # total_cost = prompt_tokens_cost_usd_dollar + completion_tokens_cost_usd_dollar
+            data = {
+                "type": "llm",
+                "name": model,
+                "runId": run_id,
+                "app": self.app_id,
+                "error": error,
+                "event": type,
+                "timestamp": time.isoformat(),
+                "userId": user_id,
+                "input": messages,
+                "output": response_obj['choices'][0]['message']['content'],
+            }

-            # response_time = (end_time-start_time).total_seconds()
-            # if "response" in response_obj:
-            #     data = [{
-            #         "response_time": response_time,
-            #         "model_id": response_obj["model"],
-            #         "total_cost": total_cost,
-            #         "messages": messages,
-            #         "response": response_obj['choices'][0]['message']['content'],
-            #         "account_id": self.account_id
-            #     }]
-            # elif "error" in response_obj:
-            #     data = [{
-            #         "response_time": response_time,
-            #         "model_id": response_obj["model"],
-            #         "total_cost": total_cost,
-            #         "messages": messages,
-            #         "error": response_obj['error'],
-            #         "account_id": self.account_id
-            #     }]
-            # print_verbose(f"BerriSpend Logging - final data object: {data}")
+            print_verbose(f"LLMonitor Logging - final data object: {data}")

            # response = requests.post(url, headers=headers, json=data)
        except:
            # traceback.print_exc()
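The network call itself is still commented out at this point (`# response = requests.post(...)`). A minimal sketch of how the send step could be wired up, using a hypothetical `send_llmonitor_event` helper; the `/api/report` path and the `{"events": [...]}` envelope are assumptions about the LLMonitor API, not something this commit confirms:

```python
import requests

def send_llmonitor_event(api_url: str, data: dict, print_verbose) -> None:
    # Hypothetical helper, not part of the diff: posts the `data` payload built in
    # LLMonitorLogger.log_event. The "/api/report" path and the {"events": [...]}
    # wrapper are assumptions, not confirmed by this commit.
    headers = {"Content-Type": "application/json"}
    try:
        response = requests.post(f"{api_url}/api/report",
                                 headers=headers,
                                 json={"events": [data]},
                                 timeout=5)
        print_verbose(f"LLMonitor Logging - response status: {response.status_code}")
    except Exception:
        # Logging must never break the main completion call, mirroring the bare
        # except in log_event above.
        pass
```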
@@ -1,28 +1,36 @@
#### What this tests ####
-# This tests if logging to the helicone integration actually works
+# This tests if logging to the llmonitor integration actually works

+# Adds the parent directory to the system path
-from litellm import embedding, completion
-import litellm
import sys
import os
-import traceback
-import pytest

-# Adds the parent directory to the system path
sys.path.insert(0, os.path.abspath('../..'))

+from litellm import completion
+import litellm

+litellm.input_callback = ["llmonitor"]
litellm.success_callback = ["llmonitor"]
+litellm.error_callback = ["llmonitor"]

litellm.set_verbose = True

-user_message = "Hello, how are you?"
-messages = [{"content": user_message, "role": "user"}]

# openai call
-response = completion(model="gpt-3.5-turbo",
-                      messages=[{"role": "user", "content": "Hi 👋 - i'm openai"}])
+# response = completion(model="gpt-3.5-turbo",
+#                       messages=[{
+#                           "role": "user",
+#                           "content": "Hi 👋 - i'm openai"
+#                       }])

+# print(response)

+# #bad request call
+# response = completion(model="chatgpt-test", messages=[{"role": "user", "content": "Hi 👋 - i'm a bad request"}])

# cohere call
-# response = completion(model="command-nightly",
-#                       messages=[{"role": "user", "content": "Hi 👋 - i'm cohere"}])
+response = completion(model="command-nightly",
+                      messages=[{
+                          "role": "user",
+                          "content": "Hi 👋 - i'm cohere"
+                      }])
+print(response)
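The logger reads its configuration from environment variables (`LLMONITOR_APP_ID` and `LLMONITOR_API_URL` in the integration above), so running this test presumably requires them to be set first. A small sketch with placeholder values:

```python
import os

# Placeholder value - substitute a real LLMonitor app id before running the test.
os.environ["LLMONITOR_APP_ID"] = "<your-llmonitor-app-id>"
# Optional: the integration falls back to https://app.llmonitor.com when unset.
os.environ["LLMONITOR_API_URL"] = "https://app.llmonitor.com"
```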
litellm/utils.py (437 changed lines)
@@ -1,20 +1,7 @@
+import aiohttp
import subprocess
+import importlib
from typing import List, Dict, Union, Optional
from .exceptions import (
    AuthenticationError,
    InvalidRequestError,
@@ -22,7 +9,32 @@ from .exceptions import (
    ServiceUnavailableError,
    OpenAIError,
)
from openai.openai_object import OpenAIObject
from openai.error import OpenAIError as OriginalError
+from .integrations.llmonitor import LLMonitorLogger
from .integrations.litedebugger import LiteDebugger
from .integrations.supabase import Supabase
from .integrations.berrispend import BerriSpendLogger
from .integrations.aispend import AISpendLogger
from .integrations.helicone import HeliconeLogger
import pkg_resources
import sys
import dotenv
import json
import traceback
import threading
import subprocess
import os
import litellm
import openai
import random
import uuid
import requests
import datetime
import time
import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")

####### ENVIRONMENT VARIABLES ###################
dotenv.load_dotenv()  # Loading env variables using dotenv
@@ -37,6 +49,7 @@ aispendLogger = None
berrispendLogger = None
supabaseClient = None
liteDebuggerClient = None
+llmonitorLogger = None
callback_list: Optional[List[str]] = []
user_logger_fn = None
additional_details: Optional[Dict[str, str]] = {}
@@ -63,6 +76,7 @@ local_cache: Optional[Dict[str, str]] = {}


class Message(OpenAIObject):

    def __init__(self, content="default", role="assistant", **params):
        super(Message, self).__init__(**params)
        self.content = content
@@ -70,7 +84,12 @@ class Message(OpenAIObject):


class Choices(OpenAIObject):

    def __init__(self,
                 finish_reason="stop",
                 index=0,
                 message=Message(),
                 **params):
        super(Choices, self).__init__(**params)
        self.finish_reason = finish_reason
        self.index = index
@@ -78,20 +97,22 @@ class Choices(OpenAIObject):


class ModelResponse(OpenAIObject):

    def __init__(self,
                 choices=None,
                 created=None,
                 model=None,
                 usage=None,
                 **params):
        super(ModelResponse, self).__init__(**params)
        self.choices = choices if choices else [Choices()]
        self.created = created
        self.model = model
        self.usage = (usage if usage else {
            "prompt_tokens": None,
            "completion_tokens": None,
            "total_tokens": None,
        })

    def to_dict_recursive(self):
        d = super().to_dict_recursive()
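For orientation, these wrapper classes are what callers index into when they read responses. A small illustrative sketch (not part of the diff) of how the pieces compose, assuming this module is importable as `litellm.utils`:

```python
# Illustrative sketch, not part of the diff: composing the response wrappers above.
from litellm.utils import Choices, Message, ModelResponse

response = ModelResponse(
    choices=[Choices(finish_reason="stop", index=0,
                     message=Message(content="Hello!", role="assistant"))],
    created=1691790381,
    model="gpt-3.5-turbo-0613",
)

# OpenAIObject supports both attribute and dict-style access, which is why the
# README promises ['choices'][0]['message']['content'].
print(response.choices[0].message.content)
print(response["choices"][0]["message"]["content"])
```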
@@ -108,8 +129,6 @@ def print_verbose(print_statement):


####### Package Import Handler ###################
-import importlib
-import subprocess


def install_and_import(package: str):
@@ -139,6 +158,7 @@ def install_and_import(package: str):
# Logging function -> log the exact model details + what's being sent | Non-Blocking
class Logging:
    global supabaseClient, liteDebuggerClient

    def __init__(self, model, messages, optional_params, litellm_params):
        self.model = model
        self.messages = messages
@@ -146,20 +166,20 @@ class Logging:
        self.litellm_params = litellm_params
        self.logger_fn = litellm_params["logger_fn"]
        self.model_call_details = {
            "model": model,
            "messages": messages,
            "optional_params": self.optional_params,
            "litellm_params": self.litellm_params,
        }

    def pre_call(self, input, api_key, additional_args={}):
        try:
            print(f"logging pre call for model: {self.model}")
            self.model_call_details["input"] = input
            self.model_call_details["api_key"] = api_key
            self.model_call_details["additional_args"] = additional_args

            # User Logging -> if you pass in a custom logging function
            print_verbose(
                f"Logging Details: logger_fn - {self.logger_fn} | callable(logger_fn) - {callable(self.logger_fn)}"
            )
@@ -173,7 +193,7 @@ class Logging:
                    f"LiteLLM.LoggingError: [Non-Blocking] Exception occurred while logging {traceback.format_exc()}"
                )

            # Input Integration Logging -> If you want to log the fact that an attempt to call the model was made
            for callback in litellm.input_callback:
                try:
                    if callback == "supabase":
@@ -185,7 +205,21 @@ class Logging:
                            model=model,
                            messages=messages,
                            end_user=litellm._thread_context.user,
                            litellm_call_id=self.litellm_params["litellm_call_id"],
                            print_verbose=print_verbose,
                        )
+                    elif callback == "llmonitor":
+                        print_verbose("reaches llmonitor for logging!")
+                        model = self.model
+                        messages = self.messages
+                        print(f"liteDebuggerClient: {liteDebuggerClient}")
+                        llmonitorLogger.log_event(
+                            type="start",
+                            model=model,
+                            messages=messages,
+                            user_id=litellm._thread_context.user,
+                            run_id=self.litellm_params["litellm_call_id"],
+                            print_verbose=print_verbose,
+                        )
                    elif callback == "lite_debugger":
@@ -197,15 +231,18 @@ class Logging:
                            model=model,
                            messages=messages,
                            end_user=litellm._thread_context.user,
                            litellm_call_id=self.litellm_params["litellm_call_id"],
                            print_verbose=print_verbose,
                        )
                except Exception as e:
                    print_verbose(
                        f"LiteLLM.LoggingError: [Non-Blocking] Exception occurred while input logging with integrations {traceback.format_exc()}"
                    )
                    print_verbose(
                        f"LiteLLM.Logging: is sentry capture exception initialized {capture_exception}"
                    )
                    if capture_exception:  # log this error to sentry for debugging
                        capture_exception(e)
        except:
            print_verbose(
@@ -214,9 +251,9 @@ class Logging:
            print_verbose(
                f"LiteLLM.Logging: is sentry capture exception initialized {capture_exception}"
            )
            if capture_exception:  # log this error to sentry for debugging
                capture_exception(e)

    def post_call(self, input, api_key, original_response, additional_args={}):
        # Do something here
        try:
@@ -224,8 +261,8 @@ class Logging:
            self.model_call_details["api_key"] = api_key
            self.model_call_details["original_response"] = original_response
            self.model_call_details["additional_args"] = additional_args

            # User Logging -> if you pass in a custom logging function
            print_verbose(
                f"Logging Details: logger_fn - {self.logger_fn} | callable(logger_fn) - {callable(self.logger_fn)}"
            )
@@ -243,9 +280,9 @@ class Logging:
                f"LiteLLM.LoggingError: [Non-Blocking] Exception occurred while logging {traceback.format_exc()}"
            )
            pass

    # Add more methods as needed


def exception_logging(
    additional_args={},
@@ -257,7 +294,7 @@ def exception_logging(
        if exception:
            model_call_details["exception"] = exception
        model_call_details["additional_args"] = additional_args
        # User Logging -> if you pass in a custom logging function or want to use sentry breadcrumbs
        print_verbose(
            f"Logging Details: logger_fn - {logger_fn} | callable(logger_fn) - {callable(logger_fn)}"
        )
@@ -280,20 +317,20 @@
####### CLIENT ###################
# make it easy to log if completion/embedding runs succeeded or failed + see what happened | Non-Blocking
def client(original_function):

    def function_setup(
        *args, **kwargs
    ):  # just run once to check if user wants to send their data anywhere - PostHog/Sentry/Slack/etc.
        try:
            global callback_list, add_breadcrumb, user_logger_fn
            if (len(litellm.input_callback) > 0
                    or len(litellm.success_callback) > 0
                    or len(litellm.failure_callback) > 0) and len(callback_list) == 0:
                callback_list = list(
                    set(litellm.input_callback + litellm.success_callback +
                        litellm.failure_callback))
                set_callbacks(callback_list=callback_list, )
            if add_breadcrumb:
                add_breadcrumb(
                    category="litellm.llm_call",
@@ -310,12 +347,11 @@ def client(original_function):
        if litellm.telemetry:
            try:
                model = args[0] if len(args) > 0 else kwargs["model"]
                exception = kwargs["exception"] if "exception" in kwargs else None
                custom_llm_provider = (kwargs["custom_llm_provider"]
                                       if "custom_llm_provider" in kwargs else
                                       None)
                safe_crash_reporting(
                    model=model,
                    exception=exception,
@@ -340,15 +376,12 @@ def client(original_function):
    def check_cache(*args, **kwargs):
        try:  # never block execution
            prompt = get_prompt(*args, **kwargs)
            if (prompt != None
                    and prompt in local_cache):  # check if messages / prompt exists
                if litellm.caching_with_models:
                    # if caching with model names is enabled, key is prompt + model name
                    if ("model" in kwargs
                            and kwargs["model"] in local_cache[prompt]["models"]):
                        cache_key = prompt + kwargs["model"]
                        return local_cache[cache_key]
                else:  # caching only with prompts
@@ -363,10 +396,8 @@ def client(original_function):
        try:  # never block execution
            prompt = get_prompt(*args, **kwargs)
            if litellm.caching_with_models:  # caching with model + prompt
                if ("model" in kwargs
                        and kwargs["model"] in local_cache[prompt]["models"]):
                    cache_key = prompt + kwargs["model"]
                    local_cache[cache_key] = result
            else:  # caching based only on prompts
@@ -381,24 +412,24 @@ def client(original_function):
            function_setup(*args, **kwargs)
            litellm_call_id = str(uuid.uuid4())
            kwargs["litellm_call_id"] = litellm_call_id
            # [OPTIONAL] CHECK CACHE
            start_time = datetime.datetime.now()
            if (litellm.caching or litellm.caching_with_models) and (
                    cached_result := check_cache(*args, **kwargs)) is not None:
                result = cached_result
            else:
                # MODEL CALL
                result = original_function(*args, **kwargs)
            end_time = datetime.datetime.now()
            # Add response to CACHE
            if litellm.caching:
                add_cache(result, *args, **kwargs)
            # LOG SUCCESS
            crash_reporting(*args, **kwargs)
            my_thread = threading.Thread(
                target=handle_success,
                args=(args, kwargs, result, start_time,
                      end_time))  # don't interrupt execution of main thread
            my_thread.start()
            return result
        except Exception as e:
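The wrapper above consults the in-memory cache before dispatching the model call. A small usage sketch, illustrative only and based solely on the flags and helpers visible in this hunk:

```python
# Illustrative sketch, not part of the diff: exercising the caching flags shown above.
import litellm
from litellm import completion

litellm.caching = True  # key the cache on the prompt alone
# litellm.caching_with_models = True  # or key it on prompt + model name

messages = [{"role": "user", "content": "Hello, whats the weather in San Francisco??"}]

first = completion(model="gpt-3.5-turbo", messages=messages)   # hits the provider
second = completion(model="gpt-3.5-turbo", messages=messages)  # served from local_cache
```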
@@ -407,7 +438,8 @@ def client(original_function):
            end_time = datetime.datetime.now()
            my_thread = threading.Thread(
                target=handle_failure,
                args=(e, traceback_exception, start_time, end_time, args,
                      kwargs),
            )  # don't interrupt execution of main thread
            my_thread.start()
            raise e
@@ -432,18 +464,18 @@ def token_counter(model, text):
    return num_tokens


def cost_per_token(model="gpt-3.5-turbo",
                   prompt_tokens=0,
                   completion_tokens=0):
    # given
    prompt_tokens_cost_usd_dollar = 0
    completion_tokens_cost_usd_dollar = 0
    model_cost_ref = litellm.model_cost
    if model in model_cost_ref:
        prompt_tokens_cost_usd_dollar = (
            model_cost_ref[model]["input_cost_per_token"] * prompt_tokens)
        completion_tokens_cost_usd_dollar = (
            model_cost_ref[model]["output_cost_per_token"] * completion_tokens)
        return prompt_tokens_cost_usd_dollar, completion_tokens_cost_usd_dollar
    else:
        # calculate average input cost
@@ -464,8 +496,9 @@ def completion_cost(model="gpt-3.5-turbo", prompt="", completion=""):
    prompt_tokens = token_counter(model=model, text=prompt)
    completion_tokens = token_counter(model=model, text=completion)
    prompt_tokens_cost_usd_dollar, completion_tokens_cost_usd_dollar = cost_per_token(
        model=model,
        prompt_tokens=prompt_tokens,
        completion_tokens=completion_tokens)
    return prompt_tokens_cost_usd_dollar + completion_tokens_cost_usd_dollar
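As a quick orientation to the two helpers above, a hedged usage sketch; the dollar amounts depend entirely on the per-token prices in `litellm.model_cost`, so none are assumed here:

```python
# Illustrative usage of the cost helpers shown above; not part of the diff.
from litellm.utils import completion_cost, cost_per_token

# Per-component cost for a known token count.
prompt_usd, completion_usd = cost_per_token(model="gpt-3.5-turbo",
                                            prompt_tokens=16,
                                            completion_tokens=41)

# Or let completion_cost tokenize the raw strings with token_counter first.
total_usd = completion_cost(model="gpt-3.5-turbo",
                            prompt="Hello, whats the weather in San Francisco??",
                            completion="I'm sorry, I don't have real-time weather data.")
print(prompt_usd + completion_usd, total_usd)
```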
@@ -557,9 +590,8 @@ def get_optional_params(
        optional_params["max_tokens"] = max_tokens
    if frequency_penalty != 0:
        optional_params["frequency_penalty"] = frequency_penalty
    elif (model == "chat-bison"
          ):  # chat-bison has diff args from chat-bison@001 ty Google
        if temperature != 1:
            optional_params["temperature"] = temperature
        if top_p != 1:
@@ -619,7 +651,10 @@ def load_test_model(
        test_prompt = prompt
    if num_calls:
        test_calls = num_calls
    messages = [[{
        "role": "user",
        "content": test_prompt
    }] for _ in range(test_calls)]
    start_time = time.time()
    try:
        litellm.batch_completion(
@@ -649,7 +684,7 @@ def load_test_model(


def set_callbacks(callback_list):
-    global sentry_sdk_instance, capture_exception, add_breadcrumb, posthog, slack_app, alerts_channel, heliconeLogger, aispendLogger, berrispendLogger, supabaseClient, liteDebuggerClient
+    global sentry_sdk_instance, capture_exception, add_breadcrumb, posthog, slack_app, alerts_channel, heliconeLogger, aispendLogger, berrispendLogger, supabaseClient, liteDebuggerClient, llmonitorLogger
    try:
        for callback in callback_list:
            print(f"callback: {callback}")
@@ -657,17 +692,15 @@ def set_callbacks(callback_list):
            if callback == "sentry":
                try:
                    import sentry_sdk
                except ImportError:
                    print_verbose(
                        "Package 'sentry_sdk' is missing. Installing it...")
                    subprocess.check_call(
                        [sys.executable, "-m", "pip", "install", "sentry_sdk"])
                    import sentry_sdk
                sentry_sdk_instance = sentry_sdk
                sentry_trace_rate = (os.environ.get("SENTRY_API_TRACE_RATE")
                                     if "SENTRY_API_TRACE_RATE" in os.environ
                                     else "1.0")
                sentry_sdk_instance.init(
                    dsn=os.environ.get("SENTRY_API_URL"),
                    traces_sample_rate=float(sentry_trace_rate),
@@ -678,10 +711,10 @@ def set_callbacks(callback_list):
                try:
                    from posthog import Posthog
                except ImportError:
                    print_verbose(
                        "Package 'posthog' is missing. Installing it...")
                    subprocess.check_call(
                        [sys.executable, "-m", "pip", "install", "posthog"])
                    from posthog import Posthog
                posthog = Posthog(
                    project_api_key=os.environ.get("POSTHOG_API_KEY"),
@@ -691,10 +724,10 @@ def set_callbacks(callback_list):
                try:
                    from slack_bolt import App
                except ImportError:
                    print_verbose(
                        "Package 'slack_bolt' is missing. Installing it...")
                    subprocess.check_call(
                        [sys.executable, "-m", "pip", "install", "slack_bolt"])
                    from slack_bolt import App
                slack_app = App(
                    token=os.environ.get("SLACK_API_TOKEN"),
@@ -704,6 +737,8 @@ def set_callbacks(callback_list):
                print_verbose(f"Initialized Slack App: {slack_app}")
            elif callback == "helicone":
                heliconeLogger = HeliconeLogger()
+            elif callback == "llmonitor":
+                llmonitorLogger = LLMonitorLogger()
            elif callback == "aispend":
                aispendLogger = AISpendLogger()
            elif callback == "berrispend":
@@ -718,7 +753,8 @@ def set_callbacks(callback_list):
        raise e


def handle_failure(exception, traceback_exception, start_time, end_time, args,
                   kwargs):
    global sentry_sdk_instance, capture_exception, add_breadcrumb, posthog, slack_app, alerts_channel, aispendLogger, berrispendLogger, supabaseClient, liteDebuggerClient
    try:
        # print_verbose(f"handle_failure args: {args}")
@@ -728,8 +764,7 @@ def handle_failure(exception, traceback_exception, start_time, end_time, args, kwargs):
        failure_handler = additional_details.pop("failure_handler", None)

        additional_details["Event_Name"] = additional_details.pop(
            "failed_event_name", "litellm.failed_query")
        print_verbose(f"self.failure_callback: {litellm.failure_callback}")

        # print_verbose(f"additional_details: {additional_details}")
@@ -746,9 +781,8 @@ def handle_failure(exception, traceback_exception, start_time, end_time, args, kwargs):
                for detail in additional_details:
                    slack_msg += f"{detail}: {additional_details[detail]}\n"
                slack_msg += f"Traceback: {traceback_exception}"
                slack_app.client.chat_postMessage(channel=alerts_channel,
                                                  text=slack_msg)
            elif callback == "sentry":
                capture_exception(exception)
            elif callback == "posthog":
@@ -767,9 +801,8 @@ def handle_failure(exception, traceback_exception, start_time, end_time, args, kwargs):
                print_verbose(f"ph_obj: {ph_obj}")
                print_verbose(f"PostHog Event Name: {event_name}")
                if "user_id" in additional_details:
                    posthog.capture(additional_details["user_id"],
                                    event_name, ph_obj)
                else:  # PostHog calls require a unique id to identify a user - https://posthog.com/docs/libraries/python
                    unique_id = str(uuid.uuid4())
                    posthog.capture(unique_id, event_name)
@@ -783,10 +816,10 @@ def handle_failure(exception, traceback_exception, start_time, end_time, args, kwargs):
                    "created": time.time(),
                    "error": traceback_exception,
                    "usage": {
                        "prompt_tokens": prompt_token_calculator(model, messages=messages),
                        "completion_tokens": 0,
                    },
                }
                berrispendLogger.log_event(
@@ -805,10 +838,10 @@ def handle_failure(exception, traceback_exception, start_time, end_time, args, kwargs):
                    "model": model,
                    "created": time.time(),
                    "usage": {
                        "prompt_tokens": prompt_token_calculator(model, messages=messages),
                        "completion_tokens": 0,
                    },
                }
                aispendLogger.log_event(
@@ -818,6 +851,27 @@ def handle_failure(exception, traceback_exception, start_time, end_time, args, kwargs):
                    end_time=end_time,
                    print_verbose=print_verbose,
                )
+            elif callback == "llmonitor":
+                print_verbose("reaches llmonitor for logging!")
+                model = args[0] if len(args) > 0 else kwargs["model"]
+                messages = args[1] if len(args) > 1 else kwargs["messages"]
+                usage = {
+                    "prompt_tokens": prompt_token_calculator(model, messages=messages),
+                    "completion_tokens": 0,
+                }
+                llmonitorLogger.log_event(
+                    type="error",
+                    user_id=litellm._thread_context.user,
+                    model=model,
+                    error=traceback_exception,
+                    response_obj=result,
+                    run_id=kwargs["litellm_call_id"],
+                    timestamp=end_time,
+                    usage=usage,
+                    print_verbose=print_verbose,
+                )
            elif callback == "supabase":
                print_verbose("reaches supabase for logging!")
                print_verbose(f"supabaseClient: {supabaseClient}")
@@ -828,10 +882,10 @@ def handle_failure(exception, traceback_exception, start_time, end_time, args, kwargs):
                    "created": time.time(),
                    "error": traceback_exception,
                    "usage": {
                        "prompt_tokens": prompt_token_calculator(model, messages=messages),
                        "completion_tokens": 0,
                    },
                }
                supabaseClient.log_event(
@@ -854,10 +908,10 @@ def handle_failure(exception, traceback_exception, start_time, end_time, args, kwargs):
                    "created": time.time(),
                    "error": traceback_exception,
                    "usage": {
                        "prompt_tokens": prompt_token_calculator(model, messages=messages),
                        "completion_tokens": 0,
                    },
                }
                liteDebuggerClient.log_event(
@@ -884,19 +938,18 @@ def handle_failure(exception, traceback_exception, start_time, end_time, args, kwargs):
            failure_handler(call_details)
            pass
    except Exception as e:
        # LOGGING
        exception_logging(logger_fn=user_logger_fn, exception=e)
        pass


def handle_success(args, kwargs, result, start_time, end_time):
-    global heliconeLogger, aispendLogger, supabaseClient, liteDebuggerClient
+    global heliconeLogger, aispendLogger, supabaseClient, liteDebuggerClient, llmonitorLogger
    try:
        success_handler = additional_details.pop("success_handler", None)
        failure_handler = additional_details.pop("failure_handler", None)
        additional_details["Event_Name"] = additional_details.pop(
            "successful_event_name", "litellm.succes_query")
        for callback in litellm.success_callback:
            try:
                if callback == "posthog":
@@ -905,9 +958,8 @@ def handle_success(args, kwargs, result, start_time, end_time):
                        ph_obj[detail] = additional_details[detail]
                    event_name = additional_details["Event_Name"]
                    if "user_id" in additional_details:
                        posthog.capture(additional_details["user_id"],
                                        event_name, ph_obj)
                    else:  # PostHog calls require a unique id to identify a user - https://posthog.com/docs/libraries/python
                        unique_id = str(uuid.uuid4())
                        posthog.capture(unique_id, event_name, ph_obj)
@@ -916,9 +968,8 @@ def handle_success(args, kwargs, result, start_time, end_time):
                    slack_msg = ""
                    for detail in additional_details:
                        slack_msg += f"{detail}: {additional_details[detail]}\n"
                    slack_app.client.chat_postMessage(channel=alerts_channel,
                                                      text=slack_msg)
                elif callback == "helicone":
                    print_verbose("reaches helicone for logging!")
                    model = args[0] if len(args) > 0 else kwargs["model"]
@@ -931,6 +982,22 @@ def handle_success(args, kwargs, result, start_time, end_time):
                        end_time=end_time,
                        print_verbose=print_verbose,
                    )
+                elif callback == "llmonitor":
+                    print_verbose("reaches llmonitor for logging!")
+                    model = args[0] if len(args) > 0 else kwargs["model"]
+                    messages = args[1] if len(args) > 1 else kwargs["messages"]
+                    usage = kwargs["usage"]
+                    llmonitorLogger.log_event(
+                        type="end",
+                        model=model,
+                        messages=messages,
+                        user_id=litellm._thread_context.user,
+                        response_obj=result,
+                        time=end_time,
+                        usage=usage,
+                        run_id=kwargs["litellm_call_id"],
+                        print_verbose=print_verbose,
+                    )
                elif callback == "aispend":
                    print_verbose("reaches aispend for logging!")
                    model = args[0] if len(args) > 0 else kwargs["model"]
@@ -984,7 +1051,7 @@ def handle_success(args, kwargs, result, start_time, end_time):
                        print_verbose=print_verbose,
                    )
            except Exception as e:
                # LOGGING
                exception_logging(logger_fn=user_logger_fn, exception=e)
                print_verbose(
                    f"[Non-Blocking] Success Callback Error - {traceback.format_exc()}"
                )
@@ -995,7 +1062,7 @@ def handle_success(args, kwargs, result, start_time, end_time):
            success_handler(args, kwargs)
            pass
    except Exception as e:
        # LOGGING
        exception_logging(logger_fn=user_logger_fn, exception=e)
        print_verbose(
            f"[Non-Blocking] Success Callback Error - {traceback.format_exc()}"
        )
@ -1046,33 +1113,36 @@ def exception_type(model, original_exception, custom_llm_provider):
|
||||||
exception_type = ""
|
exception_type = ""
|
||||||
if "claude" in model: # one of the anthropics
|
if "claude" in model: # one of the anthropics
|
||||||
if hasattr(original_exception, "status_code"):
|
if hasattr(original_exception, "status_code"):
|
||||||
print_verbose(f"status_code: {original_exception.status_code}")
|
print_verbose(
|
||||||
|
f"status_code: {original_exception.status_code}")
|
||||||
if original_exception.status_code == 401:
|
if original_exception.status_code == 401:
|
||||||
exception_mapping_worked = True
|
exception_mapping_worked = True
|
||||||
raise AuthenticationError(
|
raise AuthenticationError(
|
||||||
message=f"AnthropicException - {original_exception.message}",
|
message=
|
||||||
|
f"AnthropicException - {original_exception.message}",
|
||||||
llm_provider="anthropic",
|
llm_provider="anthropic",
|
||||||
)
|
)
|
||||||
elif original_exception.status_code == 400:
|
elif original_exception.status_code == 400:
|
||||||
exception_mapping_worked = True
|
exception_mapping_worked = True
|
||||||
raise InvalidRequestError(
|
raise InvalidRequestError(
|
||||||
message=f"AnthropicException - {original_exception.message}",
|
message=
|
||||||
|
f"AnthropicException - {original_exception.message}",
|
||||||
model=model,
|
model=model,
|
||||||
llm_provider="anthropic",
|
llm_provider="anthropic",
|
||||||
)
|
)
|
||||||
elif original_exception.status_code == 429:
|
elif original_exception.status_code == 429:
|
||||||
exception_mapping_worked = True
|
exception_mapping_worked = True
|
||||||
raise RateLimitError(
|
raise RateLimitError(
|
||||||
message=f"AnthropicException - {original_exception.message}",
|
message=
|
||||||
|
f"AnthropicException - {original_exception.message}",
|
||||||
llm_provider="anthropic",
|
llm_provider="anthropic",
|
||||||
)
|
)
|
||||||
elif (
|
elif ("Could not resolve authentication method. Expected either api_key or auth_token to be set."
|
||||||
"Could not resolve authentication method. Expected either api_key or auth_token to be set."
|
in error_str):
|
||||||
in error_str
|
|
||||||
):
|
|
||||||
exception_mapping_worked = True
|
exception_mapping_worked = True
|
||||||
raise AuthenticationError(
|
raise AuthenticationError(
|
||||||
message=f"AnthropicException - {original_exception.message}",
|
message=
|
||||||
|
f"AnthropicException - {original_exception.message}",
|
||||||
llm_provider="anthropic",
|
llm_provider="anthropic",
|
||||||
)
|
)
|
||||||
elif "replicate" in model:
|
elif "replicate" in model:
|
||||||
|
@ -1096,35 +1166,36 @@ def exception_type(model, original_exception, custom_llm_provider):
|
||||||
llm_provider="replicate",
|
llm_provider="replicate",
|
||||||
)
|
)
|
||||||
elif (
|
elif (
|
||||||
exception_type == "ReplicateError"
|
exception_type == "ReplicateError"
|
||||||
): ## ReplicateError implies an error on Replicate server side, not user side
|
): # ReplicateError implies an error on Replicate server side, not user side
|
||||||
raise ServiceUnavailableError(
|
raise ServiceUnavailableError(
|
||||||
message=f"ReplicateException - {error_str}",
|
message=f"ReplicateException - {error_str}",
|
||||||
llm_provider="replicate",
|
llm_provider="replicate",
|
||||||
)
|
)
|
||||||
elif model == "command-nightly": # Cohere
|
elif model == "command-nightly": # Cohere
|
||||||
if (
|
if ("invalid api token" in error_str
|
||||||
"invalid api token" in error_str
|
or "No API key provided." in error_str):
|
||||||
or "No API key provided." in error_str
|
|
||||||
):
|
|
||||||
exception_mapping_worked = True
|
exception_mapping_worked = True
|
||||||
raise AuthenticationError(
|
raise AuthenticationError(
|
||||||
message=f"CohereException - {original_exception.message}",
|
message=
|
||||||
|
f"CohereException - {original_exception.message}",
|
||||||
llm_provider="cohere",
|
llm_provider="cohere",
|
||||||
)
|
)
|
||||||
elif "too many tokens" in error_str:
|
elif "too many tokens" in error_str:
|
||||||
exception_mapping_worked = True
|
exception_mapping_worked = True
|
||||||
raise InvalidRequestError(
|
raise InvalidRequestError(
|
||||||
message=f"CohereException - {original_exception.message}",
|
message=
|
||||||
|
f"CohereException - {original_exception.message}",
|
||||||
model=model,
|
model=model,
|
||||||
llm_provider="cohere",
|
llm_provider="cohere",
|
||||||
)
|
)
|
||||||
elif (
|
elif (
|
||||||
"CohereConnectionError" in exception_type
|
"CohereConnectionError" in exception_type
|
||||||
): # cohere seems to fire these errors when we load test it (1k+ messages / min)
|
): # cohere seems to fire these errors when we load test it (1k+ messages / min)
|
||||||
exception_mapping_worked = True
|
exception_mapping_worked = True
|
||||||
raise RateLimitError(
|
raise RateLimitError(
|
||||||
message=f"CohereException - {original_exception.message}",
|
message=
|
||||||
|
f"CohereException - {original_exception.message}",
|
||||||
llm_provider="cohere",
|
llm_provider="cohere",
|
||||||
)
|
)
|
||||||
elif custom_llm_provider == "huggingface":
|
elif custom_llm_provider == "huggingface":
|
||||||
|
@ -1132,27 +1203,30 @@ def exception_type(model, original_exception, custom_llm_provider):
|
||||||
if original_exception.status_code == 401:
|
if original_exception.status_code == 401:
|
||||||
exception_mapping_worked = True
|
exception_mapping_worked = True
|
||||||
raise AuthenticationError(
|
raise AuthenticationError(
|
||||||
message=f"HuggingfaceException - {original_exception.message}",
|
message=
|
||||||
|
f"HuggingfaceException - {original_exception.message}",
|
||||||
llm_provider="huggingface",
|
llm_provider="huggingface",
|
||||||
)
|
)
|
||||||
elif original_exception.status_code == 400:
|
elif original_exception.status_code == 400:
|
||||||
exception_mapping_worked = True
|
exception_mapping_worked = True
|
||||||
raise InvalidRequestError(
|
raise InvalidRequestError(
|
||||||
message=f"HuggingfaceException - {original_exception.message}",
|
message=
|
||||||
|
f"HuggingfaceException - {original_exception.message}",
|
||||||
model=model,
|
model=model,
|
||||||
llm_provider="huggingface",
|
llm_provider="huggingface",
|
||||||
)
|
)
|
||||||
elif original_exception.status_code == 429:
|
elif original_exception.status_code == 429:
|
||||||
exception_mapping_worked = True
|
exception_mapping_worked = True
|
||||||
raise RateLimitError(
|
raise RateLimitError(
|
||||||
message=f"HuggingfaceException - {original_exception.message}",
|
message=
|
||||||
|
f"HuggingfaceException - {original_exception.message}",
|
||||||
llm_provider="huggingface",
|
llm_provider="huggingface",
|
||||||
)
|
)
|
||||||
raise original_exception # base case - return the original exception
|
raise original_exception # base case - return the original exception
|
||||||
else:
|
else:
|
||||||
raise original_exception
|
raise original_exception
|
||||||
except Exception as e:
|
except Exception as e:
|
||||||
## LOGGING
|
# LOGGING
|
||||||
exception_logging(
|
exception_logging(
|
||||||
logger_fn=user_logger_fn,
|
logger_fn=user_logger_fn,
|
||||||
additional_args={
|
additional_args={
|
||||||
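An aside on how this mapping is meant to be consumed: callers can catch the provider-agnostic errors raised above and fall back to another model. A minimal sketch, assuming `AuthenticationError`, `RateLimitError`, and `ServiceUnavailableError` are importable from litellm's exceptions module as the raise statements in these hunks suggest:

```python
# Hypothetical fallback sketch built on the mapped exceptions above.
from litellm import completion
from litellm.exceptions import (  # assumed import path
    AuthenticationError,
    RateLimitError,
    ServiceUnavailableError,
)

messages = [{"role": "user", "content": "Hello, whats the weather in San Francisco??"}]

try:
    response = completion(model="claude-2", messages=messages)
except (RateLimitError, ServiceUnavailableError):
    # Provider is throttling or unavailable - retry on a different model.
    response = completion(model="gpt-3.5-turbo", messages=messages)
except AuthenticationError:
    # Bad or missing API key - retrying on another model will not help here.
    raise
```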
@@ -1173,7 +1247,7 @@ def safe_crash_reporting(model=None, exception=None, custom_llm_provider=None):
        "exception": str(exception),
        "custom_llm_provider": custom_llm_provider,
    }
-    threading.Thread(target=litellm_telemetry, args=(data,)).start()
+    threading.Thread(target=litellm_telemetry, args=(data, )).start()


def litellm_telemetry(data):
@@ -1223,11 +1297,13 @@ def get_secret(secret_name):
    if litellm.secret_manager_client != None:
        # TODO: check which secret manager is being used
        # currently only supports Infisical
-        secret = litellm.secret_manager_client.get_secret(secret_name).secret_value
+        secret = litellm.secret_manager_client.get_secret(
+            secret_name).secret_value
        if secret != None:
            return secret # if secret found in secret manager return it
        else:
-            raise ValueError(f"Secret '{secret_name}' not found in secret manager")
+            raise ValueError(
+                f"Secret '{secret_name}' not found in secret manager")
    elif litellm.api_key != None: # if users use litellm default key
        return litellm.api_key
    else:
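For context, `get_secret` tries the configured secret manager first and only then falls back to `litellm.api_key`. A sketch of that wiring, where the `InfisicalClient` construction is an assumption inferred from the `get_secret(name).secret_value` call above rather than something this commit shows:

```python
# Hypothetical wiring sketch for the secret-manager path used by get_secret().
import os

import litellm
from infisical import InfisicalClient  # assumed SDK import, per the comment above

# With a client set, get_secret("OPENAI_API_KEY") asks Infisical first.
litellm.secret_manager_client = InfisicalClient(token=os.environ["INFISICAL_TOKEN"])

# With no client configured, get_secret() returns litellm.api_key instead.
# litellm.secret_manager_client = None
# litellm.api_key = os.environ["OPENAI_API_KEY"]
```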
@@ -1238,6 +1314,7 @@ def get_secret(secret_name):
# wraps the completion stream to return the correct format for the model
# replicate/anthropic/cohere
class CustomStreamWrapper:

    def __init__(self, completion_stream, model, custom_llm_provider=None):
        self.model = model
        self.custom_llm_provider = custom_llm_provider
@@ -1288,7 +1365,8 @@ class CustomStreamWrapper:
        elif self.model == "replicate":
            chunk = next(self.completion_stream)
            completion_obj["content"] = chunk
-        elif (self.model == "together_ai") or ("togethercomputer" in self.model):
+        elif (self.model == "together_ai") or ("togethercomputer"
+                                               in self.model):
            chunk = next(self.completion_stream)
            text_data = self.handle_together_ai_chunk(chunk)
            if text_data == "":
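A usage note, not part of the diff: `CustomStreamWrapper` is the object callers iterate over when they ask for streamed output, so the reformatted branches above surface as ordinary generator chunks. A minimal sketch, assuming the usual `stream=True` flag on `completion`:

```python
# Hypothetical streaming sketch; CustomStreamWrapper yields OpenAI-style delta chunks.
from litellm import completion

response = completion(
    model="claude-2",
    messages=[{"role": "user", "content": "Hello, whats the weather in San Francisco??"}],
    stream=True,
)
for chunk in response:
    # Each chunk looks like {"choices": [{"delta": {"content": ...}}]}.
    print(chunk["choices"][0]["delta"].get("content", ""), end="", flush=True)
```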
@@ -1321,12 +1399,11 @@ def read_config_args(config_path):


########## ollama implementation ############################
-import aiohttp


-async def get_ollama_response_stream(
-    api_base="http://localhost:11434", model="llama2", prompt="Why is the sky blue?"
-):
+async def get_ollama_response_stream(api_base="http://localhost:11434",
+                                     model="llama2",
+                                     prompt="Why is the sky blue?"):
    session = aiohttp.ClientSession()
    url = f"{api_base}/api/generate"
    data = {
@@ -1349,7 +1426,11 @@ async def get_ollama_response_stream(
                        "content": "",
                    }
                    completion_obj["content"] = j["response"]
-                    yield {"choices": [{"delta": completion_obj}]}
+                    yield {
+                        "choices": [{
+                            "delta": completion_obj
+                        }]
+                    }
                    # self.responses.append(j["response"])
                    # yield "blank"
        except Exception as e:
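One more aside before the README changes: the ollama helper above is an async generator, so a consumer drives it with `async for`. A small sketch, assuming the function is importable from this module (`litellm.utils` here is a guess) and that a local Ollama server is running at the default `api_base`:

```python
# Hypothetical consumer for get_ollama_response_stream() as defined above.
import asyncio

from litellm.utils import get_ollama_response_stream  # assumed module path


async def main():
    async for chunk in get_ollama_response_stream(
        api_base="http://localhost:11434",
        model="llama2",
        prompt="Why is the sky blue?",
    ):
        # Chunks follow the {"choices": [{"delta": ...}]} shape yielded above.
        print(chunk["choices"][0]["delta"]["content"], end="", flush=True)


asyncio.run(main())
```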
@@ -1,6 +1,7 @@

# liteLLM Proxy Server: 50+ LLM Models, Error Handling, Caching

### Azure, Llama2, OpenAI, Claude, Hugging Face, Replicate Models

[](https://pypi.org/project/litellm/)
[](https://pypi.org/project/litellm/0.1.1/)

@@ -11,34 +12,36 @@


## What does liteLLM proxy do

- Make `/chat/completions` requests for 50+ LLM models **Azure, OpenAI, Replicate, Anthropic, Hugging Face**

Example: for `model` use `claude-2`, `gpt-3.5`, `gpt-4`, `command-nightly`, `stabilityai/stablecode-completion-alpha-3b-4k`

```json
{
  "model": "replicate/llama-2-70b-chat:2c1608e18606fad2812020dc541930f2d0495ce32eee50074220b87300bc16e1",
  "messages": [
    {
      "content": "Hello, whats the weather in San Francisco??",
      "role": "user"
    }
  ]
}
```

- **Consistent Input/Output** Format
  - Call all models using the OpenAI format - `completion(model, messages)`
  - Text responses will always be available at `['choices'][0]['message']['content']`
- **Error Handling** Using Model Fallbacks (if `GPT-4` fails, try `llama2`)
-- **Logging** - Log Requests, Responses and Errors to `Supabase`, `Posthog`, `Mixpanel`, `Sentry`, `Helicone` (Any of the supported providers here: https://litellm.readthedocs.io/en/latest/advanced/
+- **Logging** - Log Requests, Responses and Errors to `Supabase`, `Posthog`, `Mixpanel`, `Sentry`, `Helicone`, `LLMonitor` (Any of the supported providers here: https://litellm.readthedocs.io/en/latest/advanced/

**Example: Logs sent to Supabase**
<img width="1015" alt="Screenshot 2023-08-11 at 4 02 46 PM" src="https://github.com/ishaan-jaff/proxy-server/assets/29436595/237557b8-ba09-4917-982c-8f3e1b2c8d08">

- **Token Usage & Spend** - Track Input + Completion tokens used + Spend/model
- **Caching** - Implementation of Semantic Caching
- **Streaming & Async Support** - Return generators to stream text responses


## API Endpoints

### `/chat/completions` (POST)

@@ -46,34 +49,37 @@

This endpoint is used to generate chat completions for 50+ support LLM API Models. Use llama2, GPT-4, Claude2 etc

#### Input

This API endpoint accepts all inputs in raw JSON and expects the following inputs
- `model` (string, required): ID of the model to use for chat completions. See all supported models [here]: (https://litellm.readthedocs.io/en/latest/supported/):
  eg `gpt-3.5-turbo`, `gpt-4`, `claude-2`, `command-nightly`, `stabilityai/stablecode-completion-alpha-3b-4k`
- `messages` (array, required): A list of messages representing the conversation context. Each message should have a `role` (system, user, assistant, or function), `content` (message text), and `name` (for function role).
- Additional Optional parameters: `temperature`, `functions`, `function_call`, `top_p`, `n`, `stream`. See the full list of supported inputs here: https://litellm.readthedocs.io/en/latest/input/


#### Example JSON body

For claude-2

```json
{
  "model": "claude-2",
  "messages": [
    {
      "content": "Hello, whats the weather in San Francisco??",
      "role": "user"
    }
  ]

}
```

### Making an API request to the Proxy Server

```python
import requests
import json

# TODO: use your URL
url = "http://localhost:5000/chat/completions"

payload = json.dumps({
@@ -94,34 +100,38 @@ print(response.text)
```

### Output [Response Format]

Responses from the server are given in the following format.
All responses from the server are returned in the following format (for all LLM models). More info on output here: https://litellm.readthedocs.io/en/latest/output/

```json
{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "I'm sorry, but I don't have the capability to provide real-time weather information. However, you can easily check the weather in San Francisco by searching online or using a weather app on your phone.",
        "role": "assistant"
      }
    }
  ],
  "created": 1691790381,
  "id": "chatcmpl-7mUFZlOEgdohHRDx2UpYPRTejirzb",
  "model": "gpt-3.5-turbo-0613",
  "object": "chat.completion",
  "usage": {
    "completion_tokens": 41,
    "prompt_tokens": 16,
    "total_tokens": 57
  }
}
```
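(Aside, not part of the README being diffed: a self-contained request/parse sketch against the same local proxy URL, reading the reply at the documented `['choices'][0]['message']['content']` path. The `Content-Type` header and `gpt-3.5-turbo` model choice are assumptions for illustration.)

```python
# Hypothetical end-to-end sketch against the proxy described above.
import json

import requests

url = "http://localhost:5000/chat/completions"  # same local URL as the example above
payload = json.dumps({
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Hello, whats the weather in San Francisco??"}],
})
response = requests.post(url, headers={"Content-Type": "application/json"}, data=payload)

# Text responses are always available at ['choices'][0]['message']['content'].
print(response.json()["choices"][0]["message"]["content"])
```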

## Installation & Usage

### Running Locally

1. Clone liteLLM repository to your local machine:
```
git clone https://github.com/BerriAI/liteLLM-proxy
```
@@ -141,24 +151,24 @@ All responses from the server are returned in the following format (for all LLM
python main.py
```



## Deploying

1. Quick Start: Deploy on Railway

[](https://railway.app/template/DYqQAW?referralCode=t3ukrU)

2. `GCP`, `AWS`, `Azure`
This project includes a `Dockerfile` allowing you to build and deploy a Docker Project on your providers

# Support / Talk with founders

- [Our calendar 👋](https://calendly.com/d/4mp-gd3-k5k/berriai-1-1-onboarding-litellm-hosted-version)
- [Community Discord 💭](https://discord.gg/wuPM9dRgDw)
- Our numbers 📞 +1 (770) 8783-106 / +1 (412) 618-6238
- Our emails ✉️ ishaan@berri.ai / krrish@berri.ai


## Roadmap

- [ ] Support hosted db (e.g. Supabase)
- [ ] Easily send data to places like posthog and sentry.
- [ ] Add a hot-cache for project spend logs - enables fast checks for user + project limitings