diff --git a/cookbook/proxy-server/readme.md b/cookbook/proxy-server/readme.md
index 4f735f38c..bb9e00804 100644
--- a/cookbook/proxy-server/readme.md
+++ b/cookbook/proxy-server/readme.md
@@ -1,6 +1,7 @@
-
# liteLLM Proxy Server: 50+ LLM Models, Error Handling, Caching
+
### Azure, Llama2, OpenAI, Claude, Hugging Face, Replicate Models
+
[](https://pypi.org/project/litellm/)
[](https://pypi.org/project/litellm/0.1.1/)

@@ -11,34 +12,36 @@

## What does liteLLM proxy do
+
- Make `/chat/completions` requests for 50+ LLM models **Azure, OpenAI, Replicate, Anthropic, Hugging Face**
-
+
Example: for `model` use `claude-2`, `gpt-3.5`, `gpt-4`, `command-nightly`, `stabilityai/stablecode-completion-alpha-3b-4k`
+
```json
{
"model": "replicate/llama-2-70b-chat:2c1608e18606fad2812020dc541930f2d0495ce32eee50074220b87300bc16e1",
"messages": [
- {
- "content": "Hello, whats the weather in San Francisco??",
- "role": "user"
- }
- ]
+ {
+        "content": "Hello, what's the weather in San Francisco?",
+ "role": "user"
+ }
+ ]
}
```
-- **Consistent Input/Output** Format
- - Call all models using the OpenAI format - `completion(model, messages)`
- - Text responses will always be available at `['choices'][0]['message']['content']`
-- **Error Handling** Using Model Fallbacks (if `GPT-4` fails, try `llama2`)
-- **Logging** - Log Requests, Responses and Errors to `Supabase`, `Posthog`, `Mixpanel`, `Sentry`, `Helicone` (Any of the supported providers here: https://litellm.readthedocs.io/en/latest/advanced/
- **Example: Logs sent to Supabase**
+- **Consistent Input/Output** Format
+ - Call all models using the OpenAI format - `completion(model, messages)`
+ - Text responses will always be available at `['choices'][0]['message']['content']`
+- **Error Handling** Using Model Fallbacks (if `GPT-4` fails, try `llama2`)
+- **Logging** - Log Requests, Responses and Errors to `Supabase`, `Posthog`, `Mixpanel`, `Sentry`, `LLMonitor`, `Helicone` (any of the supported providers here: https://litellm.readthedocs.io/en/latest/advanced/) - see the wiring sketch after this list
+
+ **Example: Logs sent to Supabase**
- **Token Usage & Spend** - Track Input + Completion tokens used + Spend/model
- **Caching** - Implementation of Semantic Caching
- **Streaming & Async Support** - Return generators to stream text responses
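+
+A minimal sketch of wiring up these logging callbacks from Python (assuming the `litellm` package is installed and your provider API keys are set as environment variables):
+
+```python
+import litellm
+from litellm import completion
+
+# route request, response, and error logs to a supported provider
+litellm.input_callback = ["llmonitor"]    # log the request
+litellm.success_callback = ["llmonitor"]  # log the response
+litellm.error_callback = ["llmonitor"]    # log failures
+
+response = completion(model="gpt-3.5-turbo",
+                      messages=[{"role": "user", "content": "Hi 👋"}])
+```
+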
-
## API Endpoints
### `/chat/completions` (POST)
@@ -46,34 +49,37 @@
This endpoint is used to generate chat completions for 50+ supported LLM API models. Use llama2, GPT-4, Claude2 etc
#### Input
+
This API endpoint accepts all inputs in raw JSON and expects the following inputs
-- `model` (string, required): ID of the model to use for chat completions. See all supported models [here]: (https://litellm.readthedocs.io/en/latest/supported/):
- eg `gpt-3.5-turbo`, `gpt-4`, `claude-2`, `command-nightly`, `stabilityai/stablecode-completion-alpha-3b-4k`
+
+- `model` (string, required): ID of the model to use for chat completions. See all supported models [here](https://litellm.readthedocs.io/en/latest/supported/),
+  e.g. `gpt-3.5-turbo`, `gpt-4`, `claude-2`, `command-nightly`, `stabilityai/stablecode-completion-alpha-3b-4k`
- `messages` (array, required): A list of messages representing the conversation context. Each message should have a `role` (system, user, assistant, or function), `content` (message text), and `name` (for function role).
- Additional Optional parameters: `temperature`, `functions`, `function_call`, `top_p`, `n`, `stream`. See the full list of supported inputs here: https://litellm.readthedocs.io/en/latest/input/
-
#### Example JSON body
+
For claude-2
+
```json
{
- "model": "claude-2",
- "messages": [
- {
- "content": "Hello, whats the weather in San Francisco??",
- "role": "user"
- }
- ]
-
+ "model": "claude-2",
+ "messages": [
+ {
+      "content": "Hello, what's the weather in San Francisco?",
+ "role": "user"
+ }
+ ]
}
```
### Making an API request to the Proxy Server
+
```python
import requests
import json
-# TODO: use your URL
+# TODO: use your URL
url = "http://localhost:5000/chat/completions"
payload = json.dumps({
@@ -94,34 +100,38 @@ print(response.text)
```
### Output [Response Format]
-Responses from the server are given in the following format.
+
All responses from the server are returned in the following format (for all LLM models). More info on output here: https://litellm.readthedocs.io/en/latest/output/
+
```json
{
- "choices": [
- {
- "finish_reason": "stop",
- "index": 0,
- "message": {
- "content": "I'm sorry, but I don't have the capability to provide real-time weather information. However, you can easily check the weather in San Francisco by searching online or using a weather app on your phone.",
- "role": "assistant"
- }
- }
- ],
- "created": 1691790381,
- "id": "chatcmpl-7mUFZlOEgdohHRDx2UpYPRTejirzb",
- "model": "gpt-3.5-turbo-0613",
- "object": "chat.completion",
- "usage": {
- "completion_tokens": 41,
- "prompt_tokens": 16,
- "total_tokens": 57
+ "choices": [
+ {
+ "finish_reason": "stop",
+ "index": 0,
+ "message": {
+ "content": "I'm sorry, but I don't have the capability to provide real-time weather information. However, you can easily check the weather in San Francisco by searching online or using a weather app on your phone.",
+ "role": "assistant"
+ }
}
+ ],
+ "created": 1691790381,
+ "id": "chatcmpl-7mUFZlOEgdohHRDx2UpYPRTejirzb",
+ "model": "gpt-3.5-turbo-0613",
+ "object": "chat.completion",
+ "usage": {
+ "completion_tokens": 41,
+ "prompt_tokens": 16,
+ "total_tokens": 57
+ }
}
```
## Installation & Usage
+
### Running Locally
+
1. Clone liteLLM repository to your local machine:
```
git clone https://github.com/BerriAI/liteLLM-proxy
@@ -141,24 +151,24 @@ All responses from the server are returned in the following format (for all LLM
python main.py
```
-
-
## Deploying
+
1. Quick Start: Deploy on Railway
[](https://railway.app/template/DYqQAW?referralCode=t3ukrU)
-
-2. `GCP`, `AWS`, `Azure`
-This project includes a `Dockerfile` allowing you to build and deploy a Docker Project on your providers
+
+2. `GCP`, `AWS`, `Azure`
+   This project includes a `Dockerfile`, allowing you to build and deploy a Docker image to the provider of your choice.
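+
+   For example, a minimal build-and-run sketch (the image name is hypothetical; port 5000 matches the request examples above):
+
+   ```
+   docker build -t litellm-proxy .
+   docker run -p 5000:5000 litellm-proxy
+   ```
+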
# Support / Talk with founders
+
- [Our calendar 👋](https://calendly.com/d/4mp-gd3-k5k/berriai-1-1-onboarding-litellm-hosted-version)
- [Community Discord 💭](https://discord.gg/wuPM9dRgDw)
- Our numbers 📞 +1 (770) 8783-106 / +1 (412) 618-6238
- Our emails ✉️ ishaan@berri.ai / krrish@berri.ai
-
## Roadmap
+
- [ ] Support hosted db (e.g. Supabase)
- [ ] Easily send data to places like posthog and sentry.
- [ ] Add a hot-cache for project spend logs - enables fast checks for user + project limits
diff --git a/litellm/integrations/llmonitor.py b/litellm/integrations/llmonitor.py
index e7430b5bb..b2940e872 100644
--- a/litellm/integrations/llmonitor.py
+++ b/litellm/integrations/llmonitor.py
@@ -5,6 +5,7 @@ import traceback
import dotenv
import os
import requests
+
dotenv.load_dotenv() # Loading env variables using dotenv
@@ -14,45 +15,34 @@ class LLMonitorLogger:
# Instance variables
self.api_url = os.getenv(
"LLMONITOR_API_URL") or "https://app.llmonitor.com"
- self.account_id = os.getenv("LLMONITOR_APP_ID")
+ self.app_id = os.getenv("LLMONITOR_APP_ID")
- def log_event(self, model, messages, response_obj, start_time, end_time, print_verbose):
+    def log_event(self, type, run_id, model, print_verbose, error=None,
+                  usage=None, messages=None, response_obj=None, user_id=None,
+                  time=None):
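+        # Called once per lifecycle stage: callers pass type="start"
+        # (before the LLM call), "end" (on success), or "error" (on failure).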
# Method definition
try:
print_verbose(
- f"LLMonitor Logging - Enters logging function for model {model}")
+ f"LLMonitor Logging - Enters logging function for model {model}"
+ )
- print(model, messages, response_obj, start_time, end_time)
+            print(type, model, messages, response_obj, time, user_id)
- # headers = {
- # 'Content-Type': 'application/json'
- # }
+ headers = {'Content-Type': 'application/json'}
- # prompt_tokens_cost_usd_dollar, completion_tokens_cost_usd_dollar = self.price_calculator(
- # model, response_obj, start_time, end_time)
- # total_cost = prompt_tokens_cost_usd_dollar + completion_tokens_cost_usd_dollar
+ data = {
+ "type": "llm",
+ "name": model,
+ "runId": run_id,
+ "app": self.app_id,
+ "error": error,
+ "event": type,
+            "timestamp": time.isoformat() if time else None,
+ "userId": user_id,
+ "input": messages,
+            "output": (response_obj['choices'][0]['message']['content']
+                       if response_obj else None),
+ }
- # response_time = (end_time-start_time).total_seconds()
- # if "response" in response_obj:
- # data = [{
- # "response_time": response_time,
- # "model_id": response_obj["model"],
- # "total_cost": total_cost,
- # "messages": messages,
- # "response": response_obj['choices'][0]['message']['content'],
- # "account_id": self.account_id
- # }]
- # elif "error" in response_obj:
- # data = [{
- # "response_time": response_time,
- # "model_id": response_obj["model"],
- # "total_cost": total_cost,
- # "messages": messages,
- # "error": response_obj['error'],
- # "account_id": self.account_id
- # }]
-
- # print_verbose(f"BerriSpend Logging - final data object: {data}")
+ print_verbose(f"LLMonitor Logging - final data object: {data}")
# response = requests.post(url, headers=headers, json=data)
except:
# traceback.print_exc()
diff --git a/litellm/tests/test_llmonitor_integration.py b/litellm/tests/test_llmonitor_integration.py
index 5a4b4beb3..d2045e5dc 100644
--- a/litellm/tests/test_llmonitor_integration.py
+++ b/litellm/tests/test_llmonitor_integration.py
@@ -1,28 +1,36 @@
#### What this tests ####
-# This tests if logging to the helicone integration actually works
-
-from litellm import embedding, completion
-import litellm
+# This tests if logging to the llmonitor integration actually works
+# Adds the parent directory to the system path
import sys
import os
-import traceback
-import pytest
-# Adds the parent directory to the system path
sys.path.insert(0, os.path.abspath('../..'))
+from litellm import completion
+import litellm
+
+litellm.input_callback = ["llmonitor"]
litellm.success_callback = ["llmonitor"]
+litellm.error_callback = ["llmonitor"]
litellm.set_verbose = True
-user_message = "Hello, how are you?"
-messages = [{"content": user_message, "role": "user"}]
-
-
# openai call
-response = completion(model="gpt-3.5-turbo",
- messages=[{"role": "user", "content": "Hi 👋 - i'm openai"}])
+# response = completion(model="gpt-3.5-turbo",
+# messages=[{
+# "role": "user",
+# "content": "Hi 👋 - i'm openai"
+# }])
+
+# print(response)
+
+# #bad request call
+# response = completion(model="chatgpt-test", messages=[{"role": "user", "content": "Hi 👋 - i'm a bad request"}])
# cohere call
-# response = completion(model="command-nightly",
-# messages=[{"role": "user", "content": "Hi 👋 - i'm cohere"}])
+response = completion(model="command-nightly",
+ messages=[{
+ "role": "user",
+ "content": "Hi 👋 - i'm cohere"
+ }])
+print(response)
diff --git a/litellm/utils.py b/litellm/utils.py
index e47b55978..c451095b8 100644
--- a/litellm/utils.py
+++ b/litellm/utils.py
@@ -1,20 +1,7 @@
-import sys
-import dotenv, json, traceback, threading
-import subprocess, os
-import litellm, openai
-import random, uuid, requests
-import datetime, time
-import tiktoken
-
-encoding = tiktoken.get_encoding("cl100k_base")
-import pkg_resources
-from .integrations.helicone import HeliconeLogger
-from .integrations.aispend import AISpendLogger
-from .integrations.berrispend import BerriSpendLogger
-from .integrations.supabase import Supabase
-from .integrations.litedebugger import LiteDebugger
-from openai.error import OpenAIError as OriginalError
-from openai.openai_object import OpenAIObject
+import aiohttp
+import subprocess
+import importlib
+from typing import List, Dict, Union, Optional
from .exceptions import (
AuthenticationError,
InvalidRequestError,
@@ -22,7 +9,32 @@ from .exceptions import (
ServiceUnavailableError,
OpenAIError,
)
-from typing import List, Dict, Union, Optional
+from openai.openai_object import OpenAIObject
+from openai.error import OpenAIError as OriginalError
+from .integrations.llmonitor import LLMonitorLogger
+from .integrations.litedebugger import LiteDebugger
+from .integrations.supabase import Supabase
+from .integrations.berrispend import BerriSpendLogger
+from .integrations.aispend import AISpendLogger
+from .integrations.helicone import HeliconeLogger
+import pkg_resources
+import sys
+import dotenv
+import json
+import traceback
+import threading
+import os
+import litellm
+import openai
+import random
+import uuid
+import requests
+import datetime
+import time
+import tiktoken
+
+encoding = tiktoken.get_encoding("cl100k_base")
####### ENVIRONMENT VARIABLES ###################
dotenv.load_dotenv() # Loading env variables using dotenv
@@ -37,6 +49,7 @@ aispendLogger = None
berrispendLogger = None
supabaseClient = None
liteDebuggerClient = None
+llmonitorLogger = None
callback_list: Optional[List[str]] = []
user_logger_fn = None
additional_details: Optional[Dict[str, str]] = {}
@@ -63,6 +76,7 @@ local_cache: Optional[Dict[str, str]] = {}
class Message(OpenAIObject):
+
def __init__(self, content="default", role="assistant", **params):
super(Message, self).__init__(**params)
self.content = content
@@ -70,7 +84,12 @@ class Message(OpenAIObject):
class Choices(OpenAIObject):
- def __init__(self, finish_reason="stop", index=0, message=Message(), **params):
+
+ def __init__(self,
+ finish_reason="stop",
+ index=0,
+ message=Message(),
+ **params):
super(Choices, self).__init__(**params)
self.finish_reason = finish_reason
self.index = index
@@ -78,20 +97,22 @@ class Choices(OpenAIObject):
class ModelResponse(OpenAIObject):
- def __init__(self, choices=None, created=None, model=None, usage=None, **params):
+
+ def __init__(self,
+ choices=None,
+ created=None,
+ model=None,
+ usage=None,
+ **params):
super(ModelResponse, self).__init__(**params)
self.choices = choices if choices else [Choices()]
self.created = created
self.model = model
- self.usage = (
- usage
- if usage
- else {
- "prompt_tokens": None,
- "completion_tokens": None,
- "total_tokens": None,
- }
- )
+ self.usage = (usage if usage else {
+ "prompt_tokens": None,
+ "completion_tokens": None,
+ "total_tokens": None,
+ })
def to_dict_recursive(self):
d = super().to_dict_recursive()
@@ -108,8 +129,6 @@ def print_verbose(print_statement):
####### Package Import Handler ###################
-import importlib
-import subprocess
def install_and_import(package: str):
@@ -139,6 +158,7 @@ def install_and_import(package: str):
# Logging function -> log the exact model details + what's being sent | Non-Blocking
class Logging:
global supabaseClient, liteDebuggerClient
+
def __init__(self, model, messages, optional_params, litellm_params):
self.model = model
self.messages = messages
@@ -146,20 +166,20 @@ class Logging:
self.litellm_params = litellm_params
self.logger_fn = litellm_params["logger_fn"]
self.model_call_details = {
- "model": model,
- "messages": messages,
+ "model": model,
+ "messages": messages,
"optional_params": self.optional_params,
"litellm_params": self.litellm_params,
}
-
+
def pre_call(self, input, api_key, additional_args={}):
try:
print(f"logging pre call for model: {self.model}")
self.model_call_details["input"] = input
self.model_call_details["api_key"] = api_key
self.model_call_details["additional_args"] = additional_args
-
- ## User Logging -> if you pass in a custom logging function
+
+ # User Logging -> if you pass in a custom logging function
print_verbose(
f"Logging Details: logger_fn - {self.logger_fn} | callable(logger_fn) - {callable(self.logger_fn)}"
)
@@ -173,7 +193,7 @@ class Logging:
f"LiteLLM.LoggingError: [Non-Blocking] Exception occurred while logging {traceback.format_exc()}"
)
- ## Input Integration Logging -> If you want to log the fact that an attempt to call the model was made
+ # Input Integration Logging -> If you want to log the fact that an attempt to call the model was made
for callback in litellm.input_callback:
try:
if callback == "supabase":
@@ -185,7 +205,21 @@ class Logging:
model=model,
messages=messages,
end_user=litellm._thread_context.user,
- litellm_call_id=self.litellm_params["litellm_call_id"],
+                            litellm_call_id=self.litellm_params["litellm_call_id"],
+ print_verbose=print_verbose,
+ )
+ elif callback == "llmonitor":
+ print_verbose("reaches llmonitor for logging!")
+ model = self.model
+ messages = self.messages
+                    print(f"llmonitorLogger: {llmonitorLogger}")
+ llmonitorLogger.log_event(
+ type="start",
+ model=model,
+ messages=messages,
+ user_id=litellm._thread_context.user,
+ run_id=self.litellm_params["litellm_call_id"],
print_verbose=print_verbose,
)
elif callback == "lite_debugger":
@@ -197,15 +231,18 @@ class Logging:
model=model,
messages=messages,
end_user=litellm._thread_context.user,
- litellm_call_id=self.litellm_params["litellm_call_id"],
+                        litellm_call_id=self.litellm_params["litellm_call_id"],
print_verbose=print_verbose,
)
except Exception as e:
- print_verbose(f"LiteLLM.LoggingError: [Non-Blocking] Exception occurred while input logging with integrations {traceback.format_exc()}")
+ print_verbose(
+ f"LiteLLM.LoggingError: [Non-Blocking] Exception occurred while input logging with integrations {traceback.format_exc()}"
+ )
print_verbose(
f"LiteLLM.Logging: is sentry capture exception initialized {capture_exception}"
)
- if capture_exception: # log this error to sentry for debugging
+ if capture_exception: # log this error to sentry for debugging
capture_exception(e)
except:
print_verbose(
@@ -214,9 +251,9 @@ class Logging:
print_verbose(
f"LiteLLM.Logging: is sentry capture exception initialized {capture_exception}"
)
- if capture_exception: # log this error to sentry for debugging
+ if capture_exception: # log this error to sentry for debugging
capture_exception(e)
-
+
def post_call(self, input, api_key, original_response, additional_args={}):
# Do something here
try:
@@ -224,8 +261,8 @@ class Logging:
self.model_call_details["api_key"] = api_key
self.model_call_details["original_response"] = original_response
self.model_call_details["additional_args"] = additional_args
-
- ## User Logging -> if you pass in a custom logging function
+
+ # User Logging -> if you pass in a custom logging function
print_verbose(
f"Logging Details: logger_fn - {self.logger_fn} | callable(logger_fn) - {callable(self.logger_fn)}"
)
@@ -243,9 +280,9 @@ class Logging:
f"LiteLLM.LoggingError: [Non-Blocking] Exception occurred while logging {traceback.format_exc()}"
)
pass
-
+
# Add more methods as needed
-
+
def exception_logging(
additional_args={},
@@ -257,7 +294,7 @@ def exception_logging(
if exception:
model_call_details["exception"] = exception
model_call_details["additional_args"] = additional_args
- ## User Logging -> if you pass in a custom logging function or want to use sentry breadcrumbs
+ # User Logging -> if you pass in a custom logging function or want to use sentry breadcrumbs
print_verbose(
f"Logging Details: logger_fn - {logger_fn} | callable(logger_fn) - {callable(logger_fn)}"
)
@@ -280,20 +317,20 @@ def exception_logging(
####### CLIENT ###################
# make it easy to log if completion/embedding runs succeeded or failed + see what happened | Non-Blocking
def client(original_function):
+
def function_setup(
*args, **kwargs
): # just run once to check if user wants to send their data anywhere - PostHog/Sentry/Slack/etc.
try:
global callback_list, add_breadcrumb, user_logger_fn
- if (
- len(litellm.input_callback) > 0 or len(litellm.success_callback) > 0 or len(litellm.failure_callback) > 0
- ) and len(callback_list) == 0:
+ if (len(litellm.input_callback) > 0
+ or len(litellm.success_callback) > 0
+ or len(litellm.failure_callback)
+ > 0) and len(callback_list) == 0:
callback_list = list(
- set(litellm.input_callback + litellm.success_callback + litellm.failure_callback)
- )
- set_callbacks(
- callback_list=callback_list,
- )
+ set(litellm.input_callback + litellm.success_callback +
+ litellm.failure_callback))
+            set_callbacks(callback_list=callback_list)
if add_breadcrumb:
add_breadcrumb(
category="litellm.llm_call",
@@ -310,12 +347,11 @@ def client(original_function):
if litellm.telemetry:
try:
model = args[0] if len(args) > 0 else kwargs["model"]
- exception = kwargs["exception"] if "exception" in kwargs else None
- custom_llm_provider = (
- kwargs["custom_llm_provider"]
- if "custom_llm_provider" in kwargs
- else None
- )
+ exception = kwargs[
+ "exception"] if "exception" in kwargs else None
+ custom_llm_provider = (kwargs["custom_llm_provider"]
+ if "custom_llm_provider" in kwargs else
+ None)
safe_crash_reporting(
model=model,
exception=exception,
@@ -340,15 +376,12 @@ def client(original_function):
def check_cache(*args, **kwargs):
try: # never block execution
prompt = get_prompt(*args, **kwargs)
- if (
- prompt != None and prompt in local_cache
- ): # check if messages / prompt exists
+ if (prompt != None and prompt
+ in local_cache): # check if messages / prompt exists
if litellm.caching_with_models:
# if caching with model names is enabled, key is prompt + model name
- if (
- "model" in kwargs
- and kwargs["model"] in local_cache[prompt]["models"]
- ):
+ if ("model" in kwargs and kwargs["model"]
+ in local_cache[prompt]["models"]):
cache_key = prompt + kwargs["model"]
return local_cache[cache_key]
else: # caching only with prompts
@@ -363,10 +396,8 @@ def client(original_function):
try: # never block execution
prompt = get_prompt(*args, **kwargs)
if litellm.caching_with_models: # caching with model + prompt
- if (
- "model" in kwargs
- and kwargs["model"] in local_cache[prompt]["models"]
- ):
+ if ("model" in kwargs
+ and kwargs["model"] in local_cache[prompt]["models"]):
cache_key = prompt + kwargs["model"]
local_cache[cache_key] = result
else: # caching based only on prompts
@@ -381,24 +412,24 @@ def client(original_function):
function_setup(*args, **kwargs)
litellm_call_id = str(uuid.uuid4())
kwargs["litellm_call_id"] = litellm_call_id
- ## [OPTIONAL] CHECK CACHE
+ # [OPTIONAL] CHECK CACHE
start_time = datetime.datetime.now()
if (litellm.caching or litellm.caching_with_models) and (
- cached_result := check_cache(*args, **kwargs)
- ) is not None:
+ cached_result := check_cache(*args, **kwargs)) is not None:
result = cached_result
else:
- ## MODEL CALL
+ # MODEL CALL
result = original_function(*args, **kwargs)
end_time = datetime.datetime.now()
- ## Add response to CACHE
+ # Add response to CACHE
if litellm.caching:
add_cache(result, *args, **kwargs)
- ## LOG SUCCESS
+ # LOG SUCCESS
crash_reporting(*args, **kwargs)
my_thread = threading.Thread(
- target=handle_success, args=(args, kwargs, result, start_time, end_time)
- ) # don't interrupt execution of main thread
+ target=handle_success,
+ args=(args, kwargs, result, start_time,
+ end_time)) # don't interrupt execution of main thread
my_thread.start()
return result
except Exception as e:
@@ -407,7 +438,8 @@ def client(original_function):
end_time = datetime.datetime.now()
my_thread = threading.Thread(
target=handle_failure,
- args=(e, traceback_exception, start_time, end_time, args, kwargs),
+ args=(e, traceback_exception, start_time, end_time, args,
+ kwargs),
) # don't interrupt execution of main thread
my_thread.start()
raise e
@@ -432,18 +464,18 @@ def token_counter(model, text):
return num_tokens
-def cost_per_token(model="gpt-3.5-turbo", prompt_tokens=0, completion_tokens=0):
- ## given
+def cost_per_token(model="gpt-3.5-turbo",
+ prompt_tokens=0,
+ completion_tokens=0):
+ # given
prompt_tokens_cost_usd_dollar = 0
completion_tokens_cost_usd_dollar = 0
model_cost_ref = litellm.model_cost
if model in model_cost_ref:
prompt_tokens_cost_usd_dollar = (
- model_cost_ref[model]["input_cost_per_token"] * prompt_tokens
- )
+ model_cost_ref[model]["input_cost_per_token"] * prompt_tokens)
completion_tokens_cost_usd_dollar = (
- model_cost_ref[model]["output_cost_per_token"] * completion_tokens
- )
+ model_cost_ref[model]["output_cost_per_token"] * completion_tokens)
return prompt_tokens_cost_usd_dollar, completion_tokens_cost_usd_dollar
else:
# calculate average input cost
@@ -464,8 +496,9 @@ def completion_cost(model="gpt-3.5-turbo", prompt="", completion=""):
prompt_tokens = token_counter(model=model, text=prompt)
completion_tokens = token_counter(model=model, text=completion)
prompt_tokens_cost_usd_dollar, completion_tokens_cost_usd_dollar = cost_per_token(
- model=model, prompt_tokens=prompt_tokens, completion_tokens=completion_tokens
- )
+ model=model,
+ prompt_tokens=prompt_tokens,
+ completion_tokens=completion_tokens)
return prompt_tokens_cost_usd_dollar + completion_tokens_cost_usd_dollar
@@ -557,9 +590,8 @@ def get_optional_params(
optional_params["max_tokens"] = max_tokens
if frequency_penalty != 0:
optional_params["frequency_penalty"] = frequency_penalty
- elif (
- model == "chat-bison"
- ): # chat-bison has diff args from chat-bison@001 ty Google
+ elif (model == "chat-bison"
+ ): # chat-bison has diff args from chat-bison@001 ty Google
if temperature != 1:
optional_params["temperature"] = temperature
if top_p != 1:
@@ -619,7 +651,10 @@ def load_test_model(
test_prompt = prompt
if num_calls:
test_calls = num_calls
- messages = [[{"role": "user", "content": test_prompt}] for _ in range(test_calls)]
+ messages = [[{
+ "role": "user",
+ "content": test_prompt
+ }] for _ in range(test_calls)]
start_time = time.time()
try:
litellm.batch_completion(
@@ -649,7 +684,7 @@ def load_test_model(
def set_callbacks(callback_list):
- global sentry_sdk_instance, capture_exception, add_breadcrumb, posthog, slack_app, alerts_channel, heliconeLogger, aispendLogger, berrispendLogger, supabaseClient, liteDebuggerClient
+ global sentry_sdk_instance, capture_exception, add_breadcrumb, posthog, slack_app, alerts_channel, heliconeLogger, aispendLogger, berrispendLogger, supabaseClient, liteDebuggerClient, llmonitorLogger
try:
for callback in callback_list:
print(f"callback: {callback}")
@@ -657,17 +692,15 @@ def set_callbacks(callback_list):
try:
import sentry_sdk
except ImportError:
- print_verbose("Package 'sentry_sdk' is missing. Installing it...")
+ print_verbose(
+ "Package 'sentry_sdk' is missing. Installing it...")
subprocess.check_call(
- [sys.executable, "-m", "pip", "install", "sentry_sdk"]
- )
+ [sys.executable, "-m", "pip", "install", "sentry_sdk"])
import sentry_sdk
sentry_sdk_instance = sentry_sdk
- sentry_trace_rate = (
- os.environ.get("SENTRY_API_TRACE_RATE")
- if "SENTRY_API_TRACE_RATE" in os.environ
- else "1.0"
- )
+ sentry_trace_rate = (os.environ.get("SENTRY_API_TRACE_RATE")
+ if "SENTRY_API_TRACE_RATE" in os.environ
+ else "1.0")
sentry_sdk_instance.init(
dsn=os.environ.get("SENTRY_API_URL"),
traces_sample_rate=float(sentry_trace_rate),
@@ -678,10 +711,10 @@ def set_callbacks(callback_list):
try:
from posthog import Posthog
except ImportError:
- print_verbose("Package 'posthog' is missing. Installing it...")
+ print_verbose(
+ "Package 'posthog' is missing. Installing it...")
subprocess.check_call(
- [sys.executable, "-m", "pip", "install", "posthog"]
- )
+ [sys.executable, "-m", "pip", "install", "posthog"])
from posthog import Posthog
posthog = Posthog(
project_api_key=os.environ.get("POSTHOG_API_KEY"),
@@ -691,10 +724,10 @@ def set_callbacks(callback_list):
try:
from slack_bolt import App
except ImportError:
- print_verbose("Package 'slack_bolt' is missing. Installing it...")
+ print_verbose(
+ "Package 'slack_bolt' is missing. Installing it...")
subprocess.check_call(
- [sys.executable, "-m", "pip", "install", "slack_bolt"]
- )
+ [sys.executable, "-m", "pip", "install", "slack_bolt"])
from slack_bolt import App
slack_app = App(
token=os.environ.get("SLACK_API_TOKEN"),
@@ -704,6 +737,8 @@ def set_callbacks(callback_list):
print_verbose(f"Initialized Slack App: {slack_app}")
elif callback == "helicone":
heliconeLogger = HeliconeLogger()
+ elif callback == "llmonitor":
+ llmonitorLogger = LLMonitorLogger()
elif callback == "aispend":
aispendLogger = AISpendLogger()
elif callback == "berrispend":
@@ -718,7 +753,8 @@ def set_callbacks(callback_list):
raise e
-def handle_failure(exception, traceback_exception, start_time, end_time, args, kwargs):
+def handle_failure(exception, traceback_exception, start_time, end_time, args,
+ kwargs):
global sentry_sdk_instance, capture_exception, add_breadcrumb, posthog, slack_app, alerts_channel, aispendLogger, berrispendLogger, supabaseClient, liteDebuggerClient
try:
# print_verbose(f"handle_failure args: {args}")
@@ -728,8 +764,7 @@ def handle_failure(exception, traceback_exception, start_time, end_time, args, k
failure_handler = additional_details.pop("failure_handler", None)
additional_details["Event_Name"] = additional_details.pop(
- "failed_event_name", "litellm.failed_query"
- )
+ "failed_event_name", "litellm.failed_query")
print_verbose(f"self.failure_callback: {litellm.failure_callback}")
# print_verbose(f"additional_details: {additional_details}")
@@ -746,9 +781,8 @@ def handle_failure(exception, traceback_exception, start_time, end_time, args, k
for detail in additional_details:
slack_msg += f"{detail}: {additional_details[detail]}\n"
slack_msg += f"Traceback: {traceback_exception}"
- slack_app.client.chat_postMessage(
- channel=alerts_channel, text=slack_msg
- )
+ slack_app.client.chat_postMessage(channel=alerts_channel,
+ text=slack_msg)
elif callback == "sentry":
capture_exception(exception)
elif callback == "posthog":
@@ -767,9 +801,8 @@ def handle_failure(exception, traceback_exception, start_time, end_time, args, k
print_verbose(f"ph_obj: {ph_obj}")
print_verbose(f"PostHog Event Name: {event_name}")
if "user_id" in additional_details:
- posthog.capture(
- additional_details["user_id"], event_name, ph_obj
- )
+ posthog.capture(additional_details["user_id"],
+ event_name, ph_obj)
else: # PostHog calls require a unique id to identify a user - https://posthog.com/docs/libraries/python
unique_id = str(uuid.uuid4())
posthog.capture(unique_id, event_name)
@@ -783,10 +816,10 @@ def handle_failure(exception, traceback_exception, start_time, end_time, args, k
"created": time.time(),
"error": traceback_exception,
"usage": {
- "prompt_tokens": prompt_token_calculator(
- model, messages=messages
- ),
- "completion_tokens": 0,
+ "prompt_tokens":
+ prompt_token_calculator(model, messages=messages),
+ "completion_tokens":
+ 0,
},
}
berrispendLogger.log_event(
@@ -805,10 +838,10 @@ def handle_failure(exception, traceback_exception, start_time, end_time, args, k
"model": model,
"created": time.time(),
"usage": {
- "prompt_tokens": prompt_token_calculator(
- model, messages=messages
- ),
- "completion_tokens": 0,
+ "prompt_tokens":
+ prompt_token_calculator(model, messages=messages),
+ "completion_tokens":
+ 0,
},
}
aispendLogger.log_event(
@@ -818,6 +851,27 @@ def handle_failure(exception, traceback_exception, start_time, end_time, args, k
end_time=end_time,
print_verbose=print_verbose,
)
+ elif callback == "llmonitor":
+ print_verbose("reaches llmonitor for logging!")
+ model = args[0] if len(args) > 0 else kwargs["model"]
+ messages = args[1] if len(args) > 1 else kwargs["messages"]
+ usage = {
+ "prompt_tokens":
+ prompt_token_calculator(model, messages=messages),
+ "completion_tokens":
+ 0,
+ }
+ llmonitorLogger.log_event(
+ type="error",
+ user_id=litellm._thread_context.user,
+ model=model,
+ error=traceback_exception,
+ run_id=kwargs["litellm_call_id"],
+                        time=end_time,
+ usage=usage,
+ print_verbose=print_verbose,
+ )
elif callback == "supabase":
print_verbose("reaches supabase for logging!")
print_verbose(f"supabaseClient: {supabaseClient}")
@@ -828,10 +882,10 @@ def handle_failure(exception, traceback_exception, start_time, end_time, args, k
"created": time.time(),
"error": traceback_exception,
"usage": {
- "prompt_tokens": prompt_token_calculator(
- model, messages=messages
- ),
- "completion_tokens": 0,
+ "prompt_tokens":
+ prompt_token_calculator(model, messages=messages),
+ "completion_tokens":
+ 0,
},
}
supabaseClient.log_event(
@@ -854,10 +908,10 @@ def handle_failure(exception, traceback_exception, start_time, end_time, args, k
"created": time.time(),
"error": traceback_exception,
"usage": {
- "prompt_tokens": prompt_token_calculator(
- model, messages=messages
- ),
- "completion_tokens": 0,
+ "prompt_tokens":
+ prompt_token_calculator(model, messages=messages),
+ "completion_tokens":
+ 0,
},
}
liteDebuggerClient.log_event(
@@ -884,19 +938,18 @@ def handle_failure(exception, traceback_exception, start_time, end_time, args, k
failure_handler(call_details)
pass
except Exception as e:
- ## LOGGING
+ # LOGGING
exception_logging(logger_fn=user_logger_fn, exception=e)
pass
def handle_success(args, kwargs, result, start_time, end_time):
- global heliconeLogger, aispendLogger, supabaseClient, liteDebuggerClient
+ global heliconeLogger, aispendLogger, supabaseClient, liteDebuggerClient, llmonitorLogger
try:
success_handler = additional_details.pop("success_handler", None)
failure_handler = additional_details.pop("failure_handler", None)
additional_details["Event_Name"] = additional_details.pop(
- "successful_event_name", "litellm.succes_query"
- )
+ "successful_event_name", "litellm.succes_query")
for callback in litellm.success_callback:
try:
if callback == "posthog":
@@ -905,9 +958,8 @@ def handle_success(args, kwargs, result, start_time, end_time):
ph_obj[detail] = additional_details[detail]
event_name = additional_details["Event_Name"]
if "user_id" in additional_details:
- posthog.capture(
- additional_details["user_id"], event_name, ph_obj
- )
+ posthog.capture(additional_details["user_id"],
+ event_name, ph_obj)
else: # PostHog calls require a unique id to identify a user - https://posthog.com/docs/libraries/python
unique_id = str(uuid.uuid4())
posthog.capture(unique_id, event_name, ph_obj)
@@ -916,9 +968,8 @@ def handle_success(args, kwargs, result, start_time, end_time):
slack_msg = ""
for detail in additional_details:
slack_msg += f"{detail}: {additional_details[detail]}\n"
- slack_app.client.chat_postMessage(
- channel=alerts_channel, text=slack_msg
- )
+ slack_app.client.chat_postMessage(channel=alerts_channel,
+ text=slack_msg)
elif callback == "helicone":
print_verbose("reaches helicone for logging!")
model = args[0] if len(args) > 0 else kwargs["model"]
@@ -931,6 +982,22 @@ def handle_success(args, kwargs, result, start_time, end_time):
end_time=end_time,
print_verbose=print_verbose,
)
+ elif callback == "llmonitor":
+ print_verbose("reaches llmonitor for logging!")
+ model = args[0] if len(args) > 0 else kwargs["model"]
+ messages = args[1] if len(args) > 1 else kwargs["messages"]
+                    usage = result["usage"]
+ llmonitorLogger.log_event(
+ type="end",
+ model=model,
+ messages=messages,
+ user_id=litellm._thread_context.user,
+ response_obj=result,
+ time=end_time,
+ usage=usage,
+ run_id=kwargs["litellm_call_id"],
+ print_verbose=print_verbose,
+ )
elif callback == "aispend":
print_verbose("reaches aispend for logging!")
model = args[0] if len(args) > 0 else kwargs["model"]
@@ -984,7 +1051,7 @@ def handle_success(args, kwargs, result, start_time, end_time):
print_verbose=print_verbose,
)
except Exception as e:
- ## LOGGING
+ # LOGGING
exception_logging(logger_fn=user_logger_fn, exception=e)
print_verbose(
f"[Non-Blocking] Success Callback Error - {traceback.format_exc()}"
@@ -995,7 +1062,7 @@ def handle_success(args, kwargs, result, start_time, end_time):
success_handler(args, kwargs)
pass
except Exception as e:
- ## LOGGING
+ # LOGGING
exception_logging(logger_fn=user_logger_fn, exception=e)
print_verbose(
f"[Non-Blocking] Success Callback Error - {traceback.format_exc()}"
@@ -1046,33 +1113,36 @@ def exception_type(model, original_exception, custom_llm_provider):
exception_type = ""
if "claude" in model: # one of the anthropics
if hasattr(original_exception, "status_code"):
- print_verbose(f"status_code: {original_exception.status_code}")
+ print_verbose(
+ f"status_code: {original_exception.status_code}")
if original_exception.status_code == 401:
exception_mapping_worked = True
raise AuthenticationError(
- message=f"AnthropicException - {original_exception.message}",
+ message=
+ f"AnthropicException - {original_exception.message}",
llm_provider="anthropic",
)
elif original_exception.status_code == 400:
exception_mapping_worked = True
raise InvalidRequestError(
- message=f"AnthropicException - {original_exception.message}",
+ message=
+ f"AnthropicException - {original_exception.message}",
model=model,
llm_provider="anthropic",
)
elif original_exception.status_code == 429:
exception_mapping_worked = True
raise RateLimitError(
- message=f"AnthropicException - {original_exception.message}",
+ message=
+ f"AnthropicException - {original_exception.message}",
llm_provider="anthropic",
)
- elif (
- "Could not resolve authentication method. Expected either api_key or auth_token to be set."
- in error_str
- ):
+ elif ("Could not resolve authentication method. Expected either api_key or auth_token to be set."
+ in error_str):
exception_mapping_worked = True
raise AuthenticationError(
- message=f"AnthropicException - {original_exception.message}",
+ message=
+ f"AnthropicException - {original_exception.message}",
llm_provider="anthropic",
)
elif "replicate" in model:
@@ -1096,35 +1166,36 @@ def exception_type(model, original_exception, custom_llm_provider):
llm_provider="replicate",
)
elif (
- exception_type == "ReplicateError"
- ): ## ReplicateError implies an error on Replicate server side, not user side
+ exception_type == "ReplicateError"
+ ): # ReplicateError implies an error on Replicate server side, not user side
raise ServiceUnavailableError(
message=f"ReplicateException - {error_str}",
llm_provider="replicate",
)
elif model == "command-nightly": # Cohere
- if (
- "invalid api token" in error_str
- or "No API key provided." in error_str
- ):
+ if ("invalid api token" in error_str
+ or "No API key provided." in error_str):
exception_mapping_worked = True
raise AuthenticationError(
- message=f"CohereException - {original_exception.message}",
+ message=
+ f"CohereException - {original_exception.message}",
llm_provider="cohere",
)
elif "too many tokens" in error_str:
exception_mapping_worked = True
raise InvalidRequestError(
- message=f"CohereException - {original_exception.message}",
+ message=
+ f"CohereException - {original_exception.message}",
model=model,
llm_provider="cohere",
)
elif (
- "CohereConnectionError" in exception_type
+ "CohereConnectionError" in exception_type
): # cohere seems to fire these errors when we load test it (1k+ messages / min)
exception_mapping_worked = True
raise RateLimitError(
- message=f"CohereException - {original_exception.message}",
+ message=
+ f"CohereException - {original_exception.message}",
llm_provider="cohere",
)
elif custom_llm_provider == "huggingface":
@@ -1132,27 +1203,30 @@ def exception_type(model, original_exception, custom_llm_provider):
if original_exception.status_code == 401:
exception_mapping_worked = True
raise AuthenticationError(
- message=f"HuggingfaceException - {original_exception.message}",
+ message=
+ f"HuggingfaceException - {original_exception.message}",
llm_provider="huggingface",
)
elif original_exception.status_code == 400:
exception_mapping_worked = True
raise InvalidRequestError(
- message=f"HuggingfaceException - {original_exception.message}",
+ message=
+ f"HuggingfaceException - {original_exception.message}",
model=model,
llm_provider="huggingface",
)
elif original_exception.status_code == 429:
exception_mapping_worked = True
raise RateLimitError(
- message=f"HuggingfaceException - {original_exception.message}",
+ message=
+ f"HuggingfaceException - {original_exception.message}",
llm_provider="huggingface",
)
raise original_exception # base case - return the original exception
else:
raise original_exception
except Exception as e:
- ## LOGGING
+ # LOGGING
exception_logging(
logger_fn=user_logger_fn,
additional_args={
@@ -1173,7 +1247,7 @@ def safe_crash_reporting(model=None, exception=None, custom_llm_provider=None):
"exception": str(exception),
"custom_llm_provider": custom_llm_provider,
}
- threading.Thread(target=litellm_telemetry, args=(data,)).start()
+ threading.Thread(target=litellm_telemetry, args=(data, )).start()
def litellm_telemetry(data):
@@ -1223,11 +1297,13 @@ def get_secret(secret_name):
if litellm.secret_manager_client != None:
# TODO: check which secret manager is being used
# currently only supports Infisical
- secret = litellm.secret_manager_client.get_secret(secret_name).secret_value
+ secret = litellm.secret_manager_client.get_secret(
+ secret_name).secret_value
if secret != None:
return secret # if secret found in secret manager return it
else:
- raise ValueError(f"Secret '{secret_name}' not found in secret manager")
+ raise ValueError(
+ f"Secret '{secret_name}' not found in secret manager")
elif litellm.api_key != None: # if users use litellm default key
return litellm.api_key
else:
@@ -1238,6 +1314,7 @@ def get_secret(secret_name):
# wraps the completion stream to return the correct format for the model
# replicate/anthropic/cohere
class CustomStreamWrapper:
+
def __init__(self, completion_stream, model, custom_llm_provider=None):
self.model = model
self.custom_llm_provider = custom_llm_provider
@@ -1288,7 +1365,8 @@ class CustomStreamWrapper:
elif self.model == "replicate":
chunk = next(self.completion_stream)
completion_obj["content"] = chunk
- elif (self.model == "together_ai") or ("togethercomputer" in self.model):
+ elif (self.model == "together_ai") or ("togethercomputer"
+ in self.model):
chunk = next(self.completion_stream)
text_data = self.handle_together_ai_chunk(chunk)
if text_data == "":
@@ -1321,12 +1399,11 @@ def read_config_args(config_path):
########## ollama implementation ############################
-import aiohttp
-async def get_ollama_response_stream(
- api_base="http://localhost:11434", model="llama2", prompt="Why is the sky blue?"
-):
+async def get_ollama_response_stream(api_base="http://localhost:11434",
+ model="llama2",
+ prompt="Why is the sky blue?"):
session = aiohttp.ClientSession()
url = f"{api_base}/api/generate"
data = {
@@ -1349,7 +1426,11 @@ async def get_ollama_response_stream(
"content": "",
}
completion_obj["content"] = j["response"]
- yield {"choices": [{"delta": completion_obj}]}
+ yield {
+ "choices": [{
+ "delta": completion_obj
+ }]
+ }
# self.responses.append(j["response"])
# yield "blank"
except Exception as e:
diff --git a/proxy-server/readme.md b/proxy-server/readme.md
index 4f735f38c..edd03de3f 100644
--- a/proxy-server/readme.md
+++ b/proxy-server/readme.md
@@ -1,6 +1,7 @@
-
# liteLLM Proxy Server: 50+ LLM Models, Error Handling, Caching
+
### Azure, Llama2, OpenAI, Claude, Hugging Face, Replicate Models
+
[](https://pypi.org/project/litellm/)
[](https://pypi.org/project/litellm/0.1.1/)

@@ -11,34 +12,36 @@

## What does liteLLM proxy do
+
- Make `/chat/completions` requests for 50+ LLM models **Azure, OpenAI, Replicate, Anthropic, Hugging Face**
-
+
Example: for `model` use `claude-2`, `gpt-3.5`, `gpt-4`, `command-nightly`, `stabilityai/stablecode-completion-alpha-3b-4k`
+
```json
{
"model": "replicate/llama-2-70b-chat:2c1608e18606fad2812020dc541930f2d0495ce32eee50074220b87300bc16e1",
"messages": [
- {
- "content": "Hello, whats the weather in San Francisco??",
- "role": "user"
- }
- ]
+ {
+        "content": "Hello, what's the weather in San Francisco?",
+ "role": "user"
+ }
+ ]
}
```
-- **Consistent Input/Output** Format
- - Call all models using the OpenAI format - `completion(model, messages)`
- - Text responses will always be available at `['choices'][0]['message']['content']`
-- **Error Handling** Using Model Fallbacks (if `GPT-4` fails, try `llama2`)
-- **Logging** - Log Requests, Responses and Errors to `Supabase`, `Posthog`, `Mixpanel`, `Sentry`, `Helicone` (Any of the supported providers here: https://litellm.readthedocs.io/en/latest/advanced/
- **Example: Logs sent to Supabase**
+- **Consistent Input/Output** Format
+ - Call all models using the OpenAI format - `completion(model, messages)`
+ - Text responses will always be available at `['choices'][0]['message']['content']`
+- **Error Handling** Using Model Fallbacks (if `GPT-4` fails, try `llama2`)
+- **Logging** - Log Requests, Responses and Errors to `Supabase`, `Posthog`, `Mixpanel`, `Sentry`, `Helicone`, `LLMonitor` (any of the supported providers here: https://litellm.readthedocs.io/en/latest/advanced/) - see the wiring sketch after this list
+
+ **Example: Logs sent to Supabase**
- **Token Usage & Spend** - Track Input + Completion tokens used + Spend/model
- **Caching** - Implementation of Semantic Caching
- **Streaming & Async Support** - Return generators to stream text responses
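+
+A minimal sketch of wiring up these logging callbacks from Python (assuming the `litellm` package is installed and your provider API keys are set as environment variables):
+
+```python
+import litellm
+from litellm import completion
+
+# route request, response, and error logs to a supported provider
+litellm.input_callback = ["llmonitor"]    # log the request
+litellm.success_callback = ["llmonitor"]  # log the response
+litellm.error_callback = ["llmonitor"]    # log failures
+
+response = completion(model="gpt-3.5-turbo",
+                      messages=[{"role": "user", "content": "Hi 👋"}])
+```
+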
-
## API Endpoints
### `/chat/completions` (POST)
@@ -46,34 +49,37 @@
This endpoint is used to generate chat completions for 50+ supported LLM API models. Use llama2, GPT-4, Claude2 etc
#### Input
+
This API endpoint accepts all inputs in raw JSON and expects the following inputs
-- `model` (string, required): ID of the model to use for chat completions. See all supported models [here]: (https://litellm.readthedocs.io/en/latest/supported/):
- eg `gpt-3.5-turbo`, `gpt-4`, `claude-2`, `command-nightly`, `stabilityai/stablecode-completion-alpha-3b-4k`
+
+- `model` (string, required): ID of the model to use for chat completions. See all supported models [here](https://litellm.readthedocs.io/en/latest/supported/),
+  e.g. `gpt-3.5-turbo`, `gpt-4`, `claude-2`, `command-nightly`, `stabilityai/stablecode-completion-alpha-3b-4k`
- `messages` (array, required): A list of messages representing the conversation context. Each message should have a `role` (system, user, assistant, or function), `content` (message text), and `name` (for function role).
- Additional Optional parameters: `temperature`, `functions`, `function_call`, `top_p`, `n`, `stream`. See the full list of supported inputs here: https://litellm.readthedocs.io/en/latest/input/
-
#### Example JSON body
+
For claude-2
+
```json
{
- "model": "claude-2",
- "messages": [
- {
- "content": "Hello, whats the weather in San Francisco??",
- "role": "user"
- }
- ]
-
+ "model": "claude-2",
+ "messages": [
+ {
+      "content": "Hello, what's the weather in San Francisco?",
+ "role": "user"
+ }
+ ]
}
```
### Making an API request to the Proxy Server
+
```python
import requests
import json
-# TODO: use your URL
+# TODO: use your URL
url = "http://localhost:5000/chat/completions"
payload = json.dumps({
@@ -94,34 +100,38 @@ print(response.text)
```
### Output [Response Format]
-Responses from the server are given in the following format.
+
All responses from the server are returned in the following format (for all LLM models). More info on output here: https://litellm.readthedocs.io/en/latest/output/
+
```json
{
- "choices": [
- {
- "finish_reason": "stop",
- "index": 0,
- "message": {
- "content": "I'm sorry, but I don't have the capability to provide real-time weather information. However, you can easily check the weather in San Francisco by searching online or using a weather app on your phone.",
- "role": "assistant"
- }
- }
- ],
- "created": 1691790381,
- "id": "chatcmpl-7mUFZlOEgdohHRDx2UpYPRTejirzb",
- "model": "gpt-3.5-turbo-0613",
- "object": "chat.completion",
- "usage": {
- "completion_tokens": 41,
- "prompt_tokens": 16,
- "total_tokens": 57
+ "choices": [
+ {
+ "finish_reason": "stop",
+ "index": 0,
+ "message": {
+ "content": "I'm sorry, but I don't have the capability to provide real-time weather information. However, you can easily check the weather in San Francisco by searching online or using a weather app on your phone.",
+ "role": "assistant"
+ }
}
+ ],
+ "created": 1691790381,
+ "id": "chatcmpl-7mUFZlOEgdohHRDx2UpYPRTejirzb",
+ "model": "gpt-3.5-turbo-0613",
+ "object": "chat.completion",
+ "usage": {
+ "completion_tokens": 41,
+ "prompt_tokens": 16,
+ "total_tokens": 57
+ }
}
```
## Installation & Usage
+
### Running Locally
+
1. Clone liteLLM repository to your local machine:
```
git clone https://github.com/BerriAI/liteLLM-proxy
@@ -141,24 +151,24 @@ All responses from the server are returned in the following format (for all LLM
python main.py
```
-
-
## Deploying
+
1. Quick Start: Deploy on Railway
[](https://railway.app/template/DYqQAW?referralCode=t3ukrU)
-
-2. `GCP`, `AWS`, `Azure`
-This project includes a `Dockerfile` allowing you to build and deploy a Docker Project on your providers
+
+2. `GCP`, `AWS`, `Azure`
+   This project includes a `Dockerfile`, allowing you to build and deploy a Docker image to the provider of your choice.
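+
+   For example, a minimal build-and-run sketch (the image name is hypothetical; port 5000 matches the request examples above):
+
+   ```
+   docker build -t litellm-proxy .
+   docker run -p 5000:5000 litellm-proxy
+   ```
+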
# Support / Talk with founders
+
- [Our calendar 👋](https://calendly.com/d/4mp-gd3-k5k/berriai-1-1-onboarding-litellm-hosted-version)
- [Community Discord 💭](https://discord.gg/wuPM9dRgDw)
- Our numbers 📞 +1 (770) 8783-106 / +1 (412) 618-6238
- Our emails ✉️ ishaan@berri.ai / krrish@berri.ai
-
## Roadmap
+
- [ ] Support hosted db (e.g. Supabase)
- [ ] Easily send data to places like posthog and sentry.
- [ ] Add a hot-cache for project spend logs - enables fast checks for user + project limits