diff --git a/cookbook/proxy-server/readme.md b/cookbook/proxy-server/readme.md index 4f735f38c..bb9e00804 100644 --- a/cookbook/proxy-server/readme.md +++ b/cookbook/proxy-server/readme.md @@ -1,6 +1,7 @@ - # liteLLM Proxy Server: 50+ LLM Models, Error Handling, Caching + ### Azure, Llama2, OpenAI, Claude, Hugging Face, Replicate Models + [![PyPI Version](https://img.shields.io/pypi/v/litellm.svg)](https://pypi.org/project/litellm/) [![PyPI Version](https://img.shields.io/badge/stable%20version-v0.1.345-blue?color=green&link=https://pypi.org/project/litellm/0.1.1/)](https://pypi.org/project/litellm/0.1.1/) ![Downloads](https://img.shields.io/pypi/dm/litellm) @@ -11,34 +12,36 @@ ![4BC6491E-86D0-4833-B061-9F54524B2579](https://github.com/BerriAI/litellm/assets/17561003/f5dd237b-db5e-42e1-b1ac-f05683b1d724) ## What does liteLLM proxy do + - Make `/chat/completions` requests for 50+ LLM models **Azure, OpenAI, Replicate, Anthropic, Hugging Face** - + Example: for `model` use `claude-2`, `gpt-3.5`, `gpt-4`, `command-nightly`, `stabilityai/stablecode-completion-alpha-3b-4k` + ```json { "model": "replicate/llama-2-70b-chat:2c1608e18606fad2812020dc541930f2d0495ce32eee50074220b87300bc16e1", "messages": [ - { - "content": "Hello, whats the weather in San Francisco??", - "role": "user" - } - ] + { + "content": "Hello, whats the weather in San Francisco??", + "role": "user" + } + ] } ``` -- **Consistent Input/Output** Format - - Call all models using the OpenAI format - `completion(model, messages)` - - Text responses will always be available at `['choices'][0]['message']['content']` -- **Error Handling** Using Model Fallbacks (if `GPT-4` fails, try `llama2`) -- **Logging** - Log Requests, Responses and Errors to `Supabase`, `Posthog`, `Mixpanel`, `Sentry`, `Helicone` (Any of the supported providers here: https://litellm.readthedocs.io/en/latest/advanced/ - **Example: Logs sent to Supabase** +- **Consistent Input/Output** Format + - Call all models using the OpenAI format - `completion(model, messages)` + - Text responses will always be available at `['choices'][0]['message']['content']` +- **Error Handling** Using Model Fallbacks (if `GPT-4` fails, try `llama2`) +- **Logging** - Log Requests, Responses and Errors to `Supabase`, `Posthog`, `Mixpanel`, `Sentry`, `LLMonitor,` `Helicone` (Any of the supported providers here: https://litellm.readthedocs.io/en/latest/advanced/ + + **Example: Logs sent to Supabase** Screenshot 2023-08-11 at 4 02 46 PM - **Token Usage & Spend** - Track Input + Completion tokens used + Spend/model - **Caching** - Implementation of Semantic Caching - **Streaming & Async Support** - Return generators to stream text responses - ## API Endpoints ### `/chat/completions` (POST) @@ -46,34 +49,37 @@ This endpoint is used to generate chat completions for 50+ support LLM API Models. Use llama2, GPT-4, Claude2 etc #### Input + This API endpoint accepts all inputs in raw JSON and expects the following inputs -- `model` (string, required): ID of the model to use for chat completions. See all supported models [here]: (https://litellm.readthedocs.io/en/latest/supported/): - eg `gpt-3.5-turbo`, `gpt-4`, `claude-2`, `command-nightly`, `stabilityai/stablecode-completion-alpha-3b-4k` + +- `model` (string, required): ID of the model to use for chat completions. 
See all supported models [here]: (https://litellm.readthedocs.io/en/latest/supported/): + eg `gpt-3.5-turbo`, `gpt-4`, `claude-2`, `command-nightly`, `stabilityai/stablecode-completion-alpha-3b-4k` - `messages` (array, required): A list of messages representing the conversation context. Each message should have a `role` (system, user, assistant, or function), `content` (message text), and `name` (for function role). - Additional Optional parameters: `temperature`, `functions`, `function_call`, `top_p`, `n`, `stream`. See the full list of supported inputs here: https://litellm.readthedocs.io/en/latest/input/ - #### Example JSON body + For claude-2 + ```json { - "model": "claude-2", - "messages": [ - { - "content": "Hello, whats the weather in San Francisco??", - "role": "user" - } - ] - + "model": "claude-2", + "messages": [ + { + "content": "Hello, whats the weather in San Francisco??", + "role": "user" + } + ] } ``` ### Making an API request to the Proxy Server + ```python import requests import json -# TODO: use your URL +# TODO: use your URL url = "http://localhost:5000/chat/completions" payload = json.dumps({ @@ -94,34 +100,38 @@ print(response.text) ``` ### Output [Response Format] -Responses from the server are given in the following format. + +Responses from the server are given in the following format. All responses from the server are returned in the following format (for all LLM models). More info on output here: https://litellm.readthedocs.io/en/latest/output/ + ```json { - "choices": [ - { - "finish_reason": "stop", - "index": 0, - "message": { - "content": "I'm sorry, but I don't have the capability to provide real-time weather information. However, you can easily check the weather in San Francisco by searching online or using a weather app on your phone.", - "role": "assistant" - } - } - ], - "created": 1691790381, - "id": "chatcmpl-7mUFZlOEgdohHRDx2UpYPRTejirzb", - "model": "gpt-3.5-turbo-0613", - "object": "chat.completion", - "usage": { - "completion_tokens": 41, - "prompt_tokens": 16, - "total_tokens": 57 + "choices": [ + { + "finish_reason": "stop", + "index": 0, + "message": { + "content": "I'm sorry, but I don't have the capability to provide real-time weather information. However, you can easily check the weather in San Francisco by searching online or using a weather app on your phone.", + "role": "assistant" + } } + ], + "created": 1691790381, + "id": "chatcmpl-7mUFZlOEgdohHRDx2UpYPRTejirzb", + "model": "gpt-3.5-turbo-0613", + "object": "chat.completion", + "usage": { + "completion_tokens": 41, + "prompt_tokens": 16, + "total_tokens": 57 + } } ``` ## Installation & Usage + ### Running Locally + 1. Clone liteLLM repository to your local machine: ``` git clone https://github.com/BerriAI/liteLLM-proxy @@ -141,24 +151,24 @@ All responses from the server are returned in the following format (for all LLM python main.py ``` - - ## Deploying + 1. Quick Start: Deploy on Railway [![Deploy on Railway](https://railway.app/button.svg)](https://railway.app/template/DYqQAW?referralCode=t3ukrU) - -2. `GCP`, `AWS`, `Azure` -This project includes a `Dockerfile` allowing you to build and deploy a Docker Project on your providers + +2. 
`GCP`, `AWS`, `Azure` + This project includes a `Dockerfile` allowing you to build and deploy a Docker Project on your providers # Support / Talk with founders + - [Our calendar 👋](https://calendly.com/d/4mp-gd3-k5k/berriai-1-1-onboarding-litellm-hosted-version) - [Community Discord 💭](https://discord.gg/wuPM9dRgDw) - Our numbers 📞 +1 (770) 8783-106 / +1 (412) 618-6238 - Our emails ✉️ ishaan@berri.ai / krrish@berri.ai - ## Roadmap + - [ ] Support hosted db (e.g. Supabase) - [ ] Easily send data to places like posthog and sentry. - [ ] Add a hot-cache for project spend logs - enables fast checks for user + project limitings diff --git a/litellm/integrations/llmonitor.py b/litellm/integrations/llmonitor.py index e7430b5bb..b2940e872 100644 --- a/litellm/integrations/llmonitor.py +++ b/litellm/integrations/llmonitor.py @@ -5,6 +5,7 @@ import traceback import dotenv import os import requests + dotenv.load_dotenv() # Loading env variables using dotenv @@ -14,45 +15,34 @@ class LLMonitorLogger: # Instance variables self.api_url = os.getenv( "LLMONITOR_API_URL") or "https://app.llmonitor.com" - self.account_id = os.getenv("LLMONITOR_APP_ID") + self.app_id = os.getenv("LLMONITOR_APP_ID") - def log_event(self, model, messages, response_obj, start_time, end_time, print_verbose): + def log_event(self, type, run_id, error, usage, model, messages, + response_obj, user_id, time, print_verbose): # Method definition try: print_verbose( - f"LLMonitor Logging - Enters logging function for model {model}") + f"LLMonitor Logging - Enters logging function for model {model}" + ) - print(model, messages, response_obj, start_time, end_time) + print(type, model, messages, response_obj, time, end_user) - # headers = { - # 'Content-Type': 'application/json' - # } + headers = {'Content-Type': 'application/json'} - # prompt_tokens_cost_usd_dollar, completion_tokens_cost_usd_dollar = self.price_calculator( - # model, response_obj, start_time, end_time) - # total_cost = prompt_tokens_cost_usd_dollar + completion_tokens_cost_usd_dollar + data = { + "type": "llm", + "name": model, + "runId": run_id, + "app": self.app_id, + "error": error, + "event": type, + "timestamp": time.isoformat(), + "userId": user_id, + "input": messages, + "output": response_obj['choices'][0]['message']['content'], + } - # response_time = (end_time-start_time).total_seconds() - # if "response" in response_obj: - # data = [{ - # "response_time": response_time, - # "model_id": response_obj["model"], - # "total_cost": total_cost, - # "messages": messages, - # "response": response_obj['choices'][0]['message']['content'], - # "account_id": self.account_id - # }] - # elif "error" in response_obj: - # data = [{ - # "response_time": response_time, - # "model_id": response_obj["model"], - # "total_cost": total_cost, - # "messages": messages, - # "error": response_obj['error'], - # "account_id": self.account_id - # }] - - # print_verbose(f"BerriSpend Logging - final data object: {data}") + print_verbose(f"LLMonitor Logging - final data object: {data}") # response = requests.post(url, headers=headers, json=data) except: # traceback.print_exc() diff --git a/litellm/tests/test_llmonitor_integration.py b/litellm/tests/test_llmonitor_integration.py index 5a4b4beb3..d2045e5dc 100644 --- a/litellm/tests/test_llmonitor_integration.py +++ b/litellm/tests/test_llmonitor_integration.py @@ -1,28 +1,36 @@ #### What this tests #### -# This tests if logging to the helicone integration actually works - -from litellm import embedding, completion -import litellm +# This tests 
if logging to the llmonitor integration actually works +# Adds the parent directory to the system path import sys import os -import traceback -import pytest -# Adds the parent directory to the system path sys.path.insert(0, os.path.abspath('../..')) +from litellm import completion +import litellm + +litellm.input_callback = ["llmonitor"] litellm.success_callback = ["llmonitor"] +litellm.error_callback = ["llmonitor"] litellm.set_verbose = True -user_message = "Hello, how are you?" -messages = [{"content": user_message, "role": "user"}] - - # openai call -response = completion(model="gpt-3.5-turbo", - messages=[{"role": "user", "content": "Hi 👋 - i'm openai"}]) +# response = completion(model="gpt-3.5-turbo", +# messages=[{ +# "role": "user", +# "content": "Hi 👋 - i'm openai" +# }]) + +# print(response) + +# #bad request call +# response = completion(model="chatgpt-test", messages=[{"role": "user", "content": "Hi 👋 - i'm a bad request"}]) # cohere call -# response = completion(model="command-nightly", -# messages=[{"role": "user", "content": "Hi 👋 - i'm cohere"}]) +response = completion(model="command-nightly", + messages=[{ + "role": "user", + "content": "Hi 👋 - i'm cohere" + }]) +print(response) diff --git a/litellm/utils.py b/litellm/utils.py index e47b55978..c451095b8 100644 --- a/litellm/utils.py +++ b/litellm/utils.py @@ -1,20 +1,7 @@ -import sys -import dotenv, json, traceback, threading -import subprocess, os -import litellm, openai -import random, uuid, requests -import datetime, time -import tiktoken - -encoding = tiktoken.get_encoding("cl100k_base") -import pkg_resources -from .integrations.helicone import HeliconeLogger -from .integrations.aispend import AISpendLogger -from .integrations.berrispend import BerriSpendLogger -from .integrations.supabase import Supabase -from .integrations.litedebugger import LiteDebugger -from openai.error import OpenAIError as OriginalError -from openai.openai_object import OpenAIObject +import aiohttp +import subprocess +import importlib +from typing import List, Dict, Union, Optional from .exceptions import ( AuthenticationError, InvalidRequestError, @@ -22,7 +9,32 @@ from .exceptions import ( ServiceUnavailableError, OpenAIError, ) -from typing import List, Dict, Union, Optional +from openai.openai_object import OpenAIObject +from openai.error import OpenAIError as OriginalError +from .integrations.llmonitor import LLMonitorLogger +from .integrations.litedebugger import LiteDebugger +from .integrations.supabase import Supabase +from .integrations.berrispend import BerriSpendLogger +from .integrations.aispend import AISpendLogger +from .integrations.helicone import HeliconeLogger +import pkg_resources +import sys +import dotenv +import json +import traceback +import threading +import subprocess +import os +import litellm +import openai +import random +import uuid +import requests +import datetime +import time +import tiktoken + +encoding = tiktoken.get_encoding("cl100k_base") ####### ENVIRONMENT VARIABLES ################### dotenv.load_dotenv() # Loading env variables using dotenv @@ -37,6 +49,7 @@ aispendLogger = None berrispendLogger = None supabaseClient = None liteDebuggerClient = None +llmonitorLogger = None callback_list: Optional[List[str]] = [] user_logger_fn = None additional_details: Optional[Dict[str, str]] = {} @@ -63,6 +76,7 @@ local_cache: Optional[Dict[str, str]] = {} class Message(OpenAIObject): + def __init__(self, content="default", role="assistant", **params): super(Message, self).__init__(**params) self.content = content @@ 
-70,7 +84,12 @@ class Message(OpenAIObject): class Choices(OpenAIObject): - def __init__(self, finish_reason="stop", index=0, message=Message(), **params): + + def __init__(self, + finish_reason="stop", + index=0, + message=Message(), + **params): super(Choices, self).__init__(**params) self.finish_reason = finish_reason self.index = index @@ -78,20 +97,22 @@ class Choices(OpenAIObject): class ModelResponse(OpenAIObject): - def __init__(self, choices=None, created=None, model=None, usage=None, **params): + + def __init__(self, + choices=None, + created=None, + model=None, + usage=None, + **params): super(ModelResponse, self).__init__(**params) self.choices = choices if choices else [Choices()] self.created = created self.model = model - self.usage = ( - usage - if usage - else { - "prompt_tokens": None, - "completion_tokens": None, - "total_tokens": None, - } - ) + self.usage = (usage if usage else { + "prompt_tokens": None, + "completion_tokens": None, + "total_tokens": None, + }) def to_dict_recursive(self): d = super().to_dict_recursive() @@ -108,8 +129,6 @@ def print_verbose(print_statement): ####### Package Import Handler ################### -import importlib -import subprocess def install_and_import(package: str): @@ -139,6 +158,7 @@ def install_and_import(package: str): # Logging function -> log the exact model details + what's being sent | Non-Blocking class Logging: global supabaseClient, liteDebuggerClient + def __init__(self, model, messages, optional_params, litellm_params): self.model = model self.messages = messages @@ -146,20 +166,20 @@ class Logging: self.litellm_params = litellm_params self.logger_fn = litellm_params["logger_fn"] self.model_call_details = { - "model": model, - "messages": messages, + "model": model, + "messages": messages, "optional_params": self.optional_params, "litellm_params": self.litellm_params, } - + def pre_call(self, input, api_key, additional_args={}): try: print(f"logging pre call for model: {self.model}") self.model_call_details["input"] = input self.model_call_details["api_key"] = api_key self.model_call_details["additional_args"] = additional_args - - ## User Logging -> if you pass in a custom logging function + + # User Logging -> if you pass in a custom logging function print_verbose( f"Logging Details: logger_fn - {self.logger_fn} | callable(logger_fn) - {callable(self.logger_fn)}" ) @@ -173,7 +193,7 @@ class Logging: f"LiteLLM.LoggingError: [Non-Blocking] Exception occurred while logging {traceback.format_exc()}" ) - ## Input Integration Logging -> If you want to log the fact that an attempt to call the model was made + # Input Integration Logging -> If you want to log the fact that an attempt to call the model was made for callback in litellm.input_callback: try: if callback == "supabase": @@ -185,7 +205,21 @@ class Logging: model=model, messages=messages, end_user=litellm._thread_context.user, - litellm_call_id=self.litellm_params["litellm_call_id"], + litellm_call_id=self. 
+ litellm_params["litellm_call_id"], + print_verbose=print_verbose, + ) + elif callback == "llmonitor": + print_verbose("reaches llmonitor for logging!") + model = self.model + messages = self.messages + print(f"liteDebuggerClient: {liteDebuggerClient}") + llmonitorLogger.log_event( + type="start", + model=model, + messages=messages, + user_id=litellm._thread_context.user, + run_id=self.litellm_params["litellm_call_id"], print_verbose=print_verbose, ) elif callback == "lite_debugger": @@ -197,15 +231,18 @@ class Logging: model=model, messages=messages, end_user=litellm._thread_context.user, - litellm_call_id=self.litellm_params["litellm_call_id"], + litellm_call_id=self. + litellm_params["litellm_call_id"], print_verbose=print_verbose, ) except Exception as e: - print_verbose(f"LiteLLM.LoggingError: [Non-Blocking] Exception occurred while input logging with integrations {traceback.format_exc()}") + print_verbose( + f"LiteLLM.LoggingError: [Non-Blocking] Exception occurred while input logging with integrations {traceback.format_exc()}" + ) print_verbose( f"LiteLLM.Logging: is sentry capture exception initialized {capture_exception}" ) - if capture_exception: # log this error to sentry for debugging + if capture_exception: # log this error to sentry for debugging capture_exception(e) except: print_verbose( @@ -214,9 +251,9 @@ class Logging: print_verbose( f"LiteLLM.Logging: is sentry capture exception initialized {capture_exception}" ) - if capture_exception: # log this error to sentry for debugging + if capture_exception: # log this error to sentry for debugging capture_exception(e) - + def post_call(self, input, api_key, original_response, additional_args={}): # Do something here try: @@ -224,8 +261,8 @@ class Logging: self.model_call_details["api_key"] = api_key self.model_call_details["original_response"] = original_response self.model_call_details["additional_args"] = additional_args - - ## User Logging -> if you pass in a custom logging function + + # User Logging -> if you pass in a custom logging function print_verbose( f"Logging Details: logger_fn - {self.logger_fn} | callable(logger_fn) - {callable(self.logger_fn)}" ) @@ -243,9 +280,9 @@ class Logging: f"LiteLLM.LoggingError: [Non-Blocking] Exception occurred while logging {traceback.format_exc()}" ) pass - + # Add more methods as needed - + def exception_logging( additional_args={}, @@ -257,7 +294,7 @@ def exception_logging( if exception: model_call_details["exception"] = exception model_call_details["additional_args"] = additional_args - ## User Logging -> if you pass in a custom logging function or want to use sentry breadcrumbs + # User Logging -> if you pass in a custom logging function or want to use sentry breadcrumbs print_verbose( f"Logging Details: logger_fn - {logger_fn} | callable(logger_fn) - {callable(logger_fn)}" ) @@ -280,20 +317,20 @@ def exception_logging( ####### CLIENT ################### # make it easy to log if completion/embedding runs succeeded or failed + see what happened | Non-Blocking def client(original_function): + def function_setup( *args, **kwargs ): # just run once to check if user wants to send their data anywhere - PostHog/Sentry/Slack/etc. 
try: global callback_list, add_breadcrumb, user_logger_fn - if ( - len(litellm.input_callback) > 0 or len(litellm.success_callback) > 0 or len(litellm.failure_callback) > 0 - ) and len(callback_list) == 0: + if (len(litellm.input_callback) > 0 + or len(litellm.success_callback) > 0 + or len(litellm.failure_callback) + > 0) and len(callback_list) == 0: callback_list = list( - set(litellm.input_callback + litellm.success_callback + litellm.failure_callback) - ) - set_callbacks( - callback_list=callback_list, - ) + set(litellm.input_callback + litellm.success_callback + + litellm.failure_callback)) + set_callbacks(callback_list=callback_list, ) if add_breadcrumb: add_breadcrumb( category="litellm.llm_call", @@ -310,12 +347,11 @@ def client(original_function): if litellm.telemetry: try: model = args[0] if len(args) > 0 else kwargs["model"] - exception = kwargs["exception"] if "exception" in kwargs else None - custom_llm_provider = ( - kwargs["custom_llm_provider"] - if "custom_llm_provider" in kwargs - else None - ) + exception = kwargs[ + "exception"] if "exception" in kwargs else None + custom_llm_provider = (kwargs["custom_llm_provider"] + if "custom_llm_provider" in kwargs else + None) safe_crash_reporting( model=model, exception=exception, @@ -340,15 +376,12 @@ def client(original_function): def check_cache(*args, **kwargs): try: # never block execution prompt = get_prompt(*args, **kwargs) - if ( - prompt != None and prompt in local_cache - ): # check if messages / prompt exists + if (prompt != None and prompt + in local_cache): # check if messages / prompt exists if litellm.caching_with_models: # if caching with model names is enabled, key is prompt + model name - if ( - "model" in kwargs - and kwargs["model"] in local_cache[prompt]["models"] - ): + if ("model" in kwargs and kwargs["model"] + in local_cache[prompt]["models"]): cache_key = prompt + kwargs["model"] return local_cache[cache_key] else: # caching only with prompts @@ -363,10 +396,8 @@ def client(original_function): try: # never block execution prompt = get_prompt(*args, **kwargs) if litellm.caching_with_models: # caching with model + prompt - if ( - "model" in kwargs - and kwargs["model"] in local_cache[prompt]["models"] - ): + if ("model" in kwargs + and kwargs["model"] in local_cache[prompt]["models"]): cache_key = prompt + kwargs["model"] local_cache[cache_key] = result else: # caching based only on prompts @@ -381,24 +412,24 @@ def client(original_function): function_setup(*args, **kwargs) litellm_call_id = str(uuid.uuid4()) kwargs["litellm_call_id"] = litellm_call_id - ## [OPTIONAL] CHECK CACHE + # [OPTIONAL] CHECK CACHE start_time = datetime.datetime.now() if (litellm.caching or litellm.caching_with_models) and ( - cached_result := check_cache(*args, **kwargs) - ) is not None: + cached_result := check_cache(*args, **kwargs)) is not None: result = cached_result else: - ## MODEL CALL + # MODEL CALL result = original_function(*args, **kwargs) end_time = datetime.datetime.now() - ## Add response to CACHE + # Add response to CACHE if litellm.caching: add_cache(result, *args, **kwargs) - ## LOG SUCCESS + # LOG SUCCESS crash_reporting(*args, **kwargs) my_thread = threading.Thread( - target=handle_success, args=(args, kwargs, result, start_time, end_time) - ) # don't interrupt execution of main thread + target=handle_success, + args=(args, kwargs, result, start_time, + end_time)) # don't interrupt execution of main thread my_thread.start() return result except Exception as e: @@ -407,7 +438,8 @@ def client(original_function): 
end_time = datetime.datetime.now() my_thread = threading.Thread( target=handle_failure, - args=(e, traceback_exception, start_time, end_time, args, kwargs), + args=(e, traceback_exception, start_time, end_time, args, + kwargs), ) # don't interrupt execution of main thread my_thread.start() raise e @@ -432,18 +464,18 @@ def token_counter(model, text): return num_tokens -def cost_per_token(model="gpt-3.5-turbo", prompt_tokens=0, completion_tokens=0): - ## given +def cost_per_token(model="gpt-3.5-turbo", + prompt_tokens=0, + completion_tokens=0): + # given prompt_tokens_cost_usd_dollar = 0 completion_tokens_cost_usd_dollar = 0 model_cost_ref = litellm.model_cost if model in model_cost_ref: prompt_tokens_cost_usd_dollar = ( - model_cost_ref[model]["input_cost_per_token"] * prompt_tokens - ) + model_cost_ref[model]["input_cost_per_token"] * prompt_tokens) completion_tokens_cost_usd_dollar = ( - model_cost_ref[model]["output_cost_per_token"] * completion_tokens - ) + model_cost_ref[model]["output_cost_per_token"] * completion_tokens) return prompt_tokens_cost_usd_dollar, completion_tokens_cost_usd_dollar else: # calculate average input cost @@ -464,8 +496,9 @@ def completion_cost(model="gpt-3.5-turbo", prompt="", completion=""): prompt_tokens = token_counter(model=model, text=prompt) completion_tokens = token_counter(model=model, text=completion) prompt_tokens_cost_usd_dollar, completion_tokens_cost_usd_dollar = cost_per_token( - model=model, prompt_tokens=prompt_tokens, completion_tokens=completion_tokens - ) + model=model, + prompt_tokens=prompt_tokens, + completion_tokens=completion_tokens) return prompt_tokens_cost_usd_dollar + completion_tokens_cost_usd_dollar @@ -557,9 +590,8 @@ def get_optional_params( optional_params["max_tokens"] = max_tokens if frequency_penalty != 0: optional_params["frequency_penalty"] = frequency_penalty - elif ( - model == "chat-bison" - ): # chat-bison has diff args from chat-bison@001 ty Google + elif (model == "chat-bison" + ): # chat-bison has diff args from chat-bison@001 ty Google if temperature != 1: optional_params["temperature"] = temperature if top_p != 1: @@ -619,7 +651,10 @@ def load_test_model( test_prompt = prompt if num_calls: test_calls = num_calls - messages = [[{"role": "user", "content": test_prompt}] for _ in range(test_calls)] + messages = [[{ + "role": "user", + "content": test_prompt + }] for _ in range(test_calls)] start_time = time.time() try: litellm.batch_completion( @@ -649,7 +684,7 @@ def load_test_model( def set_callbacks(callback_list): - global sentry_sdk_instance, capture_exception, add_breadcrumb, posthog, slack_app, alerts_channel, heliconeLogger, aispendLogger, berrispendLogger, supabaseClient, liteDebuggerClient + global sentry_sdk_instance, capture_exception, add_breadcrumb, posthog, slack_app, alerts_channel, heliconeLogger, aispendLogger, berrispendLogger, supabaseClient, liteDebuggerClient, llmonitorLogger try: for callback in callback_list: print(f"callback: {callback}") @@ -657,17 +692,15 @@ def set_callbacks(callback_list): try: import sentry_sdk except ImportError: - print_verbose("Package 'sentry_sdk' is missing. Installing it...") + print_verbose( + "Package 'sentry_sdk' is missing. 
Installing it...") subprocess.check_call( - [sys.executable, "-m", "pip", "install", "sentry_sdk"] - ) + [sys.executable, "-m", "pip", "install", "sentry_sdk"]) import sentry_sdk sentry_sdk_instance = sentry_sdk - sentry_trace_rate = ( - os.environ.get("SENTRY_API_TRACE_RATE") - if "SENTRY_API_TRACE_RATE" in os.environ - else "1.0" - ) + sentry_trace_rate = (os.environ.get("SENTRY_API_TRACE_RATE") + if "SENTRY_API_TRACE_RATE" in os.environ + else "1.0") sentry_sdk_instance.init( dsn=os.environ.get("SENTRY_API_URL"), traces_sample_rate=float(sentry_trace_rate), @@ -678,10 +711,10 @@ def set_callbacks(callback_list): try: from posthog import Posthog except ImportError: - print_verbose("Package 'posthog' is missing. Installing it...") + print_verbose( + "Package 'posthog' is missing. Installing it...") subprocess.check_call( - [sys.executable, "-m", "pip", "install", "posthog"] - ) + [sys.executable, "-m", "pip", "install", "posthog"]) from posthog import Posthog posthog = Posthog( project_api_key=os.environ.get("POSTHOG_API_KEY"), @@ -691,10 +724,10 @@ def set_callbacks(callback_list): try: from slack_bolt import App except ImportError: - print_verbose("Package 'slack_bolt' is missing. Installing it...") + print_verbose( + "Package 'slack_bolt' is missing. Installing it...") subprocess.check_call( - [sys.executable, "-m", "pip", "install", "slack_bolt"] - ) + [sys.executable, "-m", "pip", "install", "slack_bolt"]) from slack_bolt import App slack_app = App( token=os.environ.get("SLACK_API_TOKEN"), @@ -704,6 +737,8 @@ def set_callbacks(callback_list): print_verbose(f"Initialized Slack App: {slack_app}") elif callback == "helicone": heliconeLogger = HeliconeLogger() + elif callback == "llmonitor": + llmonitorLogger = LLMonitorLogger() elif callback == "aispend": aispendLogger = AISpendLogger() elif callback == "berrispend": @@ -718,7 +753,8 @@ def set_callbacks(callback_list): raise e -def handle_failure(exception, traceback_exception, start_time, end_time, args, kwargs): +def handle_failure(exception, traceback_exception, start_time, end_time, args, + kwargs): global sentry_sdk_instance, capture_exception, add_breadcrumb, posthog, slack_app, alerts_channel, aispendLogger, berrispendLogger, supabaseClient, liteDebuggerClient try: # print_verbose(f"handle_failure args: {args}") @@ -728,8 +764,7 @@ def handle_failure(exception, traceback_exception, start_time, end_time, args, k failure_handler = additional_details.pop("failure_handler", None) additional_details["Event_Name"] = additional_details.pop( - "failed_event_name", "litellm.failed_query" - ) + "failed_event_name", "litellm.failed_query") print_verbose(f"self.failure_callback: {litellm.failure_callback}") # print_verbose(f"additional_details: {additional_details}") @@ -746,9 +781,8 @@ def handle_failure(exception, traceback_exception, start_time, end_time, args, k for detail in additional_details: slack_msg += f"{detail}: {additional_details[detail]}\n" slack_msg += f"Traceback: {traceback_exception}" - slack_app.client.chat_postMessage( - channel=alerts_channel, text=slack_msg - ) + slack_app.client.chat_postMessage(channel=alerts_channel, + text=slack_msg) elif callback == "sentry": capture_exception(exception) elif callback == "posthog": @@ -767,9 +801,8 @@ def handle_failure(exception, traceback_exception, start_time, end_time, args, k print_verbose(f"ph_obj: {ph_obj}") print_verbose(f"PostHog Event Name: {event_name}") if "user_id" in additional_details: - posthog.capture( - additional_details["user_id"], event_name, ph_obj - ) + 
posthog.capture(additional_details["user_id"], + event_name, ph_obj) else: # PostHog calls require a unique id to identify a user - https://posthog.com/docs/libraries/python unique_id = str(uuid.uuid4()) posthog.capture(unique_id, event_name) @@ -783,10 +816,10 @@ def handle_failure(exception, traceback_exception, start_time, end_time, args, k "created": time.time(), "error": traceback_exception, "usage": { - "prompt_tokens": prompt_token_calculator( - model, messages=messages - ), - "completion_tokens": 0, + "prompt_tokens": + prompt_token_calculator(model, messages=messages), + "completion_tokens": + 0, }, } berrispendLogger.log_event( @@ -805,10 +838,10 @@ def handle_failure(exception, traceback_exception, start_time, end_time, args, k "model": model, "created": time.time(), "usage": { - "prompt_tokens": prompt_token_calculator( - model, messages=messages - ), - "completion_tokens": 0, + "prompt_tokens": + prompt_token_calculator(model, messages=messages), + "completion_tokens": + 0, }, } aispendLogger.log_event( @@ -818,6 +851,27 @@ def handle_failure(exception, traceback_exception, start_time, end_time, args, k end_time=end_time, print_verbose=print_verbose, ) + elif callback == "llmonitor": + print_verbose("reaches llmonitor for logging!") + model = args[0] if len(args) > 0 else kwargs["model"] + messages = args[1] if len(args) > 1 else kwargs["messages"] + usage = { + "prompt_tokens": + prompt_token_calculator(model, messages=messages), + "completion_tokens": + 0, + } + llmonitorLogger.log_event( + type="error", + user_id=litellm._thread_context.user, + model=model, + error=traceback_exception, + response_obj=result, + run_id=kwargs["litellm_call_id"], + timestamp=end_time, + usage=usage, + print_verbose=print_verbose, + ) elif callback == "supabase": print_verbose("reaches supabase for logging!") print_verbose(f"supabaseClient: {supabaseClient}") @@ -828,10 +882,10 @@ def handle_failure(exception, traceback_exception, start_time, end_time, args, k "created": time.time(), "error": traceback_exception, "usage": { - "prompt_tokens": prompt_token_calculator( - model, messages=messages - ), - "completion_tokens": 0, + "prompt_tokens": + prompt_token_calculator(model, messages=messages), + "completion_tokens": + 0, }, } supabaseClient.log_event( @@ -854,10 +908,10 @@ def handle_failure(exception, traceback_exception, start_time, end_time, args, k "created": time.time(), "error": traceback_exception, "usage": { - "prompt_tokens": prompt_token_calculator( - model, messages=messages - ), - "completion_tokens": 0, + "prompt_tokens": + prompt_token_calculator(model, messages=messages), + "completion_tokens": + 0, }, } liteDebuggerClient.log_event( @@ -884,19 +938,18 @@ def handle_failure(exception, traceback_exception, start_time, end_time, args, k failure_handler(call_details) pass except Exception as e: - ## LOGGING + # LOGGING exception_logging(logger_fn=user_logger_fn, exception=e) pass def handle_success(args, kwargs, result, start_time, end_time): - global heliconeLogger, aispendLogger, supabaseClient, liteDebuggerClient + global heliconeLogger, aispendLogger, supabaseClient, liteDebuggerClient, llmonitorLogger try: success_handler = additional_details.pop("success_handler", None) failure_handler = additional_details.pop("failure_handler", None) additional_details["Event_Name"] = additional_details.pop( - "successful_event_name", "litellm.succes_query" - ) + "successful_event_name", "litellm.succes_query") for callback in litellm.success_callback: try: if callback == "posthog": @@ 
-905,9 +958,8 @@ def handle_success(args, kwargs, result, start_time, end_time): ph_obj[detail] = additional_details[detail] event_name = additional_details["Event_Name"] if "user_id" in additional_details: - posthog.capture( - additional_details["user_id"], event_name, ph_obj - ) + posthog.capture(additional_details["user_id"], + event_name, ph_obj) else: # PostHog calls require a unique id to identify a user - https://posthog.com/docs/libraries/python unique_id = str(uuid.uuid4()) posthog.capture(unique_id, event_name, ph_obj) @@ -916,9 +968,8 @@ def handle_success(args, kwargs, result, start_time, end_time): slack_msg = "" for detail in additional_details: slack_msg += f"{detail}: {additional_details[detail]}\n" - slack_app.client.chat_postMessage( - channel=alerts_channel, text=slack_msg - ) + slack_app.client.chat_postMessage(channel=alerts_channel, + text=slack_msg) elif callback == "helicone": print_verbose("reaches helicone for logging!") model = args[0] if len(args) > 0 else kwargs["model"] @@ -931,6 +982,22 @@ def handle_success(args, kwargs, result, start_time, end_time): end_time=end_time, print_verbose=print_verbose, ) + elif callback == "llmonitor": + print_verbose("reaches llmonitor for logging!") + model = args[0] if len(args) > 0 else kwargs["model"] + messages = args[1] if len(args) > 1 else kwargs["messages"] + usage = kwargs["usage"] + llmonitorLogger.log_event( + type="end", + model=model, + messages=messages, + user_id=litellm._thread_context.user, + response_obj=result, + time=end_time, + usage=usage, + run_id=kwargs["litellm_call_id"], + print_verbose=print_verbose, + ) elif callback == "aispend": print_verbose("reaches aispend for logging!") model = args[0] if len(args) > 0 else kwargs["model"] @@ -984,7 +1051,7 @@ def handle_success(args, kwargs, result, start_time, end_time): print_verbose=print_verbose, ) except Exception as e: - ## LOGGING + # LOGGING exception_logging(logger_fn=user_logger_fn, exception=e) print_verbose( f"[Non-Blocking] Success Callback Error - {traceback.format_exc()}" @@ -995,7 +1062,7 @@ def handle_success(args, kwargs, result, start_time, end_time): success_handler(args, kwargs) pass except Exception as e: - ## LOGGING + # LOGGING exception_logging(logger_fn=user_logger_fn, exception=e) print_verbose( f"[Non-Blocking] Success Callback Error - {traceback.format_exc()}" @@ -1046,33 +1113,36 @@ def exception_type(model, original_exception, custom_llm_provider): exception_type = "" if "claude" in model: # one of the anthropics if hasattr(original_exception, "status_code"): - print_verbose(f"status_code: {original_exception.status_code}") + print_verbose( + f"status_code: {original_exception.status_code}") if original_exception.status_code == 401: exception_mapping_worked = True raise AuthenticationError( - message=f"AnthropicException - {original_exception.message}", + message= + f"AnthropicException - {original_exception.message}", llm_provider="anthropic", ) elif original_exception.status_code == 400: exception_mapping_worked = True raise InvalidRequestError( - message=f"AnthropicException - {original_exception.message}", + message= + f"AnthropicException - {original_exception.message}", model=model, llm_provider="anthropic", ) elif original_exception.status_code == 429: exception_mapping_worked = True raise RateLimitError( - message=f"AnthropicException - {original_exception.message}", + message= + f"AnthropicException - {original_exception.message}", llm_provider="anthropic", ) - elif ( - "Could not resolve authentication method. 
Expected either api_key or auth_token to be set." - in error_str - ): + elif ("Could not resolve authentication method. Expected either api_key or auth_token to be set." + in error_str): exception_mapping_worked = True raise AuthenticationError( - message=f"AnthropicException - {original_exception.message}", + message= + f"AnthropicException - {original_exception.message}", llm_provider="anthropic", ) elif "replicate" in model: @@ -1096,35 +1166,36 @@ def exception_type(model, original_exception, custom_llm_provider): llm_provider="replicate", ) elif ( - exception_type == "ReplicateError" - ): ## ReplicateError implies an error on Replicate server side, not user side + exception_type == "ReplicateError" + ): # ReplicateError implies an error on Replicate server side, not user side raise ServiceUnavailableError( message=f"ReplicateException - {error_str}", llm_provider="replicate", ) elif model == "command-nightly": # Cohere - if ( - "invalid api token" in error_str - or "No API key provided." in error_str - ): + if ("invalid api token" in error_str + or "No API key provided." in error_str): exception_mapping_worked = True raise AuthenticationError( - message=f"CohereException - {original_exception.message}", + message= + f"CohereException - {original_exception.message}", llm_provider="cohere", ) elif "too many tokens" in error_str: exception_mapping_worked = True raise InvalidRequestError( - message=f"CohereException - {original_exception.message}", + message= + f"CohereException - {original_exception.message}", model=model, llm_provider="cohere", ) elif ( - "CohereConnectionError" in exception_type + "CohereConnectionError" in exception_type ): # cohere seems to fire these errors when we load test it (1k+ messages / min) exception_mapping_worked = True raise RateLimitError( - message=f"CohereException - {original_exception.message}", + message= + f"CohereException - {original_exception.message}", llm_provider="cohere", ) elif custom_llm_provider == "huggingface": @@ -1132,27 +1203,30 @@ def exception_type(model, original_exception, custom_llm_provider): if original_exception.status_code == 401: exception_mapping_worked = True raise AuthenticationError( - message=f"HuggingfaceException - {original_exception.message}", + message= + f"HuggingfaceException - {original_exception.message}", llm_provider="huggingface", ) elif original_exception.status_code == 400: exception_mapping_worked = True raise InvalidRequestError( - message=f"HuggingfaceException - {original_exception.message}", + message= + f"HuggingfaceException - {original_exception.message}", model=model, llm_provider="huggingface", ) elif original_exception.status_code == 429: exception_mapping_worked = True raise RateLimitError( - message=f"HuggingfaceException - {original_exception.message}", + message= + f"HuggingfaceException - {original_exception.message}", llm_provider="huggingface", ) raise original_exception # base case - return the original exception else: raise original_exception except Exception as e: - ## LOGGING + # LOGGING exception_logging( logger_fn=user_logger_fn, additional_args={ @@ -1173,7 +1247,7 @@ def safe_crash_reporting(model=None, exception=None, custom_llm_provider=None): "exception": str(exception), "custom_llm_provider": custom_llm_provider, } - threading.Thread(target=litellm_telemetry, args=(data,)).start() + threading.Thread(target=litellm_telemetry, args=(data, )).start() def litellm_telemetry(data): @@ -1223,11 +1297,13 @@ def get_secret(secret_name): if litellm.secret_manager_client != None: # 
TODO: check which secret manager is being used # currently only supports Infisical - secret = litellm.secret_manager_client.get_secret(secret_name).secret_value + secret = litellm.secret_manager_client.get_secret( + secret_name).secret_value if secret != None: return secret # if secret found in secret manager return it else: - raise ValueError(f"Secret '{secret_name}' not found in secret manager") + raise ValueError( + f"Secret '{secret_name}' not found in secret manager") elif litellm.api_key != None: # if users use litellm default key return litellm.api_key else: @@ -1238,6 +1314,7 @@ def get_secret(secret_name): # wraps the completion stream to return the correct format for the model # replicate/anthropic/cohere class CustomStreamWrapper: + def __init__(self, completion_stream, model, custom_llm_provider=None): self.model = model self.custom_llm_provider = custom_llm_provider @@ -1288,7 +1365,8 @@ class CustomStreamWrapper: elif self.model == "replicate": chunk = next(self.completion_stream) completion_obj["content"] = chunk - elif (self.model == "together_ai") or ("togethercomputer" in self.model): + elif (self.model == "together_ai") or ("togethercomputer" + in self.model): chunk = next(self.completion_stream) text_data = self.handle_together_ai_chunk(chunk) if text_data == "": @@ -1321,12 +1399,11 @@ def read_config_args(config_path): ########## ollama implementation ############################ -import aiohttp -async def get_ollama_response_stream( - api_base="http://localhost:11434", model="llama2", prompt="Why is the sky blue?" -): +async def get_ollama_response_stream(api_base="http://localhost:11434", + model="llama2", + prompt="Why is the sky blue?"): session = aiohttp.ClientSession() url = f"{api_base}/api/generate" data = { @@ -1349,7 +1426,11 @@ async def get_ollama_response_stream( "content": "", } completion_obj["content"] = j["response"] - yield {"choices": [{"delta": completion_obj}]} + yield { + "choices": [{ + "delta": completion_obj + }] + } # self.responses.append(j["response"]) # yield "blank" except Exception as e: diff --git a/proxy-server/readme.md b/proxy-server/readme.md index 4f735f38c..edd03de3f 100644 --- a/proxy-server/readme.md +++ b/proxy-server/readme.md @@ -1,6 +1,7 @@ - # liteLLM Proxy Server: 50+ LLM Models, Error Handling, Caching + ### Azure, Llama2, OpenAI, Claude, Hugging Face, Replicate Models + [![PyPI Version](https://img.shields.io/pypi/v/litellm.svg)](https://pypi.org/project/litellm/) [![PyPI Version](https://img.shields.io/badge/stable%20version-v0.1.345-blue?color=green&link=https://pypi.org/project/litellm/0.1.1/)](https://pypi.org/project/litellm/0.1.1/) ![Downloads](https://img.shields.io/pypi/dm/litellm) @@ -11,34 +12,36 @@ ![4BC6491E-86D0-4833-B061-9F54524B2579](https://github.com/BerriAI/litellm/assets/17561003/f5dd237b-db5e-42e1-b1ac-f05683b1d724) ## What does liteLLM proxy do + - Make `/chat/completions` requests for 50+ LLM models **Azure, OpenAI, Replicate, Anthropic, Hugging Face** - + Example: for `model` use `claude-2`, `gpt-3.5`, `gpt-4`, `command-nightly`, `stabilityai/stablecode-completion-alpha-3b-4k` + ```json { "model": "replicate/llama-2-70b-chat:2c1608e18606fad2812020dc541930f2d0495ce32eee50074220b87300bc16e1", "messages": [ - { - "content": "Hello, whats the weather in San Francisco??", - "role": "user" - } - ] + { + "content": "Hello, whats the weather in San Francisco??", + "role": "user" + } + ] } ``` -- **Consistent Input/Output** Format - - Call all models using the OpenAI format - `completion(model, messages)` - 
- Text responses will always be available at `['choices'][0]['message']['content']` -- **Error Handling** Using Model Fallbacks (if `GPT-4` fails, try `llama2`) -- **Logging** - Log Requests, Responses and Errors to `Supabase`, `Posthog`, `Mixpanel`, `Sentry`, `Helicone` (Any of the supported providers here: https://litellm.readthedocs.io/en/latest/advanced/ - **Example: Logs sent to Supabase** +- **Consistent Input/Output** Format + - Call all models using the OpenAI format - `completion(model, messages)` + - Text responses will always be available at `['choices'][0]['message']['content']` +- **Error Handling** Using Model Fallbacks (if `GPT-4` fails, try `llama2`) +- **Logging** - Log Requests, Responses and Errors to `Supabase`, `Posthog`, `Mixpanel`, `Sentry`, `Helicone`, `LLMonitor` (Any of the supported providers here: https://litellm.readthedocs.io/en/latest/advanced/ + + **Example: Logs sent to Supabase** Screenshot 2023-08-11 at 4 02 46 PM - **Token Usage & Spend** - Track Input + Completion tokens used + Spend/model - **Caching** - Implementation of Semantic Caching - **Streaming & Async Support** - Return generators to stream text responses - ## API Endpoints ### `/chat/completions` (POST) @@ -46,34 +49,37 @@ This endpoint is used to generate chat completions for 50+ support LLM API Models. Use llama2, GPT-4, Claude2 etc #### Input + This API endpoint accepts all inputs in raw JSON and expects the following inputs -- `model` (string, required): ID of the model to use for chat completions. See all supported models [here]: (https://litellm.readthedocs.io/en/latest/supported/): - eg `gpt-3.5-turbo`, `gpt-4`, `claude-2`, `command-nightly`, `stabilityai/stablecode-completion-alpha-3b-4k` + +- `model` (string, required): ID of the model to use for chat completions. See all supported models [here]: (https://litellm.readthedocs.io/en/latest/supported/): + eg `gpt-3.5-turbo`, `gpt-4`, `claude-2`, `command-nightly`, `stabilityai/stablecode-completion-alpha-3b-4k` - `messages` (array, required): A list of messages representing the conversation context. Each message should have a `role` (system, user, assistant, or function), `content` (message text), and `name` (for function role). - Additional Optional parameters: `temperature`, `functions`, `function_call`, `top_p`, `n`, `stream`. See the full list of supported inputs here: https://litellm.readthedocs.io/en/latest/input/ - #### Example JSON body + For claude-2 + ```json { - "model": "claude-2", - "messages": [ - { - "content": "Hello, whats the weather in San Francisco??", - "role": "user" - } - ] - + "model": "claude-2", + "messages": [ + { + "content": "Hello, whats the weather in San Francisco??", + "role": "user" + } + ] } ``` ### Making an API request to the Proxy Server + ```python import requests import json -# TODO: use your URL +# TODO: use your URL url = "http://localhost:5000/chat/completions" payload = json.dumps({ @@ -94,34 +100,38 @@ print(response.text) ``` ### Output [Response Format] -Responses from the server are given in the following format. + +Responses from the server are given in the following format. All responses from the server are returned in the following format (for all LLM models). More info on output here: https://litellm.readthedocs.io/en/latest/output/ + ```json { - "choices": [ - { - "finish_reason": "stop", - "index": 0, - "message": { - "content": "I'm sorry, but I don't have the capability to provide real-time weather information. 
However, you can easily check the weather in San Francisco by searching online or using a weather app on your phone.", - "role": "assistant" - } - } - ], - "created": 1691790381, - "id": "chatcmpl-7mUFZlOEgdohHRDx2UpYPRTejirzb", - "model": "gpt-3.5-turbo-0613", - "object": "chat.completion", - "usage": { - "completion_tokens": 41, - "prompt_tokens": 16, - "total_tokens": 57 + "choices": [ + { + "finish_reason": "stop", + "index": 0, + "message": { + "content": "I'm sorry, but I don't have the capability to provide real-time weather information. However, you can easily check the weather in San Francisco by searching online or using a weather app on your phone.", + "role": "assistant" + } } + ], + "created": 1691790381, + "id": "chatcmpl-7mUFZlOEgdohHRDx2UpYPRTejirzb", + "model": "gpt-3.5-turbo-0613", + "object": "chat.completion", + "usage": { + "completion_tokens": 41, + "prompt_tokens": 16, + "total_tokens": 57 + } } ``` ## Installation & Usage + ### Running Locally + 1. Clone liteLLM repository to your local machine: ``` git clone https://github.com/BerriAI/liteLLM-proxy @@ -141,24 +151,24 @@ All responses from the server are returned in the following format (for all LLM python main.py ``` - - ## Deploying + 1. Quick Start: Deploy on Railway [![Deploy on Railway](https://railway.app/button.svg)](https://railway.app/template/DYqQAW?referralCode=t3ukrU) - -2. `GCP`, `AWS`, `Azure` -This project includes a `Dockerfile` allowing you to build and deploy a Docker Project on your providers + +2. `GCP`, `AWS`, `Azure` + This project includes a `Dockerfile` allowing you to build and deploy a Docker Project on your providers # Support / Talk with founders + - [Our calendar 👋](https://calendly.com/d/4mp-gd3-k5k/berriai-1-1-onboarding-litellm-hosted-version) - [Community Discord 💭](https://discord.gg/wuPM9dRgDw) - Our numbers 📞 +1 (770) 8783-106 / +1 (412) 618-6238 - Our emails ✉️ ishaan@berri.ai / krrish@berri.ai - ## Roadmap + - [ ] Support hosted db (e.g. Supabase) - [ ] Easily send data to places like posthog and sentry. - [ ] Add a hot-cache for project spend logs - enables fast checks for user + project limitings
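
For reference, a minimal sketch of enabling the LLMonitor callbacks wired up in `litellm/utils.py` above, following the pattern in `litellm/tests/test_llmonitor_integration.py`; the `LLMONITOR_APP_ID` value below is a placeholder, and the provider key for the chosen model (e.g. `COHERE_API_KEY` for `command-nightly`) is assumed to be set separately.

```python
import os

import litellm
from litellm import completion

# LLMonitorLogger reads these from the environment (see litellm/integrations/llmonitor.py);
# the app id below is a placeholder.
os.environ["LLMONITOR_APP_ID"] = "<your-llmonitor-app-id>"
# os.environ["LLMONITOR_API_URL"] = "https://app.llmonitor.com"  # optional override

# Register the "llmonitor" callback for request, success, and error events,
# mirroring the test added in this diff.
litellm.input_callback = ["llmonitor"]
litellm.success_callback = ["llmonitor"]
litellm.error_callback = ["llmonitor"]
litellm.set_verbose = True

response = completion(
    model="command-nightly",
    messages=[{"role": "user", "content": "Hi 👋 - i'm cohere"}],
)
print(response)
```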