diff --git a/cookbook/proxy-server/readme.md b/cookbook/proxy-server/readme.md
index 4f735f38c..bb9e00804 100644
--- a/cookbook/proxy-server/readme.md
+++ b/cookbook/proxy-server/readme.md
@@ -1,6 +1,7 @@
-
# liteLLM Proxy Server: 50+ LLM Models, Error Handling, Caching
+
### Azure, Llama2, OpenAI, Claude, Hugging Face, Replicate Models
+
[](https://pypi.org/project/litellm/)
[](https://pypi.org/project/litellm/0.1.1/)

@@ -11,34 +12,36 @@

## What does liteLLM proxy do
+
- Make `/chat/completions` requests for 50+ LLM models **Azure, OpenAI, Replicate, Anthropic, Hugging Face**
-
+
Example: for `model` use `claude-2`, `gpt-3.5`, `gpt-4`, `command-nightly`, `stabilityai/stablecode-completion-alpha-3b-4k`
+
```json
{
"model": "replicate/llama-2-70b-chat:2c1608e18606fad2812020dc541930f2d0495ce32eee50074220b87300bc16e1",
"messages": [
- {
- "content": "Hello, whats the weather in San Francisco??",
- "role": "user"
- }
- ]
+ {
+        "content": "Hello, what's the weather in San Francisco?",
+ "role": "user"
+ }
+ ]
}
```
-- **Consistent Input/Output** Format
- - Call all models using the OpenAI format - `completion(model, messages)`
- - Text responses will always be available at `['choices'][0]['message']['content']`
-- **Error Handling** Using Model Fallbacks (if `GPT-4` fails, try `llama2`)
-- **Logging** - Log Requests, Responses and Errors to `Supabase`, `Posthog`, `Mixpanel`, `Sentry`, `Helicone` (Any of the supported providers here: https://litellm.readthedocs.io/en/latest/advanced/
- **Example: Logs sent to Supabase**
+- **Consistent Input/Output** Format
+ - Call all models using the OpenAI format - `completion(model, messages)`
+ - Text responses will always be available at `['choices'][0]['message']['content']`
+- **Error Handling** Using Model Fallbacks (if `GPT-4` fails, try `llama2`)
+- **Logging** - Log Requests, Responses and Errors to `Supabase`, `Posthog`, `Mixpanel`, `Sentry`, `LLMonitor`, `Helicone` (any of the supported providers here: https://litellm.readthedocs.io/en/latest/advanced/) - see the wiring sketch after this list
+
+ **Example: Logs sent to Supabase**
- **Token Usage & Spend** - Track Input + Completion tokens used + Spend/model
- **Caching** - Implementation of Semantic Caching
- **Streaming & Async Support** - Return generators to stream text responses
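+
+A minimal sketch of wiring up these logging callbacks from Python (assuming the `litellm` package is installed and your provider API keys are set as environment variables):
+
+```python
+import litellm
+from litellm import completion
+
+# route request, response, and error logs to a supported provider
+litellm.input_callback = ["llmonitor"]    # log the request
+litellm.success_callback = ["llmonitor"]  # log the response
+litellm.error_callback = ["llmonitor"]    # log failures
+
+response = completion(model="gpt-3.5-turbo",
+                      messages=[{"role": "user", "content": "Hi 👋"}])
+```
+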
-
## API Endpoints
### `/chat/completions` (POST)
@@ -46,34 +49,37 @@
This endpoint is used to generate chat completions for 50+ supported LLM API models. Use llama2, GPT-4, Claude2 etc
#### Input
+
This API endpoint accepts all inputs in raw JSON and expects the following inputs
-- `model` (string, required): ID of the model to use for chat completions. See all supported models [here]: (https://litellm.readthedocs.io/en/latest/supported/):
- eg `gpt-3.5-turbo`, `gpt-4`, `claude-2`, `command-nightly`, `stabilityai/stablecode-completion-alpha-3b-4k`
+
+- `model` (string, required): ID of the model to use for chat completions. See all supported models [here](https://litellm.readthedocs.io/en/latest/supported/),
+  e.g. `gpt-3.5-turbo`, `gpt-4`, `claude-2`, `command-nightly`, `stabilityai/stablecode-completion-alpha-3b-4k`
- `messages` (array, required): A list of messages representing the conversation context. Each message should have a `role` (system, user, assistant, or function), `content` (message text), and `name` (for function role).
- Additional Optional parameters: `temperature`, `functions`, `function_call`, `top_p`, `n`, `stream`. See the full list of supported inputs here: https://litellm.readthedocs.io/en/latest/input/
-
#### Example JSON body
+
For claude-2
+
```json
{
- "model": "claude-2",
- "messages": [
- {
- "content": "Hello, whats the weather in San Francisco??",
- "role": "user"
- }
- ]
-
+ "model": "claude-2",
+ "messages": [
+ {
+      "content": "Hello, what's the weather in San Francisco?",
+ "role": "user"
+ }
+ ]
}
```
### Making an API request to the Proxy Server
+
```python
import requests
import json
-# TODO: use your URL
+# TODO: use your URL
url = "http://localhost:5000/chat/completions"
payload = json.dumps({
@@ -94,34 +100,38 @@ print(response.text)
```
### Output [Response Format]
-Responses from the server are given in the following format.
+
All responses from the server are returned in the following format (for all LLM models). More info on output here: https://litellm.readthedocs.io/en/latest/output/
+
```json
{
- "choices": [
- {
- "finish_reason": "stop",
- "index": 0,
- "message": {
- "content": "I'm sorry, but I don't have the capability to provide real-time weather information. However, you can easily check the weather in San Francisco by searching online or using a weather app on your phone.",
- "role": "assistant"
- }
- }
- ],
- "created": 1691790381,
- "id": "chatcmpl-7mUFZlOEgdohHRDx2UpYPRTejirzb",
- "model": "gpt-3.5-turbo-0613",
- "object": "chat.completion",
- "usage": {
- "completion_tokens": 41,
- "prompt_tokens": 16,
- "total_tokens": 57
+ "choices": [
+ {
+ "finish_reason": "stop",
+ "index": 0,
+ "message": {
+ "content": "I'm sorry, but I don't have the capability to provide real-time weather information. However, you can easily check the weather in San Francisco by searching online or using a weather app on your phone.",
+ "role": "assistant"
+ }
}
+ ],
+ "created": 1691790381,
+ "id": "chatcmpl-7mUFZlOEgdohHRDx2UpYPRTejirzb",
+ "model": "gpt-3.5-turbo-0613",
+ "object": "chat.completion",
+ "usage": {
+ "completion_tokens": 41,
+ "prompt_tokens": 16,
+ "total_tokens": 57
+ }
}
```
## Installation & Usage
+
### Running Locally
+
1. Clone liteLLM repository to your local machine:
```
git clone https://github.com/BerriAI/liteLLM-proxy
@@ -141,24 +151,24 @@ All responses from the server are returned in the following format (for all LLM
python main.py
```
-
-
## Deploying
+
1. Quick Start: Deploy on Railway
[](https://railway.app/template/DYqQAW?referralCode=t3ukrU)
-
-2. `GCP`, `AWS`, `Azure`
-This project includes a `Dockerfile` allowing you to build and deploy a Docker Project on your providers
+
+2. `GCP`, `AWS`, `Azure`
+   This project includes a `Dockerfile`, allowing you to build and deploy a Docker image to the provider of your choice.
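+
+   For example, a minimal build-and-run sketch (the image name is hypothetical; port 5000 matches the request examples above):
+
+   ```
+   docker build -t litellm-proxy .
+   docker run -p 5000:5000 litellm-proxy
+   ```
+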
# Support / Talk with founders
+
- [Our calendar 👋](https://calendly.com/d/4mp-gd3-k5k/berriai-1-1-onboarding-litellm-hosted-version)
- [Community Discord 💭](https://discord.gg/wuPM9dRgDw)
- Our numbers 📞 +1 (770) 8783-106 / +1 (412) 618-6238
- Our emails ✉️ ishaan@berri.ai / krrish@berri.ai
-
## Roadmap
+
- [ ] Support hosted db (e.g. Supabase)
- [ ] Easily send data to places like posthog and sentry.
- [ ] Add a hot-cache for project spend logs - enables fast checks for user + project limits
diff --git a/litellm/integrations/llmonitor.py b/litellm/integrations/llmonitor.py
index e7430b5bb..b2940e872 100644
--- a/litellm/integrations/llmonitor.py
+++ b/litellm/integrations/llmonitor.py
@@ -5,6 +5,7 @@ import traceback
import dotenv
import os
import requests
+
dotenv.load_dotenv() # Loading env variables using dotenv
@@ -14,45 +15,34 @@ class LLMonitorLogger:
# Instance variables
self.api_url = os.getenv(
"LLMONITOR_API_URL") or "https://app.llmonitor.com"
- self.account_id = os.getenv("LLMONITOR_APP_ID")
+ self.app_id = os.getenv("LLMONITOR_APP_ID")
- def log_event(self, model, messages, response_obj, start_time, end_time, print_verbose):
+    def log_event(self, type, run_id, model, print_verbose, error=None,
+                  usage=None, messages=None, response_obj=None, user_id=None,
+                  time=None):
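+        # Called once per lifecycle stage: callers pass type="start"
+        # (before the LLM call), "end" (on success), or "error" (on failure).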
# Method definition
try:
print_verbose(
- f"LLMonitor Logging - Enters logging function for model {model}")
+ f"LLMonitor Logging - Enters logging function for model {model}"
+ )
- print(model, messages, response_obj, start_time, end_time)
+            print(type, model, messages, response_obj, time, user_id)
- # headers = {
- # 'Content-Type': 'application/json'
- # }
+ headers = {'Content-Type': 'application/json'}
- # prompt_tokens_cost_usd_dollar, completion_tokens_cost_usd_dollar = self.price_calculator(
- # model, response_obj, start_time, end_time)
- # total_cost = prompt_tokens_cost_usd_dollar + completion_tokens_cost_usd_dollar
+ data = {
+ "type": "llm",
+ "name": model,
+ "runId": run_id,
+ "app": self.app_id,
+ "error": error,
+ "event": type,
+            "timestamp": time.isoformat() if time else None,
+ "userId": user_id,
+ "input": messages,
+            "output": (response_obj['choices'][0]['message']['content']
+                       if response_obj else None),
+ }
- # response_time = (end_time-start_time).total_seconds()
- # if "response" in response_obj:
- # data = [{
- # "response_time": response_time,
- # "model_id": response_obj["model"],
- # "total_cost": total_cost,
- # "messages": messages,
- # "response": response_obj['choices'][0]['message']['content'],
- # "account_id": self.account_id
- # }]
- # elif "error" in response_obj:
- # data = [{
- # "response_time": response_time,
- # "model_id": response_obj["model"],
- # "total_cost": total_cost,
- # "messages": messages,
- # "error": response_obj['error'],
- # "account_id": self.account_id
- # }]
-
- # print_verbose(f"BerriSpend Logging - final data object: {data}")
+ print_verbose(f"LLMonitor Logging - final data object: {data}")
# response = requests.post(url, headers=headers, json=data)
except:
# traceback.print_exc()
diff --git a/litellm/tests/test_llmonitor_integration.py b/litellm/tests/test_llmonitor_integration.py
index 5a4b4beb3..d2045e5dc 100644
--- a/litellm/tests/test_llmonitor_integration.py
+++ b/litellm/tests/test_llmonitor_integration.py
@@ -1,28 +1,36 @@
#### What this tests ####
-# This tests if logging to the helicone integration actually works
-
-from litellm import embedding, completion
-import litellm
+# This tests if logging to the llmonitor integration actually works
+# Adds the parent directory to the system path
import sys
import os
-import traceback
-import pytest
-# Adds the parent directory to the system path
sys.path.insert(0, os.path.abspath('../..'))
+from litellm import completion
+import litellm
+
+litellm.input_callback = ["llmonitor"]
litellm.success_callback = ["llmonitor"]
+litellm.error_callback = ["llmonitor"]
litellm.set_verbose = True
-user_message = "Hello, how are you?"
-messages = [{"content": user_message, "role": "user"}]
-
-
# openai call
-response = completion(model="gpt-3.5-turbo",
- messages=[{"role": "user", "content": "Hi 👋 - i'm openai"}])
+# response = completion(model="gpt-3.5-turbo",
+# messages=[{
+# "role": "user",
+# "content": "Hi 👋 - i'm openai"
+# }])
+
+# print(response)
+
+# #bad request call
+# response = completion(model="chatgpt-test", messages=[{"role": "user", "content": "Hi 👋 - i'm a bad request"}])
# cohere call
-# response = completion(model="command-nightly",
-# messages=[{"role": "user", "content": "Hi 👋 - i'm cohere"}])
+response = completion(model="command-nightly",
+ messages=[{
+ "role": "user",
+ "content": "Hi 👋 - i'm cohere"
+ }])
+print(response)
diff --git a/litellm/utils.py b/litellm/utils.py
index e47b55978..c451095b8 100644
--- a/litellm/utils.py
+++ b/litellm/utils.py
@@ -1,20 +1,7 @@
-import sys
-import dotenv, json, traceback, threading
-import subprocess, os
-import litellm, openai
-import random, uuid, requests
-import datetime, time
-import tiktoken
-
-encoding = tiktoken.get_encoding("cl100k_base")
-import pkg_resources
-from .integrations.helicone import HeliconeLogger
-from .integrations.aispend import AISpendLogger
-from .integrations.berrispend import BerriSpendLogger
-from .integrations.supabase import Supabase
-from .integrations.litedebugger import LiteDebugger
-from openai.error import OpenAIError as OriginalError
-from openai.openai_object import OpenAIObject
+import aiohttp
+import subprocess
+import importlib
+from typing import List, Dict, Union, Optional
from .exceptions import (
AuthenticationError,
InvalidRequestError,
@@ -22,7 +9,32 @@ from .exceptions import (
ServiceUnavailableError,
OpenAIError,
)
-from typing import List, Dict, Union, Optional
+from openai.openai_object import OpenAIObject
+from openai.error import OpenAIError as OriginalError
+from .integrations.llmonitor import LLMonitorLogger
+from .integrations.litedebugger import LiteDebugger
+from .integrations.supabase import Supabase
+from .integrations.berrispend import BerriSpendLogger
+from .integrations.aispend import AISpendLogger
+from .integrations.helicone import HeliconeLogger
+import pkg_resources
+import sys
+import dotenv
+import json
+import traceback
+import threading
+import os
+import litellm
+import openai
+import random
+import uuid
+import requests
+import datetime
+import time
+import tiktoken
+
+encoding = tiktoken.get_encoding("cl100k_base")
####### ENVIRONMENT VARIABLES ###################
dotenv.load_dotenv() # Loading env variables using dotenv
@@ -37,6 +49,7 @@ aispendLogger = None
berrispendLogger = None
supabaseClient = None
liteDebuggerClient = None
+llmonitorLogger = None
callback_list: Optional[List[str]] = []
user_logger_fn = None
additional_details: Optional[Dict[str, str]] = {}
@@ -63,6 +76,7 @@ local_cache: Optional[Dict[str, str]] = {}
class Message(OpenAIObject):
+
def __init__(self, content="default", role="assistant", **params):
super(Message, self).__init__(**params)
self.content = content
@@ -70,7 +84,12 @@ class Message(OpenAIObject):
class Choices(OpenAIObject):
- def __init__(self, finish_reason="stop", index=0, message=Message(), **params):
+
+ def __init__(self,
+ finish_reason="stop",
+ index=0,
+ message=Message(),
+ **params):
super(Choices, self).__init__(**params)
self.finish_reason = finish_reason
self.index = index
@@ -78,20 +97,22 @@ class Choices(OpenAIObject):
class ModelResponse(OpenAIObject):
- def __init__(self, choices=None, created=None, model=None, usage=None, **params):
+
+ def __init__(self,
+ choices=None,
+ created=None,
+ model=None,
+ usage=None,
+ **params):
super(ModelResponse, self).__init__(**params)
self.choices = choices if choices else [Choices()]
self.created = created
self.model = model
- self.usage = (
- usage
- if usage
- else {
- "prompt_tokens": None,
- "completion_tokens": None,
- "total_tokens": None,
- }
- )
+ self.usage = (usage if usage else {
+ "prompt_tokens": None,
+ "completion_tokens": None,
+ "total_tokens": None,
+ })
def to_dict_recursive(self):
d = super().to_dict_recursive()
@@ -108,8 +129,6 @@ def print_verbose(print_statement):
####### Package Import Handler ###################
-import importlib
-import subprocess
def install_and_import(package: str):
@@ -139,6 +158,7 @@ def install_and_import(package: str):
# Logging function -> log the exact model details + what's being sent | Non-Blocking
class Logging:
global supabaseClient, liteDebuggerClient
+
def __init__(self, model, messages, optional_params, litellm_params):
self.model = model
self.messages = messages
@@ -146,20 +166,20 @@ class Logging:
self.litellm_params = litellm_params
self.logger_fn = litellm_params["logger_fn"]
self.model_call_details = {
- "model": model,
- "messages": messages,
+ "model": model,
+ "messages": messages,
"optional_params": self.optional_params,
"litellm_params": self.litellm_params,
}
-
+
def pre_call(self, input, api_key, additional_args={}):
try:
print(f"logging pre call for model: {self.model}")
self.model_call_details["input"] = input
self.model_call_details["api_key"] = api_key
self.model_call_details["additional_args"] = additional_args
-
- ## User Logging -> if you pass in a custom logging function
+
+ # User Logging -> if you pass in a custom logging function
print_verbose(
f"Logging Details: logger_fn - {self.logger_fn} | callable(logger_fn) - {callable(self.logger_fn)}"
)
@@ -173,7 +193,7 @@ class Logging:
f"LiteLLM.LoggingError: [Non-Blocking] Exception occurred while logging {traceback.format_exc()}"
)
- ## Input Integration Logging -> If you want to log the fact that an attempt to call the model was made
+ # Input Integration Logging -> If you want to log the fact that an attempt to call the model was made
for callback in litellm.input_callback:
try:
if callback == "supabase":
@@ -185,7 +205,21 @@ class Logging:
model=model,
messages=messages,
end_user=litellm._thread_context.user,
- litellm_call_id=self.litellm_params["litellm_call_id"],
+                            litellm_call_id=self.litellm_params["litellm_call_id"],
+ print_verbose=print_verbose,
+ )
+ elif callback == "llmonitor":
+ print_verbose("reaches llmonitor for logging!")
+ model = self.model
+ messages = self.messages
+                    print(f"llmonitorLogger: {llmonitorLogger}")
+ llmonitorLogger.log_event(
+ type="start",
+ model=model,
+ messages=messages,
+ user_id=litellm._thread_context.user,
+ run_id=self.litellm_params["litellm_call_id"],
print_verbose=print_verbose,
)
elif callback == "lite_debugger":
@@ -197,15 +231,18 @@ class Logging:
model=model,
messages=messages,
end_user=litellm._thread_context.user,
- litellm_call_id=self.litellm_params["litellm_call_id"],
+                        litellm_call_id=self.litellm_params["litellm_call_id"],
print_verbose=print_verbose,
)
except Exception as e:
- print_verbose(f"LiteLLM.LoggingError: [Non-Blocking] Exception occurred while input logging with integrations {traceback.format_exc()}")
+ print_verbose(
+ f"LiteLLM.LoggingError: [Non-Blocking] Exception occurred while input logging with integrations {traceback.format_exc()}"
+ )
print_verbose(
f"LiteLLM.Logging: is sentry capture exception initialized {capture_exception}"
)
- if capture_exception: # log this error to sentry for debugging
+ if capture_exception: # log this error to sentry for debugging
capture_exception(e)
except:
print_verbose(
@@ -214,9 +251,9 @@ class Logging:
print_verbose(
f"LiteLLM.Logging: is sentry capture exception initialized {capture_exception}"
)
- if capture_exception: # log this error to sentry for debugging
+ if capture_exception: # log this error to sentry for debugging
capture_exception(e)
-
+
def post_call(self, input, api_key, original_response, additional_args={}):
# Do something here
try:
@@ -224,8 +261,8 @@ class Logging:
self.model_call_details["api_key"] = api_key
self.model_call_details["original_response"] = original_response
self.model_call_details["additional_args"] = additional_args
-
- ## User Logging -> if you pass in a custom logging function
+
+ # User Logging -> if you pass in a custom logging function
print_verbose(
f"Logging Details: logger_fn - {self.logger_fn} | callable(logger_fn) - {callable(self.logger_fn)}"
)
@@ -243,9 +280,9 @@ class Logging:
f"LiteLLM.LoggingError: [Non-Blocking] Exception occurred while logging {traceback.format_exc()}"
)
pass
-
+
# Add more methods as needed
-
+
def exception_logging(
additional_args={},
@@ -257,7 +294,7 @@ def exception_logging(
if exception:
model_call_details["exception"] = exception
model_call_details["additional_args"] = additional_args
- ## User Logging -> if you pass in a custom logging function or want to use sentry breadcrumbs
+ # User Logging -> if you pass in a custom logging function or want to use sentry breadcrumbs
print_verbose(
f"Logging Details: logger_fn - {logger_fn} | callable(logger_fn) - {callable(logger_fn)}"
)
@@ -280,20 +317,20 @@ def exception_logging(
####### CLIENT ###################
# make it easy to log if completion/embedding runs succeeded or failed + see what happened | Non-Blocking
def client(original_function):
+
def function_setup(
*args, **kwargs
): # just run once to check if user wants to send their data anywhere - PostHog/Sentry/Slack/etc.
try:
global callback_list, add_breadcrumb, user_logger_fn
- if (
- len(litellm.input_callback) > 0 or len(litellm.success_callback) > 0 or len(litellm.failure_callback) > 0
- ) and len(callback_list) == 0:
+ if (len(litellm.input_callback) > 0
+ or len(litellm.success_callback) > 0
+ or len(litellm.failure_callback)
+ > 0) and len(callback_list) == 0:
callback_list = list(
- set(litellm.input_callback + litellm.success_callback + litellm.failure_callback)
- )
- set_callbacks(
- callback_list=callback_list,
- )
+ set(litellm.input_callback + litellm.success_callback +
+ litellm.failure_callback))
+            set_callbacks(callback_list=callback_list)
if add_breadcrumb:
add_breadcrumb(
category="litellm.llm_call",
@@ -310,12 +347,11 @@ def client(original_function):
if litellm.telemetry:
try:
model = args[0] if len(args) > 0 else kwargs["model"]
- exception = kwargs["exception"] if "exception" in kwargs else None
- custom_llm_provider = (
- kwargs["custom_llm_provider"]
- if "custom_llm_provider" in kwargs
- else None
- )
+ exception = kwargs[
+ "exception"] if "exception" in kwargs else None
+ custom_llm_provider = (kwargs["custom_llm_provider"]
+ if "custom_llm_provider" in kwargs else
+ None)
safe_crash_reporting(
model=model,
exception=exception,
@@ -340,15 +376,12 @@ def client(original_function):
def check_cache(*args, **kwargs):
try: # never block execution
prompt = get_prompt(*args, **kwargs)
- if (
- prompt != None and prompt in local_cache
- ): # check if messages / prompt exists
+ if (prompt != None and prompt
+ in local_cache): # check if messages / prompt exists
if litellm.caching_with_models:
# if caching with model names is enabled, key is prompt + model name
- if (
- "model" in kwargs
- and kwargs["model"] in local_cache[prompt]["models"]
- ):
+ if ("model" in kwargs and kwargs["model"]
+ in local_cache[prompt]["models"]):
cache_key = prompt + kwargs["model"]
return local_cache[cache_key]
else: # caching only with prompts
@@ -363,10 +396,8 @@ def client(original_function):
try: # never block execution
prompt = get_prompt(*args, **kwargs)
if litellm.caching_with_models: # caching with model + prompt
- if (
- "model" in kwargs
- and kwargs["model"] in local_cache[prompt]["models"]
- ):
+ if ("model" in kwargs
+ and kwargs["model"] in local_cache[prompt]["models"]):
cache_key = prompt + kwargs["model"]
local_cache[cache_key] = result
else: # caching based only on prompts
@@ -381,24 +412,24 @@ def client(original_function):
function_setup(*args, **kwargs)
litellm_call_id = str(uuid.uuid4())
kwargs["litellm_call_id"] = litellm_call_id
- ## [OPTIONAL] CHECK CACHE
+ # [OPTIONAL] CHECK CACHE
start_time = datetime.datetime.now()
if (litellm.caching or litellm.caching_with_models) and (
- cached_result := check_cache(*args, **kwargs)
- ) is not None:
+ cached_result := check_cache(*args, **kwargs)) is not None:
result = cached_result
else:
- ## MODEL CALL
+ # MODEL CALL
result = original_function(*args, **kwargs)
end_time = datetime.datetime.now()
- ## Add response to CACHE
+ # Add response to CACHE
if litellm.caching:
add_cache(result, *args, **kwargs)
- ## LOG SUCCESS
+ # LOG SUCCESS
crash_reporting(*args, **kwargs)
my_thread = threading.Thread(
- target=handle_success, args=(args, kwargs, result, start_time, end_time)
- ) # don't interrupt execution of main thread
+ target=handle_success,
+ args=(args, kwargs, result, start_time,
+ end_time)) # don't interrupt execution of main thread
my_thread.start()
return result
except Exception as e:
@@ -407,7 +438,8 @@ def client(original_function):
end_time = datetime.datetime.now()
my_thread = threading.Thread(
target=handle_failure,
- args=(e, traceback_exception, start_time, end_time, args, kwargs),
+ args=(e, traceback_exception, start_time, end_time, args,
+ kwargs),
) # don't interrupt execution of main thread
my_thread.start()
raise e
@@ -432,18 +464,18 @@ def token_counter(model, text):
return num_tokens
-def cost_per_token(model="gpt-3.5-turbo", prompt_tokens=0, completion_tokens=0):
- ## given
+def cost_per_token(model="gpt-3.5-turbo",
+ prompt_tokens=0,
+ completion_tokens=0):
+ # given
prompt_tokens_cost_usd_dollar = 0
completion_tokens_cost_usd_dollar = 0
model_cost_ref = litellm.model_cost
if model in model_cost_ref:
prompt_tokens_cost_usd_dollar = (
- model_cost_ref[model]["input_cost_per_token"] * prompt_tokens
- )
+ model_cost_ref[model]["input_cost_per_token"] * prompt_tokens)
completion_tokens_cost_usd_dollar = (
- model_cost_ref[model]["output_cost_per_token"] * completion_tokens
- )
+ model_cost_ref[model]["output_cost_per_token"] * completion_tokens)
return prompt_tokens_cost_usd_dollar, completion_tokens_cost_usd_dollar
else:
# calculate average input cost
@@ -464,8 +496,9 @@ def completion_cost(model="gpt-3.5-turbo", prompt="", completion=""):
prompt_tokens = token_counter(model=model, text=prompt)
completion_tokens = token_counter(model=model, text=completion)
prompt_tokens_cost_usd_dollar, completion_tokens_cost_usd_dollar = cost_per_token(
- model=model, prompt_tokens=prompt_tokens, completion_tokens=completion_tokens
- )
+ model=model,
+ prompt_tokens=prompt_tokens,
+ completion_tokens=completion_tokens)
return prompt_tokens_cost_usd_dollar + completion_tokens_cost_usd_dollar
@@ -557,9 +590,8 @@ def get_optional_params(
optional_params["max_tokens"] = max_tokens
if frequency_penalty != 0:
optional_params["frequency_penalty"] = frequency_penalty
- elif (
- model == "chat-bison"
- ): # chat-bison has diff args from chat-bison@001 ty Google
+ elif (model == "chat-bison"
+ ): # chat-bison has diff args from chat-bison@001 ty Google
if temperature != 1:
optional_params["temperature"] = temperature
if top_p != 1:
@@ -619,7 +651,10 @@ def load_test_model(
test_prompt = prompt
if num_calls:
test_calls = num_calls
- messages = [[{"role": "user", "content": test_prompt}] for _ in range(test_calls)]
+ messages = [[{
+ "role": "user",
+ "content": test_prompt
+ }] for _ in range(test_calls)]
start_time = time.time()
try:
litellm.batch_completion(
@@ -649,7 +684,7 @@ def load_test_model(
def set_callbacks(callback_list):
- global sentry_sdk_instance, capture_exception, add_breadcrumb, posthog, slack_app, alerts_channel, heliconeLogger, aispendLogger, berrispendLogger, supabaseClient, liteDebuggerClient
+ global sentry_sdk_instance, capture_exception, add_breadcrumb, posthog, slack_app, alerts_channel, heliconeLogger, aispendLogger, berrispendLogger, supabaseClient, liteDebuggerClient, llmonitorLogger
try:
for callback in callback_list:
print(f"callback: {callback}")
@@ -657,17 +692,15 @@ def set_callbacks(callback_list):
try:
import sentry_sdk
except ImportError:
- print_verbose("Package 'sentry_sdk' is missing. Installing it...")
+ print_verbose(
+ "Package 'sentry_sdk' is missing. Installing it...")
subprocess.check_call(
- [sys.executable, "-m", "pip", "install", "sentry_sdk"]
- )
+ [sys.executable, "-m", "pip", "install", "sentry_sdk"])
import sentry_sdk
sentry_sdk_instance = sentry_sdk
- sentry_trace_rate = (
- os.environ.get("SENTRY_API_TRACE_RATE")
- if "SENTRY_API_TRACE_RATE" in os.environ
- else "1.0"
- )
+ sentry_trace_rate = (os.environ.get("SENTRY_API_TRACE_RATE")
+ if "SENTRY_API_TRACE_RATE" in os.environ
+ else "1.0")
sentry_sdk_instance.init(
dsn=os.environ.get("SENTRY_API_URL"),
traces_sample_rate=float(sentry_trace_rate),
@@ -678,10 +711,10 @@ def set_callbacks(callback_list):
try:
from posthog import Posthog
except ImportError:
- print_verbose("Package 'posthog' is missing. Installing it...")
+ print_verbose(
+ "Package 'posthog' is missing. Installing it...")
subprocess.check_call(
- [sys.executable, "-m", "pip", "install", "posthog"]
- )
+ [sys.executable, "-m", "pip", "install", "posthog"])
from posthog import Posthog
posthog = Posthog(
project_api_key=os.environ.get("POSTHOG_API_KEY"),
@@ -691,10 +724,10 @@ def set_callbacks(callback_list):
try:
from slack_bolt import App
except ImportError:
- print_verbose("Package 'slack_bolt' is missing. Installing it...")
+ print_verbose(
+ "Package 'slack_bolt' is missing. Installing it...")
subprocess.check_call(
- [sys.executable, "-m", "pip", "install", "slack_bolt"]
- )
+ [sys.executable, "-m", "pip", "install", "slack_bolt"])
from slack_bolt import App
slack_app = App(
token=os.environ.get("SLACK_API_TOKEN"),
@@ -704,6 +737,8 @@ def set_callbacks(callback_list):
print_verbose(f"Initialized Slack App: {slack_app}")
elif callback == "helicone":
heliconeLogger = HeliconeLogger()
+ elif callback == "llmonitor":
+ llmonitorLogger = LLMonitorLogger()
elif callback == "aispend":
aispendLogger = AISpendLogger()
elif callback == "berrispend":
@@ -718,7 +753,8 @@ def set_callbacks(callback_list):
raise e
-def handle_failure(exception, traceback_exception, start_time, end_time, args, kwargs):
+def handle_failure(exception, traceback_exception, start_time, end_time, args,
+ kwargs):
global sentry_sdk_instance, capture_exception, add_breadcrumb, posthog, slack_app, alerts_channel, aispendLogger, berrispendLogger, supabaseClient, liteDebuggerClient
try:
# print_verbose(f"handle_failure args: {args}")
@@ -728,8 +764,7 @@ def handle_failure(exception, traceback_exception, start_time, end_time, args, k
failure_handler = additional_details.pop("failure_handler", None)
additional_details["Event_Name"] = additional_details.pop(
- "failed_event_name", "litellm.failed_query"
- )
+ "failed_event_name", "litellm.failed_query")
print_verbose(f"self.failure_callback: {litellm.failure_callback}")
# print_verbose(f"additional_details: {additional_details}")
@@ -746,9 +781,8 @@ def handle_failure(exception, traceback_exception, start_time, end_time, args, k
for detail in additional_details:
slack_msg += f"{detail}: {additional_details[detail]}\n"
slack_msg += f"Traceback: {traceback_exception}"
- slack_app.client.chat_postMessage(
- channel=alerts_channel, text=slack_msg
- )
+ slack_app.client.chat_postMessage(channel=alerts_channel,
+ text=slack_msg)
elif callback == "sentry":
capture_exception(exception)
elif callback == "posthog":
@@ -767,9 +801,8 @@ def handle_failure(exception, traceback_exception, start_time, end_time, args, k
print_verbose(f"ph_obj: {ph_obj}")
print_verbose(f"PostHog Event Name: {event_name}")
if "user_id" in additional_details:
- posthog.capture(
- additional_details["user_id"], event_name, ph_obj
- )
+ posthog.capture(additional_details["user_id"],
+ event_name, ph_obj)
else: # PostHog calls require a unique id to identify a user - https://posthog.com/docs/libraries/python
unique_id = str(uuid.uuid4())
posthog.capture(unique_id, event_name)
@@ -783,10 +816,10 @@ def handle_failure(exception, traceback_exception, start_time, end_time, args, k
"created": time.time(),
"error": traceback_exception,
"usage": {
- "prompt_tokens": prompt_token_calculator(
- model, messages=messages
- ),
- "completion_tokens": 0,
+ "prompt_tokens":
+ prompt_token_calculator(model, messages=messages),
+ "completion_tokens":
+ 0,
},
}
berrispendLogger.log_event(
@@ -805,10 +838,10 @@ def handle_failure(exception, traceback_exception, start_time, end_time, args, k
"model": model,
"created": time.time(),
"usage": {
- "prompt_tokens": prompt_token_calculator(
- model, messages=messages
- ),
- "completion_tokens": 0,
+ "prompt_tokens":
+ prompt_token_calculator(model, messages=messages),
+ "completion_tokens":
+ 0,
},
}
aispendLogger.log_event(
@@ -818,6 +851,27 @@ def handle_failure(exception, traceback_exception, start_time, end_time, args, k
end_time=end_time,
print_verbose=print_verbose,
)
+ elif callback == "llmonitor":
+ print_verbose("reaches llmonitor for logging!")
+ model = args[0] if len(args) > 0 else kwargs["model"]
+ messages = args[1] if len(args) > 1 else kwargs["messages"]
+ usage = {
+ "prompt_tokens":
+ prompt_token_calculator(model, messages=messages),
+ "completion_tokens":
+ 0,
+ }
+ llmonitorLogger.log_event(
+ type="error",
+ user_id=litellm._thread_context.user,
+ model=model,
+ error=traceback_exception,
+ run_id=kwargs["litellm_call_id"],
+                        time=end_time,
+ usage=usage,
+ print_verbose=print_verbose,
+ )
elif callback == "supabase":
print_verbose("reaches supabase for logging!")
print_verbose(f"supabaseClient: {supabaseClient}")
@@ -828,10 +882,10 @@ def handle_failure(exception, traceback_exception, start_time, end_time, args, k
"created": time.time(),
"error": traceback_exception,
"usage": {
- "prompt_tokens": prompt_token_calculator(
- model, messages=messages
- ),
- "completion_tokens": 0,
+ "prompt_tokens":
+ prompt_token_calculator(model, messages=messages),
+ "completion_tokens":
+ 0,
},
}
supabaseClient.log_event(
@@ -854,10 +908,10 @@ def handle_failure(exception, traceback_exception, start_time, end_time, args, k
"created": time.time(),
"error": traceback_exception,
"usage": {
- "prompt_tokens": prompt_token_calculator(
- model, messages=messages
- ),
- "completion_tokens": 0,
+ "prompt_tokens":
+ prompt_token_calculator(model, messages=messages),
+ "completion_tokens":
+ 0,
},
}
liteDebuggerClient.log_event(
@@ -884,19 +938,18 @@ def handle_failure(exception, traceback_exception, start_time, end_time, args, k
failure_handler(call_details)
pass
except Exception as e:
- ## LOGGING
+ # LOGGING
exception_logging(logger_fn=user_logger_fn, exception=e)
pass
def handle_success(args, kwargs, result, start_time, end_time):
- global heliconeLogger, aispendLogger, supabaseClient, liteDebuggerClient
+ global heliconeLogger, aispendLogger, supabaseClient, liteDebuggerClient, llmonitorLogger
try:
success_handler = additional_details.pop("success_handler", None)
failure_handler = additional_details.pop("failure_handler", None)
additional_details["Event_Name"] = additional_details.pop(
- "successful_event_name", "litellm.succes_query"
- )
+ "successful_event_name", "litellm.succes_query")
for callback in litellm.success_callback:
try:
if callback == "posthog":
@@ -905,9 +958,8 @@ def handle_success(args, kwargs, result, start_time, end_time):
ph_obj[detail] = additional_details[detail]
event_name = additional_details["Event_Name"]
if "user_id" in additional_details:
- posthog.capture(
- additional_details["user_id"], event_name, ph_obj
- )
+ posthog.capture(additional_details["user_id"],
+ event_name, ph_obj)
else: # PostHog calls require a unique id to identify a user - https://posthog.com/docs/libraries/python
unique_id = str(uuid.uuid4())
posthog.capture(unique_id, event_name, ph_obj)
@@ -916,9 +968,8 @@ def handle_success(args, kwargs, result, start_time, end_time):
slack_msg = ""
for detail in additional_details:
slack_msg += f"{detail}: {additional_details[detail]}\n"
- slack_app.client.chat_postMessage(
- channel=alerts_channel, text=slack_msg
- )
+ slack_app.client.chat_postMessage(channel=alerts_channel,
+ text=slack_msg)
elif callback == "helicone":
print_verbose("reaches helicone for logging!")
model = args[0] if len(args) > 0 else kwargs["model"]
@@ -931,6 +982,22 @@ def handle_success(args, kwargs, result, start_time, end_time):
end_time=end_time,
print_verbose=print_verbose,
)
+ elif callback == "llmonitor":
+ print_verbose("reaches llmonitor for logging!")
+ model = args[0] if len(args) > 0 else kwargs["model"]
+ messages = args[1] if len(args) > 1 else kwargs["messages"]
+                    usage = result["usage"]
+ llmonitorLogger.log_event(
+ type="end",
+ model=model,
+ messages=messages,
+ user_id=litellm._thread_context.user,
+ response_obj=result,
+ time=end_time,
+ usage=usage,
+ run_id=kwargs["litellm_call_id"],
+ print_verbose=print_verbose,
+ )
elif callback == "aispend":
print_verbose("reaches aispend for logging!")
model = args[0] if len(args) > 0 else kwargs["model"]
@@ -984,7 +1051,7 @@ def handle_success(args, kwargs, result, start_time, end_time):
print_verbose=print_verbose,
)
except Exception as e:
- ## LOGGING
+ # LOGGING
exception_logging(logger_fn=user_logger_fn, exception=e)
print_verbose(
f"[Non-Blocking] Success Callback Error - {traceback.format_exc()}"
@@ -995,7 +1062,7 @@ def handle_success(args, kwargs, result, start_time, end_time):
success_handler(args, kwargs)
pass
except Exception as e:
- ## LOGGING
+ # LOGGING
exception_logging(logger_fn=user_logger_fn, exception=e)
print_verbose(
f"[Non-Blocking] Success Callback Error - {traceback.format_exc()}"
@@ -1046,33 +1113,36 @@ def exception_type(model, original_exception, custom_llm_provider):
exception_type = ""
if "claude" in model: # one of the anthropics
if hasattr(original_exception, "status_code"):
- print_verbose(f"status_code: {original_exception.status_code}")
+ print_verbose(
+ f"status_code: {original_exception.status_code}")
if original_exception.status_code == 401:
exception_mapping_worked = True
raise AuthenticationError(
- message=f"AnthropicException - {original_exception.message}",
+ message=
+ f"AnthropicException - {original_exception.message}",
llm_provider="anthropic",
)
elif original_exception.status_code == 400:
exception_mapping_worked = True
raise InvalidRequestError(
- message=f"AnthropicException - {original_exception.message}",
+ message=
+ f"AnthropicException - {original_exception.message}",
model=model,
llm_provider="anthropic",
)
elif original_exception.status_code == 429:
exception_mapping_worked = True
raise RateLimitError(
- message=f"AnthropicException - {original_exception.message}",
+ message=
+ f"AnthropicException - {original_exception.message}",
llm_provider="anthropic",
)
- elif (
- "Could not resolve authentication method. Expected either api_key or auth_token to be set."
- in error_str
- ):
+ elif ("Could not resolve authentication method. Expected either api_key or auth_token to be set."
+ in error_str):
exception_mapping_worked = True
raise AuthenticationError(
- message=f"AnthropicException - {original_exception.message}",
+ message=
+ f"AnthropicException - {original_exception.message}",
llm_provider="anthropic",
)
elif "replicate" in model:
@@ -1096,35 +1166,36 @@ def exception_type(model, original_exception, custom_llm_provider):
llm_provider="replicate",
)
elif (
- exception_type == "ReplicateError"
- ): ## ReplicateError implies an error on Replicate server side, not user side
+ exception_type == "ReplicateError"
+ ): # ReplicateError implies an error on Replicate server side, not user side
raise ServiceUnavailableError(
message=f"ReplicateException - {error_str}",
llm_provider="replicate",
)
elif model == "command-nightly": # Cohere
- if (
- "invalid api token" in error_str
- or "No API key provided." in error_str
- ):
+ if ("invalid api token" in error_str
+ or "No API key provided." in error_str):
exception_mapping_worked = True
raise AuthenticationError(
- message=f"CohereException - {original_exception.message}",
+ message=
+ f"CohereException - {original_exception.message}",
llm_provider="cohere",
)
elif "too many tokens" in error_str:
exception_mapping_worked = True
raise InvalidRequestError(
- message=f"CohereException - {original_exception.message}",
+ message=
+ f"CohereException - {original_exception.message}",
model=model,
llm_provider="cohere",
)
elif (
- "CohereConnectionError" in exception_type
+ "CohereConnectionError" in exception_type
): # cohere seems to fire these errors when we load test it (1k+ messages / min)
exception_mapping_worked = True
raise RateLimitError(
- message=f"CohereException - {original_exception.message}",
+ message=
+ f"CohereException - {original_exception.message}",
llm_provider="cohere",
)
elif custom_llm_provider == "huggingface":
@@ -1132,27 +1203,30 @@ def exception_type(model, original_exception, custom_llm_provider):
if original_exception.status_code == 401:
exception_mapping_worked = True
raise AuthenticationError(
- message=f"HuggingfaceException - {original_exception.message}",
+ message=
+ f"HuggingfaceException - {original_exception.message}",
llm_provider="huggingface",
)
elif original_exception.status_code == 400:
exception_mapping_worked = True
raise InvalidRequestError(
- message=f"HuggingfaceException - {original_exception.message}",
+ message=
+ f"HuggingfaceException - {original_exception.message}",
model=model,
llm_provider="huggingface",
)
elif original_exception.status_code == 429:
exception_mapping_worked = True
raise RateLimitError(
- message=f"HuggingfaceException - {original_exception.message}",
+ message=
+ f"HuggingfaceException - {original_exception.message}",
llm_provider="huggingface",
)
raise original_exception # base case - return the original exception
else:
raise original_exception
except Exception as e:
- ## LOGGING
+ # LOGGING
exception_logging(
logger_fn=user_logger_fn,
additional_args={
@@ -1173,7 +1247,7 @@ def safe_crash_reporting(model=None, exception=None, custom_llm_provider=None):
"exception": str(exception),
"custom_llm_provider": custom_llm_provider,
}
- threading.Thread(target=litellm_telemetry, args=(data,)).start()
+ threading.Thread(target=litellm_telemetry, args=(data, )).start()
def litellm_telemetry(data):
@@ -1223,11 +1297,13 @@ def get_secret(secret_name):
if litellm.secret_manager_client != None:
# TODO: check which secret manager is being used
# currently only supports Infisical
- secret = litellm.secret_manager_client.get_secret(secret_name).secret_value
+ secret = litellm.secret_manager_client.get_secret(
+ secret_name).secret_value
if secret != None:
return secret # if secret found in secret manager return it
else:
- raise ValueError(f"Secret '{secret_name}' not found in secret manager")
+ raise ValueError(
+ f"Secret '{secret_name}' not found in secret manager")
elif litellm.api_key != None: # if users use litellm default key
return litellm.api_key
else:
@@ -1238,6 +1314,7 @@ def get_secret(secret_name):
# wraps the completion stream to return the correct format for the model
# replicate/anthropic/cohere
class CustomStreamWrapper:
+
def __init__(self, completion_stream, model, custom_llm_provider=None):
self.model = model
self.custom_llm_provider = custom_llm_provider
@@ -1288,7 +1365,8 @@ class CustomStreamWrapper:
elif self.model == "replicate":
chunk = next(self.completion_stream)
completion_obj["content"] = chunk
- elif (self.model == "together_ai") or ("togethercomputer" in self.model):
+ elif (self.model == "together_ai") or ("togethercomputer"
+ in self.model):
chunk = next(self.completion_stream)
text_data = self.handle_together_ai_chunk(chunk)
if text_data == "":
@@ -1321,12 +1399,11 @@ def read_config_args(config_path):
########## ollama implementation ############################
-import aiohttp
-async def get_ollama_response_stream(
- api_base="http://localhost:11434", model="llama2", prompt="Why is the sky blue?"
-):
+async def get_ollama_response_stream(api_base="http://localhost:11434",
+ model="llama2",
+ prompt="Why is the sky blue?"):
session = aiohttp.ClientSession()
url = f"{api_base}/api/generate"
data = {
@@ -1349,7 +1426,11 @@ async def get_ollama_response_stream(
"content": "",
}
completion_obj["content"] = j["response"]
- yield {"choices": [{"delta": completion_obj}]}
+ yield {
+ "choices": [{
+ "delta": completion_obj
+ }]
+ }
# self.responses.append(j["response"])
# yield "blank"
except Exception as e:
diff --git a/proxy-server/readme.md b/proxy-server/readme.md
index 4f735f38c..edd03de3f 100644
--- a/proxy-server/readme.md
+++ b/proxy-server/readme.md
@@ -1,6 +1,7 @@
-
# liteLLM Proxy Server: 50+ LLM Models, Error Handling, Caching
+
### Azure, Llama2, OpenAI, Claude, Hugging Face, Replicate Models
+
[](https://pypi.org/project/litellm/)
[](https://pypi.org/project/litellm/0.1.1/)

@@ -11,34 +12,36 @@

## What does liteLLM proxy do
+
- Make `/chat/completions` requests for 50+ LLM models **Azure, OpenAI, Replicate, Anthropic, Hugging Face**
-
+
Example: for `model` use `claude-2`, `gpt-3.5`, `gpt-4`, `command-nightly`, `stabilityai/stablecode-completion-alpha-3b-4k`
+
```json
{
"model": "replicate/llama-2-70b-chat:2c1608e18606fad2812020dc541930f2d0495ce32eee50074220b87300bc16e1",
"messages": [
- {
- "content": "Hello, whats the weather in San Francisco??",
- "role": "user"
- }
- ]
+ {
+        "content": "Hello, what's the weather in San Francisco?",
+ "role": "user"
+ }
+ ]
}
```
-- **Consistent Input/Output** Format
- - Call all models using the OpenAI format - `completion(model, messages)`
- - Text responses will always be available at `['choices'][0]['message']['content']`
-- **Error Handling** Using Model Fallbacks (if `GPT-4` fails, try `llama2`)
-- **Logging** - Log Requests, Responses and Errors to `Supabase`, `Posthog`, `Mixpanel`, `Sentry`, `Helicone` (Any of the supported providers here: https://litellm.readthedocs.io/en/latest/advanced/
- **Example: Logs sent to Supabase**
+- **Consistent Input/Output** Format
+ - Call all models using the OpenAI format - `completion(model, messages)`
+ - Text responses will always be available at `['choices'][0]['message']['content']`
+- **Error Handling** Using Model Fallbacks (if `GPT-4` fails, try `llama2`)
+- **Logging** - Log Requests, Responses and Errors to `Supabase`, `Posthog`, `Mixpanel`, `Sentry`, `Helicone`, `LLMonitor` (any of the supported providers here: https://litellm.readthedocs.io/en/latest/advanced/) - see the wiring sketch after this list
+
+ **Example: Logs sent to Supabase**
- **Token Usage & Spend** - Track Input + Completion tokens used + Spend/model
- **Caching** - Implementation of Semantic Caching
- **Streaming & Async Support** - Return generators to stream text responses
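+
+A minimal sketch of wiring up these logging callbacks from Python (assuming the `litellm` package is installed and your provider API keys are set as environment variables):
+
+```python
+import litellm
+from litellm import completion
+
+# route request, response, and error logs to a supported provider
+litellm.input_callback = ["llmonitor"]    # log the request
+litellm.success_callback = ["llmonitor"]  # log the response
+litellm.error_callback = ["llmonitor"]    # log failures
+
+response = completion(model="gpt-3.5-turbo",
+                      messages=[{"role": "user", "content": "Hi 👋"}])
+```
+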
-
## API Endpoints
### `/chat/completions` (POST)
@@ -46,34 +49,37 @@
This endpoint is used to generate chat completions for 50+ supported LLM API models. Use llama2, GPT-4, Claude2 etc
#### Input
+
This API endpoint accepts all inputs in raw JSON and expects the following inputs
-- `model` (string, required): ID of the model to use for chat completions. See all supported models [here]: (https://litellm.readthedocs.io/en/latest/supported/):
- eg `gpt-3.5-turbo`, `gpt-4`, `claude-2`, `command-nightly`, `stabilityai/stablecode-completion-alpha-3b-4k`
+
+- `model` (string, required): ID of the model to use for chat completions. See all supported models [here](https://litellm.readthedocs.io/en/latest/supported/),
+  e.g. `gpt-3.5-turbo`, `gpt-4`, `claude-2`, `command-nightly`, `stabilityai/stablecode-completion-alpha-3b-4k`
- `messages` (array, required): A list of messages representing the conversation context. Each message should have a `role` (system, user, assistant, or function), `content` (message text), and `name` (for function role).
- Additional Optional parameters: `temperature`, `functions`, `function_call`, `top_p`, `n`, `stream`. See the full list of supported inputs here: https://litellm.readthedocs.io/en/latest/input/
-
#### Example JSON body
+
For claude-2
+
```json
{
- "model": "claude-2",
- "messages": [
- {
- "content": "Hello, whats the weather in San Francisco??",
- "role": "user"
- }
- ]
-
+ "model": "claude-2",
+ "messages": [
+ {
+      "content": "Hello, what's the weather in San Francisco?",
+ "role": "user"
+ }
+ ]
}
```
### Making an API request to the Proxy Server
+
```python
import requests
import json
-# TODO: use your URL
+# TODO: use your URL
url = "http://localhost:5000/chat/completions"
payload = json.dumps({
@@ -94,34 +100,38 @@ print(response.text)
```
### Output [Response Format]
-Responses from the server are given in the following format.
+
All responses from the server are returned in the following format (for all LLM models). More info on output here: https://litellm.readthedocs.io/en/latest/output/
+
```json
{
- "choices": [
- {
- "finish_reason": "stop",
- "index": 0,
- "message": {
- "content": "I'm sorry, but I don't have the capability to provide real-time weather information. However, you can easily check the weather in San Francisco by searching online or using a weather app on your phone.",
- "role": "assistant"
- }
- }
- ],
- "created": 1691790381,
- "id": "chatcmpl-7mUFZlOEgdohHRDx2UpYPRTejirzb",
- "model": "gpt-3.5-turbo-0613",
- "object": "chat.completion",
- "usage": {
- "completion_tokens": 41,
- "prompt_tokens": 16,
- "total_tokens": 57
+ "choices": [
+ {
+ "finish_reason": "stop",
+ "index": 0,
+ "message": {
+ "content": "I'm sorry, but I don't have the capability to provide real-time weather information. However, you can easily check the weather in San Francisco by searching online or using a weather app on your phone.",
+ "role": "assistant"
+ }
}
+ ],
+ "created": 1691790381,
+ "id": "chatcmpl-7mUFZlOEgdohHRDx2UpYPRTejirzb",
+ "model": "gpt-3.5-turbo-0613",
+ "object": "chat.completion",
+ "usage": {
+ "completion_tokens": 41,
+ "prompt_tokens": 16,
+ "total_tokens": 57
+ }
}
```
## Installation & Usage
+
### Running Locally
+
1. Clone liteLLM repository to your local machine:
```
git clone https://github.com/BerriAI/liteLLM-proxy
@@ -141,24 +151,24 @@ All responses from the server are returned in the following format (for all LLM
python main.py
```
-
-
## Deploying
+
1. Quick Start: Deploy on Railway
[](https://railway.app/template/DYqQAW?referralCode=t3ukrU)
-
-2. `GCP`, `AWS`, `Azure`
-This project includes a `Dockerfile` allowing you to build and deploy a Docker Project on your providers
+
+2. `GCP`, `AWS`, `Azure`
+   This project includes a `Dockerfile`, allowing you to build and deploy a Docker image to the provider of your choice.
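+
+   For example, a minimal build-and-run sketch (the image name is hypothetical; port 5000 matches the request examples above):
+
+   ```
+   docker build -t litellm-proxy .
+   docker run -p 5000:5000 litellm-proxy
+   ```
+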
# Support / Talk with founders
+
- [Our calendar 👋](https://calendly.com/d/4mp-gd3-k5k/berriai-1-1-onboarding-litellm-hosted-version)
- [Community Discord 💭](https://discord.gg/wuPM9dRgDw)
- Our numbers 📞 +1 (770) 8783-106 / +1 (412) 618-6238
- Our emails ✉️ ishaan@berri.ai / krrish@berri.ai
-
## Roadmap
+
- [ ] Support hosted db (e.g. Supabase)
- [ ] Easily send data to places like posthog and sentry.
- [ ] Add a hot-cache for project spend logs - enables fast checks for user + project limits