forked from phoenix/litellm-mirror
(docs) getting started
This commit is contained in:
parent
f0fc7f1552
commit
854eee3785
2 changed files with 24 additions and 104 deletions
@@ -5,10 +5,14 @@ import TabItem from '@theme/TabItem';

https://github.com/BerriAI/litellm

import QuickStart from '../src/components/QuickStart.js'

## **Call 100+ LLMs using the same Input/Output Format**

- Translate inputs to the provider's `completion`, `embedding`, and `image_generation` endpoints
- [Consistent output](https://docs.litellm.ai/docs/completion/output): text responses will always be available at `['choices'][0]['message']['content']` (see the sketch after this list)
- Retry/fallback logic across multiple deployments (e.g. Azure/OpenAI) - [Router](https://docs.litellm.ai/docs/routing)
- Track spend & set budgets per project with the [OpenAI Proxy Server](https://docs.litellm.ai/docs/simple_proxy)

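To make the consistent input/output claim concrete, here is a minimal sketch, assuming `OPENAI_API_KEY` and `ANTHROPIC_API_KEY` are available; the placeholder keys and the `claude-2` model name are illustrative assumptions, not taken from this diff.

```python
# Minimal sketch: the same request/response shape across providers.
# The model names and placeholder keys below are illustrative assumptions.
import os
from litellm import completion

os.environ["OPENAI_API_KEY"] = "your-openai-key"        # placeholder
os.environ["ANTHROPIC_API_KEY"] = "your-anthropic-key"  # placeholder

messages = [{"content": "Hello, how are you?", "role": "user"}]

# Same input format for both providers
openai_response = completion(model="gpt-3.5-turbo", messages=messages)
claude_response = completion(model="claude-2", messages=messages)

# The text is always at the same place in the output
print(openai_response["choices"][0]["message"]["content"])
print(claude_response["choices"][0]["message"]["content"])
```
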
## Basic usage
<a target="_blank" href="https://colab.research.google.com/github/BerriAI/litellm/blob/main/cookbook/liteLLM_Getting_Started.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
@@ -157,9 +161,6 @@ response = completion(
    messages=[{ "content": "Hello, how are you?","role": "user"}],
    stream=True,
)

for chunk in response:
    print(chunk)
```

</TabItem>
@@ -177,9 +178,6 @@ response = completion(
    messages=[{ "content": "Hello, how are you?","role": "user"}],
    stream=True,
)

for chunk in response:
    print(chunk)
```

</TabItem>
@@ -199,9 +197,6 @@ response = completion(
    messages=[{ "content": "Hello, how are you?","role": "user"}],
    stream=True,
)

for chunk in response:
    print(chunk)
```

</TabItem>
@@ -222,9 +217,7 @@ response = completion(
    stream=True,
)


for chunk in response:
    print(chunk)
print(response)
```

</TabItem>
@@ -246,9 +239,6 @@ response = completion(
    messages = [{ "content": "Hello, how are you?","role": "user"}],
    stream=True,
)

for chunk in response:
    print(chunk)
```

</TabItem>
@@ -265,9 +255,6 @@ response = completion(
    api_base="http://localhost:11434",
    stream=True,
)

for chunk in response:
    print(chunk)
```
</TabItem>
<TabItem value="or" label="Openrouter">
@@ -284,9 +271,6 @@ response = completion(
    messages = [{ "content": "Hello, how are you?","role": "user"}],
    stream=True,
)

for chunk in response:
    print(chunk)
```
</TabItem>
@@ -327,34 +311,8 @@ litellm.success_callback = ["langfuse", "llmonitor"] # log input/output to langf
response = completion(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Hi 👋 - i'm openai"}])
```

## Calculate Costs, Usage, Latency

Pass the completion response to `litellm.completion_cost(completion_response=response)` and get the cost

```python
from litellm import completion, completion_cost
import os
os.environ["OPENAI_API_KEY"] = "your-api-key"

response = completion(
    model="gpt-3.5-turbo",
    messages=[{ "content": "Hello, how are you?","role": "user"}]
)

cost = completion_cost(completion_response=response)
print("Cost for completion call with gpt-3.5-turbo: ", f"${float(cost):.10f}")
```

**Output**
```shell
Cost for completion call with gpt-3.5-turbo: $0.0000775000
```

### Track Costs, Usage, Latency for streaming
We use a custom callback function for this - more info on custom callbacks: https://docs.litellm.ai/docs/observability/custom_callback
- We define a callback function to calculate cost `def track_cost_callback()`
- In `def track_cost_callback()` we check if the stream is complete - `if "complete_streaming_response" in kwargs`
- Use `litellm.completion_cost()` to calculate cost, once the stream is complete
## Track Costs, Usage, Latency for streaming
Use a callback function for this - more info on custom callbacks: https://docs.litellm.ai/docs/observability/custom_callback

```python
import litellm
@@ -366,18 +324,8 @@ def track_cost_callback(
    start_time, end_time # start/end time
):
    try:
        # check if it has collected an entire stream response
        if "complete_streaming_response" in kwargs:
            # for tracking streaming cost we pass the "messages" and the output_text to litellm.completion_cost
            completion_response=kwargs["complete_streaming_response"]
            input_text = kwargs["messages"]
            output_text = completion_response["choices"][0]["message"]["content"]
            response_cost = litellm.completion_cost(
                model = kwargs["model"],
                messages = input_text,
                completion=output_text
            )
            print("streaming response_cost", response_cost)
        response_cost = kwargs.get("response_cost", 0)
        print("streaming response_cost", response_cost)
    except:
        pass
# set callback
@@ -400,6 +348,8 @@ response = completion(

Track spend across multiple projects/people, as sketched in the client example below

[ui_3](https://github.com/BerriAI/litellm/assets/29436595/47c97d5e-b9be-4839-b28c-43d7f4f10033)

The proxy provides:
1. [Hooks for auth](https://docs.litellm.ai/docs/proxy/virtual_keys#custom-auth)
2. [Hooks for logging](https://docs.litellm.ai/docs/proxy/logging#step-1---create-your-custom-litellm-callback-class)
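To make the proxy usage concrete, here is a minimal client sketch, assuming a LiteLLM proxy is already running locally; the `base_url`, the port `8000`, and the placeholder `api_key` are assumptions for illustration, not values taken from this diff.

```python
# Hypothetical sketch: route an OpenAI SDK client through a locally running LiteLLM proxy.
# The base_url and api_key values are placeholders, not values taken from these docs.
import openai

client = openai.OpenAI(
    api_key="anything",                # the proxy holds the real provider keys
    base_url="http://0.0.0.0:8000"     # assumed local proxy address
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello, how are you?"}]
)
print(response)
```
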
@@ -436,8 +386,7 @@ response = client.chat.completions.create(model="gpt-3.5-turbo", messages = [
print(response)
```


## More details
* [exception mapping](./exception_mapping.md)
* [retries + model fallbacks for completion()](./completion/reliable_completions.md)
* [tutorial for model fallbacks with completion()](./tutorials/fallbacks.md)
* [proxy virtual keys & spend management](./tutorials/fallbacks.md)
@@ -8,6 +8,11 @@ https://github.com/BerriAI/litellm

## **Call 100+ LLMs using the same Input/Output Format**

- Translate inputs to the provider's `completion`, `embedding`, and `image_generation` endpoints (see the embedding sketch after this list)
- [Consistent output](https://docs.litellm.ai/docs/completion/output): text responses will always be available at `['choices'][0]['message']['content']`
- Retry/fallback logic across multiple deployments (e.g. Azure/OpenAI) - [Router](https://docs.litellm.ai/docs/routing)
- Track spend & set budgets per project with the [OpenAI Proxy Server](https://docs.litellm.ai/docs/simple_proxy)

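As a companion to the `embedding` endpoint mentioned above, here is a minimal sketch, assuming an OpenAI key is set; the `text-embedding-ada-002` model name and the response indexing are assumptions for illustration, not content from this diff.

```python
# Minimal sketch of the unified embedding call; model name and key are placeholders.
import os
from litellm import embedding

os.environ["OPENAI_API_KEY"] = "your-openai-key"  # placeholder

response = embedding(
    model="text-embedding-ada-002",
    input=["good morning from litellm"]
)

# The vector is assumed to live in the OpenAI-style response shape
print(response["data"][0]["embedding"][:5])
```
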
## Basic usage
<a target="_blank" href="https://colab.research.google.com/github/BerriAI/litellm/blob/main/cookbook/liteLLM_Getting_Started.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
@@ -306,30 +311,7 @@ litellm.success_callback = ["langfuse", "llmonitor"] # log input/output to langf
response = completion(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Hi 👋 - i'm openai"}])
```

## Calculate Costs, Usage, Latency

Pass the completion response to `litellm.completion_cost(completion_response=response)` and get the cost

```python
from litellm import completion, completion_cost
import os
os.environ["OPENAI_API_KEY"] = "your-api-key"

response = completion(
    model="gpt-3.5-turbo",
    messages=[{ "content": "Hello, how are you?","role": "user"}]
)

cost = completion_cost(completion_response=response)
print("Cost for completion call with gpt-3.5-turbo: ", f"${float(cost):.10f}")
```

**Output**
```shell
Cost for completion call with gpt-3.5-turbo: $0.0000775000
```

### Track Costs, Usage, Latency for streaming
## Track Costs, Usage, Latency for streaming
Use a callback function for this - more info on custom callbacks: https://docs.litellm.ai/docs/observability/custom_callback

```python
@@ -342,18 +324,8 @@ def track_cost_callback(
    start_time, end_time # start/end time
):
    try:
        # check if it has collected an entire stream response
        if "complete_streaming_response" in kwargs:
            # for tracking streaming cost we pass the "messages" and the output_text to litellm.completion_cost
            completion_response=kwargs["complete_streaming_response"]
            input_text = kwargs["messages"]
            output_text = completion_response["choices"][0]["message"]["content"]
            response_cost = litellm.completion_cost(
                model = kwargs["model"],
                messages = input_text,
                completion=output_text
            )
            print("streaming response_cost", response_cost)
        response_cost = kwargs.get("response_cost", 0)
        print("streaming response_cost", response_cost)
    except:
        pass
# set callback
@@ -372,13 +344,12 @@ response = completion(
)
```


Need a dedicated key? Email us @ krrish@berri.ai

## OpenAI Proxy

Track spend across multiple projects/people

[ui_3](https://github.com/BerriAI/litellm/assets/29436595/47c97d5e-b9be-4839-b28c-43d7f4f10033)

The proxy provides:
1. [Hooks for auth](https://docs.litellm.ai/docs/proxy/virtual_keys#custom-auth)
2. [Hooks for logging](https://docs.litellm.ai/docs/proxy/logging#step-1---create-your-custom-litellm-callback-class)
@@ -418,4 +389,4 @@ print(response)
## More details
* [exception mapping](./exception_mapping.md)
* [retries + model fallbacks for completion()](./completion/reliable_completions.md)
* [tutorial for model fallbacks with completion()](./tutorials/fallbacks.md)
* [proxy virtual keys & spend management](./tutorials/fallbacks.md)