(docs) getting started

parent f0fc7f1552
commit 854eee3785

2 changed files with 24 additions and 104 deletions
@@ -5,10 +5,14 @@ import TabItem from '@theme/TabItem';

https://github.com/BerriAI/litellm

import QuickStart from '../src/components/QuickStart.js'
## **Call 100+ LLMs using the same Input/Output Format**

- Translate inputs to the provider's `completion`, `embedding`, and `image_generation` endpoints
- [Consistent output](https://docs.litellm.ai/docs/completion/output): text responses are always available at `['choices'][0]['message']['content']`
- Retry/fallback logic across multiple deployments (e.g. Azure/OpenAI) - [Router](https://docs.litellm.ai/docs/routing)
- Track spend & set budgets per project - [OpenAI Proxy Server](https://docs.litellm.ai/docs/simple_proxy)
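A minimal sketch of what that unified format looks like in practice; the keys and model names below are illustrative placeholders, not working values:

```python
from litellm import completion
import os

# placeholder keys - set real provider keys in your environment
os.environ["OPENAI_API_KEY"] = "your-openai-key"
os.environ["COHERE_API_KEY"] = "your-cohere-key"

messages = [{"content": "Hello, how are you?", "role": "user"}]

# same input format for two different providers
openai_response = completion(model="gpt-3.5-turbo", messages=messages)
cohere_response = completion(model="command-nightly", messages=messages)

# same output format: the text is always at ['choices'][0]['message']['content']
print(openai_response['choices'][0]['message']['content'])
print(cohere_response['choices'][0]['message']['content'])
```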
## Basic usage

<a target="_blank" href="https://colab.research.google.com/github/BerriAI/litellm/blob/main/cookbook/liteLLM_Getting_Started.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>
@@ -157,9 +161,6 @@ response = completion(
```python
    messages=[{"content": "Hello, how are you?", "role": "user"}],
    stream=True,
)
```

</TabItem>
@@ -177,9 +178,6 @@ response = completion(
```python
    messages=[{"content": "Hello, how are you?", "role": "user"}],
    stream=True,
)
```

</TabItem>
@@ -199,9 +197,6 @@ response = completion(
```python
    messages=[{"content": "Hello, how are you?", "role": "user"}],
    stream=True,
)
```

</TabItem>
@@ -222,9 +217,7 @@ response = completion(
```python
    stream=True,
)

print(response)
```

</TabItem>
@@ -246,9 +239,6 @@ response = completion(
```python
    messages=[{"content": "Hello, how are you?", "role": "user"}],
    stream=True,
)
```

</TabItem>
@@ -265,9 +255,6 @@ response = completion(
```python
    api_base="http://localhost:11434",
    stream=True,
)
```

</TabItem>
<TabItem value="or" label="Openrouter">
@@ -284,9 +271,6 @@ response = completion(
```python
    messages=[{"content": "Hello, how are you?", "role": "user"}],
    stream=True,
)
```

</TabItem>
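With `stream=True`, each snippet above returns an iterator of partial, OpenAI-format chunks; consuming it looks the same for every provider:

```python
# iterate over the response to receive the stream chunk by chunk
for chunk in response:
    print(chunk)
```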
@@ -327,34 +311,8 @@ litellm.success_callback = ["langfuse", "llmonitor"] # log input/output to langfuse, llmonitor
```python
response = completion(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Hi 👋 - i'm openai"}])
```

## Track Costs, Usage, Latency for streaming

Use a callback function for this - more info on custom callbacks: https://docs.litellm.ai/docs/observability/custom_callback

```python
import litellm
```
@@ -366,18 +324,8 @@ def track_cost_callback(
```python
    # ... earlier parameters (e.g. the kwargs dict) sit outside this diff hunk ...
    start_time, end_time  # start/end time
):
    try:
        response_cost = kwargs.get("response_cost", 0)
        print("streaming response_cost", response_cost)
    except:
        pass

# set callback
```
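The registration step itself is elided between hunks; a minimal sketch of how the callback is typically wired up, reusing the `litellm.success_callback` pattern shown earlier (treat the exact lines as illustrative):

```python
import litellm
from litellm import completion

# set the custom callback, then call completion as usual
litellm.success_callback = [track_cost_callback]

response = completion(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hi 👋 - i'm openai"}],
    stream=True,
)
for chunk in response:
    pass  # once the stream completes, the success callback receives the response cost
```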
@@ -400,6 +348,8 @@ response = completion(

Track spend across multiple projects/people

![ui_3]()

The proxy provides:
1. [Hooks for auth](https://docs.litellm.ai/docs/proxy/virtual_keys#custom-auth)
2. [Hooks for logging](https://docs.litellm.ai/docs/proxy/logging#step-1---create-your-custom-litellm-callback-class)
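The final hunk below calls the proxy through the standard OpenAI client; a minimal sketch of the setup it assumes (the port, startup command, and prompt are illustrative, not confirmed by this diff):

```python
import openai

# assumes the proxy is running locally, e.g. started with: litellm --model gpt-3.5-turbo
client = openai.OpenAI(api_key="anything", base_url="http://0.0.0.0:8000")

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "this is a test request, write a short poem"}],
)
print(response)
```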
@@ -436,8 +386,7 @@ response = client.chat.completions.create(model="gpt-3.5-turbo", messages = [
```python
print(response)
```

## More details
* [exception mapping](./exception_mapping.md)
* [retries + model fallbacks for completion()](./completion/reliable_completions.md)
* [proxy virtual keys & spend management](./tutorials/fallbacks.md)
---

@@ -8,6 +8,11 @@ https://github.com/BerriAI/litellm

## **Call 100+ LLMs using the same Input/Output Format**

- Translate inputs to the provider's `completion`, `embedding`, and `image_generation` endpoints
- [Consistent output](https://docs.litellm.ai/docs/completion/output): text responses are always available at `['choices'][0]['message']['content']`
- Retry/fallback logic across multiple deployments (e.g. Azure/OpenAI) - [Router](https://docs.litellm.ai/docs/routing), sketched below
- Track spend & set budgets per project - [OpenAI Proxy Server](https://docs.litellm.ai/docs/simple_proxy)
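A minimal sketch of the Router mentioned in the list above; the config shape follows litellm's Router docs, and the deployment names, keys, and bases are placeholders:

```python
from litellm import Router

# two deployments behind one model group name
model_list = [
    {
        "model_name": "gpt-3.5-turbo",  # what callers ask for
        "litellm_params": {
            "model": "azure/your-deployment-name",
            "api_key": "...",
            "api_base": "https://your-azure-endpoint.openai.azure.com/",
        },
    },
    {
        "model_name": "gpt-3.5-turbo",
        "litellm_params": {"model": "gpt-3.5-turbo", "api_key": "..."},
    },
]

router = Router(model_list=model_list)

# the router retries / falls back across the deployments above
response = router.completion(
    model="gpt-3.5-turbo",
    messages=[{"content": "Hello, how are you?", "role": "user"}],
)
```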
## Basic usage

<a target="_blank" href="https://colab.research.google.com/github/BerriAI/litellm/blob/main/cookbook/liteLLM_Getting_Started.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>
@@ -306,30 +311,7 @@ litellm.success_callback = ["langfuse", "llmonitor"] # log input/output to langfuse, llmonitor
```python
response = completion(model="gpt-3.5-turbo", messages=[{"role": "user", "content": "Hi 👋 - i'm openai"}])
```

## Track Costs, Usage, Latency for streaming

Use a callback function for this - more info on custom callbacks: https://docs.litellm.ai/docs/observability/custom_callback

```python
import litellm
```
@@ -342,18 +324,8 @@ def track_cost_callback(
```python
    # ... earlier parameters (e.g. the kwargs dict) sit outside this diff hunk ...
    start_time, end_time  # start/end time
):
    try:
        response_cost = kwargs.get("response_cost", 0)
        print("streaming response_cost", response_cost)
    except:
        pass

# set callback
```
@@ -372,13 +344,12 @@ response = completion(
```python
)
```
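For non-streaming calls, cost can also be computed directly from the returned response via `litellm.completion_cost` - a short example (key setup omitted):

```python
from litellm import completion, completion_cost

response = completion(
    model="gpt-3.5-turbo",
    messages=[{"content": "Hello, how are you?", "role": "user"}],
)

# completion_cost reads the model + usage off the response object
cost = completion_cost(completion_response=response)
print("Cost for completion call with gpt-3.5-turbo: ", f"${float(cost):.10f}")
```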
## OpenAI Proxy

Track spend across multiple projects/people

![ui_3]()

The proxy provides:
1. [Hooks for auth](https://docs.litellm.ai/docs/proxy/virtual_keys#custom-auth)
2. [Hooks for logging](https://docs.litellm.ai/docs/proxy/logging#step-1---create-your-custom-litellm-callback-class)
@@ -418,4 +389,4 @@ print(response)

## More details
* [exception mapping](./exception_mapping.md)
* [retries + model fallbacks for completion()](./completion/reliable_completions.md)
* [proxy virtual keys & spend management](./tutorials/fallbacks.md)