# Usage

LiteLLM returns an OpenAI-compatible `usage` object across all providers.

```json
"usage": {
    "prompt_tokens": int,
    "completion_tokens": int,
    "total_tokens": int
}
```

## Quick Start

```python
from litellm import completion
import os

## set ENV variables
os.environ["OPENAI_API_KEY"] = "your-api-key"

response = completion(
    model="gpt-3.5-turbo",
    messages=[{"content": "Hello, how are you?", "role": "user"}]
)

print(response.usage)
```
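
Because the returned `usage` object follows the OpenAI-compatible schema above, the individual token counts can be read directly. A minimal sketch (attribute-style access is assumed here, matching the OpenAI format):

```python
from litellm import completion
import os

os.environ["OPENAI_API_KEY"] = "your-api-key"

response = completion(
    model="gpt-3.5-turbo",
    messages=[{"content": "Hello, how are you?", "role": "user"}]
)

# usage mirrors the OpenAI-compatible schema shown above
print(response.usage.prompt_tokens)      # tokens consumed by the input messages
print(response.usage.completion_tokens)  # tokens generated in the response
print(response.usage.total_tokens)       # sum of the two
```
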
## Streaming Usage

If `stream_options={"include_usage": True}` is set, an additional chunk is streamed before the `data: [DONE]` message. The `usage` field on this chunk shows the token usage statistics for the entire request, and its `choices` field will always be an empty array. All other chunks will also include a `usage` field, but with a null value.

```python
from litellm import completion

response = completion(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"}
    ],
    stream=True,
    stream_options={"include_usage": True}
)

for chunk in response:
    # the final usage-only chunk arrives with an empty `choices` list (see note above)
    if chunk.choices:
        print(chunk.choices[0].delta)
```
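
To read the streamed token counts, hold on to the chunk whose `usage` field is populated. A minimal sketch, assuming each streamed chunk exposes a `usage` attribute that stays null until the final usage-only chunk described above:

```python
from litellm import completion

response = completion(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
    stream_options={"include_usage": True}
)

final_usage = None
for chunk in response:
    if chunk.choices:  # regular content chunks
        print(chunk.choices[0].delta)
    if getattr(chunk, "usage", None):  # populated only on the last chunk before data: [DONE]
        final_usage = chunk.usage

print(final_usage)  # prompt_tokens, completion_tokens, total_tokens for the whole request
```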