Implementation of `litellm.batch_completion`, `litellm.batch_completion_models`, and `litellm.batch_completion_models_all_responses`.

Doc: https://docs.litellm.ai/docs/completion/batching

The LiteLLM Python SDK provides three batching helpers (a usage sketch follows this list):

  1. `litellm.batch_completion`: send a batch of `litellm.completion` requests to a single model.
  2. `litellm.batch_completion_models`: send a request to multiple language models concurrently and return the first response received.
  3. `litellm.batch_completion_models_all_responses`: send a request to multiple language models concurrently and return a list of responses from all models that respond.
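
A minimal usage sketch of all three helpers, following the linked docs. The model names are illustrative only and assume the matching provider API keys (e.g. `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`) are set in the environment:

```python
import litellm

# 1. batch_completion: one model, many message lists.
#    Returns a list of responses, one per conversation, in order.
responses = litellm.batch_completion(
    model="gpt-3.5-turbo",  # example model
    messages=[
        [{"role": "user", "content": "What is the capital of France?"}],
        [{"role": "user", "content": "What is the capital of Japan?"}],
    ],
)
print([r.choices[0].message.content for r in responses])

# 2. batch_completion_models: fan the same request out to several models
#    and return the first response that comes back.
fastest = litellm.batch_completion_models(
    models=["gpt-3.5-turbo", "claude-3-haiku-20240307"],  # example models
    messages=[{"role": "user", "content": "Hey, how's it going?"}],
)
print(fastest.choices[0].message.content)

# 3. batch_completion_models_all_responses: fan the request out to several
#    models and wait for all of them, returning the responses as a list.
all_responses = litellm.batch_completion_models_all_responses(
    models=["gpt-3.5-turbo", "claude-3-haiku-20240307"],
    messages=[{"role": "user", "content": "Hey, how's it going?"}],
)
for r in all_responses:
    print(r.choices[0].message.content)
```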