Commit graph

82 commits

Author SHA1 Message Date
Ishaan Jaff
d81ae45827
(Perf / latency improvement) improve pass-through endpoint latency to ~50ms (before this PR it was 400ms) (#6874)
* use correct location for types

* fix types location

* perf improvement for pass through endpoints

* update lint check

* fix import

* fix ensure async clients test

* fix azure.py health check

* fix ollama
2024-11-22 18:47:26 -08:00
Krish Dholakia
7e5085dc7b
Litellm dev 11 21 2024 (#6837)
* Fix Vertex AI function calling invoke: use JSON format instead of protobuf text format. (#6702)

* test: test tool_call conversion when arguments is empty dict

Fixes https://github.com/BerriAI/litellm/issues/6833

* fix(openai_like/handler.py): return more descriptive error message

Fixes https://github.com/BerriAI/litellm/issues/6812

* test: skip overloaded model

* docs(anthropic.md): update anthropic docs to show how to route to any new model

* feat(groq/): fake stream when 'response_format' param is passed

Groq doesn't support streaming when response_format is set

* feat(groq/): add response_format support for groq

Closes https://github.com/BerriAI/litellm/issues/6845

* fix(o1_handler.py): remove fake streaming for o1

Closes https://github.com/BerriAI/litellm/issues/6801

* build(model_prices_and_context_window.json): add groq llama3.2b model pricing

Closes https://github.com/BerriAI/litellm/issues/6807

* fix(utils.py): fix handling ollama response format param

Fixes https://github.com/BerriAI/litellm/issues/6848#issuecomment-2491215485

* docs(sidebars.js): refactor chat endpoint placement

* fix: fix linting errors

* test: fix test

* test: fix test

* fix(openai_like/handler): handle max retries

* fix(streaming_handler.py): fix streaming check for openai-compatible providers

* test: update test

* test: correctly handle model is overloaded error

* test: update test

* test: fix test

* test: mark flaky test

---------

Co-authored-by: Guowang Li <Guowang@users.noreply.github.com>
2024-11-22 01:53:52 +05:30
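The "fake stream when 'response_format' param is passed" item in the entry above relies on a common workaround: make one non-streaming call, then re-emit the full response as stream-style chunks so callers that asked for stream=True keep working. Below is a minimal sketch of that pattern, not LiteLLM's internal code; the helper name and chunk shape are illustrative.

```python
from typing import Iterator


def fake_stream(full_text: str, chunk_size: int = 40) -> Iterator[dict]:
    """Re-emit a complete (non-streaming) completion as stream-style chunks.

    Useful when a provider (e.g. Groq with response_format set) only supports
    non-streaming calls but the caller requested stream=True.
    """
    for i in range(0, len(full_text), chunk_size):
        yield {"choices": [{"delta": {"content": full_text[i : i + chunk_size]}}]}
    # Terminal chunk signalling the stream is done.
    yield {"choices": [{"delta": {}, "finish_reason": "stop"}]}


if __name__ == "__main__":
    response_text = '{"answer": 42}'  # pretend this came from a single non-streaming call
    for chunk in fake_stream(response_text):
        print(chunk)
```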
Krish Dholakia
e9aa492af3
LiteLLM Minor Fixes & Improvement (11/14/2024) (#6730)
* fix(ollama.py): fix get model info request

Fixes https://github.com/BerriAI/litellm/issues/6703

* feat(anthropic/chat/transformation.py): support passing user id to anthropic via openai 'user' param

* docs(anthropic.md): document all supported openai params for anthropic

* test: fix tests

* fix: fix tests

* feat(jina_ai/): add rerank support

Closes https://github.com/BerriAI/litellm/issues/6691

* test: handle service unavailable error

* fix(handler.py): refactor together ai rerank call

* test: update test to handle overloaded error

* test: fix test

* Litellm router trace (#6742)

* feat(router.py): add trace_id to parent functions - allows tracking retry/fallbacks

* feat(router.py): log trace id across retry/fallback logic

allows grouping llm logs for the same request

* test: fix tests

* fix: fix test

* fix(transformation.py): only set non-none stop_sequences

* Litellm router disable fallbacks (#6743)

* bump: version 1.52.6 → 1.52.7

* feat(router.py): enable dynamically disabling fallbacks

Allows for enabling/disabling fallbacks per key

* feat(litellm_pre_call_utils.py): support setting 'disable_fallbacks' on litellm key

* test: fix test

* fix(exception_mapping_utils.py): map 'model is overloaded' to internal server error

* test: handle gemini error

* test: fix test

* fix: new run
2024-11-15 01:02:54 +05:30
Krish Dholakia
cb2563e3c0
Litellm dev 10 22 2024 (#6384)
* fix(utils.py): add 'disallowed_special' for token counting on .encode()

Fixes error when '<|endoftext|>' in string

* Revert "(fix) standard logging metadata + add unit testing  (#6366)" (#6381)

This reverts commit 8359cb6fa9.

* add new 3.5 model card (#6378)

* Add Claude 3.5 Sonnet 20241022 models for all providers (#6380)

* Add Claude 3.5 v2 on Amazon Bedrock and Vertex AI.

* added anthropic/claude-3-5-sonnet-20241022

* add new 3.5 model card

---------

Co-authored-by: Paul Gauthier <paul@paulg.com>
Co-authored-by: lowjiansheng <15527690+lowjiansheng@users.noreply.github.com>

* test(skip-flaky-google-context-caching-test): google is not reliable. their sample code is also not working

* Fix metadata being overwritten in speech() (#6295)

* fix: adding missing redis cluster kwargs (#6318)

Co-authored-by: Ali Arian <ali.arian@breadfinancial.com>

* Add support for `max_completion_tokens` in Azure OpenAI (#6376)

Now that Azure supports `max_completion_tokens`, no need for special handling for this param and let it pass thru. More details: https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/models?tabs=python-secure#api-support

* build(model_prices_and_context_window.json): add voyage-finance-2 pricing

Closes https://github.com/BerriAI/litellm/issues/6371

* build(model_prices_and_context_window.json): fix llama3.1 pricing model name on map

Closes https://github.com/BerriAI/litellm/issues/6310

* feat(realtime_streaming.py): just log specific events

Closes https://github.com/BerriAI/litellm/issues/6267

* fix(utils.py): more robust checking if unmapped vertex anthropic model belongs to that family of models

Fixes https://github.com/BerriAI/litellm/issues/6383

* Fix Ollama stream handling for tool calls with None content (#6155)

* test(test_max_completions): update test now that azure supports 'max_completion_tokens'

* fix(handler.py): fix linting error

---------

Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: Low Jian Sheng <15527690+lowjiansheng@users.noreply.github.com>
Co-authored-by: David Manouchehri <david.manouchehri@ai.moda>
Co-authored-by: Paul Gauthier <paul@paulg.com>
Co-authored-by: John HU <hszqqq12@gmail.com>
Co-authored-by: Ali Arian <113945203+ali-arian@users.noreply.github.com>
Co-authored-by: Ali Arian <ali.arian@breadfinancial.com>
Co-authored-by: Anand Taralika <46954145+taralika@users.noreply.github.com>
Co-authored-by: Nolan Tremelling <34580718+NolanTrem@users.noreply.github.com>
2024-10-22 21:18:54 -07:00
Krish Dholakia
905ebeb924
feat(custom_logger.py): expose new async_dataset_hook for modifying… (#6331)
* feat(custom_logger.py): expose new `async_dataset_hook` for modifying/rejecting argilla items before logging

Allows user more control on what gets logged to argilla for annotations

* feat(google_ai_studio_endpoints.py): add new `/azure/*` pass through route

enables pass-through for azure provider

* feat(utils.py): support checking ollama `/api/show` endpoint for retrieving ollama model info

Fixes https://github.com/BerriAI/litellm/issues/6322

* fix(user_api_key_auth.py): add `/key/delete` to an allowed_ui_routes

Fixes https://github.com/BerriAI/litellm/issues/6236

* fix(user_api_key_auth.py): remove type ignore

* fix(user_api_key_auth.py): route ui vs. api token checks differently

Fixes https://github.com/BerriAI/litellm/issues/6238

* feat(internal_user_endpoints.py): support setting models as a default internal user param

Closes https://github.com/BerriAI/litellm/issues/6239

* fix(user_api_key_auth.py): fix exception string

* fix(user_api_key_auth.py): fix error string

* fix: fix test
2024-10-20 09:00:04 -07:00
Ishaan Jaff
4e88fd65e1
(feat) openai prompt caching (non streaming) - add prompt_tokens_details in usage response (#6039)
* add prompt_tokens_details in usage response

* use _prompt_tokens_details as a param in Usage

* fix linting errors

* fix type error

* fix ci/cd deps

* bump deps for openai

* bump deps openai

* fix llm translation testing

* fix llm translation embedding
2024-10-03 23:31:10 +05:30
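For the prompt_tokens_details change above, here is a minimal sketch of reading the new usage field from a chat completion via the OpenAI Python SDK. It assumes openai>=1.x and a model with prompt caching; the attribute may be absent on older SDKs or models, hence the guard.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello!"}],
)

usage = resp.usage
print("prompt_tokens:", usage.prompt_tokens)
# prompt_tokens_details reports cache hits when prompt caching applies;
# it may be None on SDK/model combinations that don't return it.
if getattr(usage, "prompt_tokens_details", None):
    print("cached_tokens:", usage.prompt_tokens_details.cached_tokens)
```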
Krish Dholakia
d57be47b0f
Litellm ruff linting enforcement (#5992)
* ci(config.yml): add a 'check_code_quality' step

Addresses https://github.com/BerriAI/litellm/issues/5991

* ci(config.yml): check why circle ci doesn't pick up this test

* ci(config.yml): fix to run 'check_code_quality' tests

* fix(__init__.py): fix unprotected import

* fix(__init__.py): don't remove unused imports

* build(ruff.toml): update ruff.toml to ignore unused imports

* fix: ruff + pyright - fix linting + type-checking errors

* fix: fix linting errors

* fix(lago.py): fix module init error

* fix: fix linting errors

* ci(config.yml): cd into correct dir for checks

* fix(proxy_server.py): fix linting error

* fix(utils.py): fix bare except

causes ruff linting errors

* fix: ruff - fix remaining linting errors

* fix(clickhouse.py): use standard logging object

* fix(__init__.py): fix unprotected import

* fix: ruff - fix linting errors

* fix: fix linting errors

* ci(config.yml): cleanup code qa step (formatting handled in local_testing)

* fix(_health_endpoints.py): fix ruff linting errors

* ci(config.yml): just use ruff in check_code_quality pipeline for now

* build(custom_guardrail.py): include missing file

* style(embedding_handler.py): fix ruff check
2024-10-01 19:44:20 -04:00
Krish Dholakia
1e7e538261
LiteLLM Minor fixes + improvements (08/04/2024) (#5505)
* Minor IAM AWS OIDC Improvements (#5246)

* AWS IAM: Temporary tokens are valid across all regions after being issued, so it is wasteful to request one for each region.

* AWS IAM: Include an inline policy, to help reduce misuse of overly permissive IAM roles.

* (test_bedrock_completion.py): Ensure we are testing cross AWS region OIDC flow.

* fix(router.py): log rejected requests

Fixes https://github.com/BerriAI/litellm/issues/5498

* refactor: don't use verbose_logger.exception, if exception is raised

User might already have handling for this. But alerting systems in prod will raise this as an unhandled error.

* fix(datadog.py): support setting datadog source as an env var

Fixes https://github.com/BerriAI/litellm/issues/5508

* docs(logging.md): add dd_source to datadog docs

* fix(proxy_server.py): expose `/customer/list` endpoint for showing all customers

* (bedrock): Fix usage with Cloudflare AI Gateway, and proxies in general. (#5509)

* feat(anthropic.py): support 'cache_control' param for content when it is a string

* Revert "(bedrock): Fix usage with Cloudflare AI Gateway, and proxies in gener…" (#5519)

This reverts commit 3fac0349c2.

* refactor: ci/cd run again

---------

Co-authored-by: David Manouchehri <david.manouchehri@ai.moda>
2024-09-04 22:16:55 -07:00
Krrish Dholakia
33deeda300 feat(ollama.py): support ollama /api/embed endpoint
Closes https://github.com/BerriAI/litellm/issues/5291
2024-08-20 09:10:08 -07:00
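For context on the /api/embed support above: Ollama's endpoint takes either a single string or a list under "input" and returns one vector per item. A rough sketch of calling it directly, assuming a local Ollama server and an already-pulled embedding model (the model name is illustrative):

```python
import requests

# Assumes Ollama is running locally and the model has been pulled,
# e.g. `ollama pull nomic-embed-text`.
resp = requests.post(
    "http://localhost:11434/api/embed",
    json={"model": "nomic-embed-text", "input": ["hello world", "goodbye world"]},
    timeout=30,
)
resp.raise_for_status()
embeddings = resp.json()["embeddings"]  # one vector per input string
print(len(embeddings), len(embeddings[0]))
```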
Krrish Dholakia
04d69464e2 fix(ollama.py): fix ollama embeddings - pass optional params
Fixes https://github.com/BerriAI/litellm/issues/5267
2024-08-19 08:45:26 -07:00
Krrish Dholakia
61f4b71ef7 refactor: replace .error() with .exception() logging for better debugging on sentry 2024-08-16 09:22:47 -07:00
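A minimal illustration of why the swap above matters: logging.Logger.exception logs at ERROR level and also attaches the active traceback, which is what Sentry surfaces as a proper stack trace.

```python
import logging

logger = logging.getLogger(__name__)

try:
    1 / 0
except ZeroDivisionError:
    # .error() records only the message; .exception() also attaches the
    # current traceback, which error trackers like Sentry pick up.
    logger.exception("ollama call failed")
```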
thiswillbeyourgithub
3eb076edf5 fix: wrong order of arguments for ollama 2024-08-08 17:19:17 +02:00
Krrish Dholakia
311521e56e fix(ollama.py): correctly raise ollama streaming error
Fixes https://github.com/BerriAI/litellm/issues/4974
2024-07-30 15:01:26 -07:00
Titusz
fcef2c4580
Add missing num_gpu ollama configuration parameter 2024-07-18 17:51:56 +02:00
Krrish Dholakia
6e9f048618 fix: move to using pydantic obj for setting values 2024-07-11 13:18:36 -07:00
corrm
423a60c8bc chore: Improve OllamaConfig get_required_params, ollama_acompletion, and ollama_async_streaming functions 2024-06-24 05:55:22 +03:00
Krish Dholakia
677e0255c8
Merge branch 'main' into litellm_cleanup_traceback 2024-06-06 16:32:08 -07:00
Krrish Dholakia
6cca5612d2 refactor: replace 'traceback.print_exc()' with logging library
allows error logs to be in json format for otel logging
2024-06-06 13:47:43 -07:00
sha-ahammed
faa4dfe03e feat: Add Ollama as a provider in the proxy UI 2024-06-05 16:48:38 +05:30
KX
d3921a3d28 fix: add missing seed parameter to ollama input
Current ollama interfacing does not allow for seed, which is supported in https://github.com/ollama/ollama/blob/main/docs/api.md#parameters and https://github.com/ollama/ollama/blob/main/docs/modelfile.md#valid-parameters-and-values

This resolves that by adding in handling of seed parameter.
2024-05-31 01:47:56 +08:00
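As the commit notes, Ollama accepts seed under the request's "options" object. A rough sketch of passing it in a raw /api/generate call for reproducible output (the model name and prompt are illustrative):

```python
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3",
        "prompt": "Pick a random number between 1 and 100.",
        "stream": False,
        # 'seed' lives under 'options', alongside temperature, num_predict, etc.
        "options": {"seed": 42, "temperature": 0.8},
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["response"])
```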
frob
c44970c813
Merge branch 'BerriAI:main' into ollama-image-handling 2024-05-09 20:25:30 +02:00
Krrish Dholakia
6575143460 feat(proxy_server.py): return litellm version in response headers 2024-05-08 16:00:08 -07:00
frob
b93c00abec
Merge branch 'BerriAI:main' into ollama-image-handling 2024-05-09 00:14:29 +02:00
Ishaan Jaff
2725a55e7a
Merge pull request #3470 from mbektas/fix-ollama-embeddings
support sync ollama embeddings
2024-05-07 19:21:37 -07:00
frob
7a1a3f6411
Merge branch 'BerriAI:main' into ollama-image-handling 2024-05-06 18:06:45 +02:00
Mehmet Bektas
3acad270e5 support sync ollama embeddings 2024-05-05 19:44:25 -07:00
Jack Collins
bb6132eee1 Fix: get format from data, not optional_params, in ollama non-stream completion 2024-05-05 18:59:26 -07:00
Jack Collins
81b1c46c6f Add missing import itertools.chain 2024-05-05 18:54:08 -07:00
Jack Collins
03b82b78c1 Fix: Set finish_reason to tool_calls for non-stream responses in ollama 2024-05-05 18:52:31 -07:00
Jack Collins
297543e3e5 Parse streamed function calls as single delta in ollama 2024-05-05 18:52:20 -07:00
frob
465f491e7f
Merge branch 'BerriAI:main' into ollama-image-handling 2024-05-01 22:29:37 +02:00
Krish Dholakia
0714eb3526
Merge branch 'main' into litellm_ollama_tool_call_reponse 2024-05-01 10:24:05 -07:00
frob
ae87cb3a31
Merge branch 'BerriAI:main' into ollama-image-handling 2024-04-21 01:49:10 +02:00
frob
3df7231fa5
Disable special tokens in ollama completion when counting tokens
Some(?) models (e.g., codegemma) don't return a prompt_eval_count field, so ollama.py tries to compute the value based on encoding of the prompt. Unfortunately, FIM symbols used in the prompt (e.g., "<|fim_prefix|>") cause the encoder to throw an exception, so we disable special processing.
2024-04-19 21:38:42 +02:00
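This entry and the earlier 'disallowed_special' fix address the same encoder behavior: tiktoken raises on special tokens such as "<|fim_prefix|>" or "<|endoftext|>" unless the check is disabled. A minimal sketch of the safe counting pattern, assuming tiktoken; this is not the exact code in ollama.py.

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
prompt = "<|fim_prefix|>def add(a, b):<|fim_suffix|>    return a + b<|fim_middle|>"

# Default behavior raises ValueError because the prompt contains special tokens.
# Passing disallowed_special=() treats them as ordinary text, so counting never fails.
tokens = enc.encode(prompt, disallowed_special=())
print("prompt token count:", len(tokens))
```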
frob
2492fade3a
Update comment. 2024-04-16 01:12:24 +02:00
frob
ea117fc859
Merge branch 'BerriAI:main' into ollama-image-handling 2024-04-13 21:42:58 +02:00
frob
82a4232dce
ollama also accepts PNG 2024-04-08 03:35:02 +02:00
frob
59ed4fb51e
Update ollama.py for image handling
ollama wants plain base64 JPEG images, but some clients send data URIs and/or WebP. Remove prefixes and convert all non-JPEG images to JPEG.
2024-04-08 03:28:24 +02:00
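A rough sketch of the normalization described above: strip any data URI prefix and re-encode non-JPEG images as plain base64 JPEG. Assumes Pillow; the helper name is illustrative, not the function in ollama.py.

```python
import base64
import io

from PIL import Image


def to_plain_base64_jpeg(image_data: str) -> str:
    """Normalize an image string for Ollama: drop data-URI prefixes, convert to JPEG."""
    # Strip "data:image/...;base64," if the client sent a data URI.
    if image_data.startswith("data:"):
        image_data = image_data.split(",", 1)[1]

    raw = base64.b64decode(image_data)
    img = Image.open(io.BytesIO(raw))
    if img.format == "JPEG":
        return base64.b64encode(raw).decode()

    # Re-encode webp/png/etc. as JPEG (the RGB conversion drops any alpha channel).
    buf = io.BytesIO()
    img.convert("RGB").save(buf, format="JPEG")
    return base64.b64encode(buf.getvalue()).decode()
```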
Gregory Nwosu
559a4cde23
created defaults for response["eval_count"]
There is no way in litellm to disable the Ollama cache that strips the eval_count response keys from the JSON.
This PR allows the code to create sensible defaults for when the response is empty.
See:
- https://github.com/ollama/ollama/issues/1573
- https://github.com/ollama/ollama/issues/2023
2024-04-08 02:03:54 +01:00
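A minimal sketch of the fallback described above: when a cached Ollama response omits eval_count / prompt_eval_count, fall back to a local estimate instead of failing. The whitespace-based estimate here is an illustrative default, not LiteLLM's exact choice.

```python
def usage_from_ollama_response(response_json: dict, prompt: str, completion: str) -> dict:
    """Build a usage dict even when Ollama's cache strips the *_eval_count fields."""
    # Crude fallback: approximate tokens by whitespace-split words.
    prompt_tokens = response_json.get("prompt_eval_count", len(prompt.split()))
    completion_tokens = response_json.get("eval_count", len(completion.split()))
    return {
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "total_tokens": prompt_tokens + completion_tokens,
    }


# A cached Ollama response may omit both counters entirely.
print(usage_from_ollama_response({}, "What is 2 + 2?", "4"))
```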
frob
d5c1ae1cb2
Update ollama.py for image handling
Some clients (e.g., librechat) send images in data URI format, not plain base64. Strip off the prefix when passing images to ollama.
2024-04-07 13:05:39 +02:00
DaxServer
61b6f8be44 docs: Update references to Ollama repository url
Updated references to the Ollama repository URL from https://github.com/jmorganca/ollama to https://github.com/ollama/ollama.
2024-03-31 19:35:37 +02:00
Krrish Dholakia
48af367885 fix(ollama.py): fix type issue 2024-03-28 15:01:56 -07:00
onukura
f86472518d Add a feature to ollama aembedding to accept batch input 2024-03-27 21:39:19 +00:00
onukura
2df63cc621 Fix ollama embedding response 2024-03-25 16:26:49 +00:00
Lunik
cee20695eb
🐛 fix: Ollama vision models call arguments (like: llava)
Signed-off-by: Lunik <lunik@tiwabbit.fr>
2024-02-26 17:52:55 +01:00
Krrish Dholakia
d1db67890c fix(ollama.py): support format for ollama 2024-02-06 10:11:52 -08:00
Ishaan Jaff
14c9e239a1
Merge pull request #1750 from vanpelt/patch-2
Re-raise exception in async ollama streaming
2024-02-05 08:12:17 -08:00
Krrish Dholakia
312c7462c8 refactor(ollama.py): trigger rebuild 2024-02-03 20:23:43 -08:00
Krrish Dholakia
01cef1fe9e fix(ollama.py): fix api connection error
https://github.com/BerriAI/litellm/issues/1735
2024-02-03 20:22:33 -08:00
Chris Van Pelt
1568b162f5
Re-raise exception in async ollama streaming 2024-02-01 16:14:07 -08:00