litellm-mirror/litellm/llms/triton/completion/handler.py
Ishaan Jaff 6107f9f3f3
[Bug fix]: Triton /infer handler incompatible with batch responses (#7337)
* migrate triton to base llm http handler

* clean up triton handler.py

* use transform functions for triton

* add TritonConfig

* get openai params for triton

* use triton embedding config

* test_completion_triton_generate_api

* test_completion_triton_infer_api

* fix TritonConfig doc string

* use TritonResponseIterator

* fix triton embeddings

* docs triton chat usage
2024-12-20 20:59:40 -08:00


"""
Triton Completion - uses `llm_http_handler.py` to make httpx requests
Request/Response transformation is handled in `transformation.py`
"""