litellm-mirror/litellm/llms/triton/completion/handler.py
Ishaan Jaff 6107f9f3f3
[Bug fix]: Triton /infer handler incompatible with batch responses (#7337)
* migrate triton to base llm http handler

* clean up triton handler.py

* use transform functions for triton

* add TritonConfig

* get openai params for triton

* use triton embedding config

* test_completion_triton_generate_api

* test_completion_triton_infer_api

* fix TritonConfig doc string

* use TritonResponseIterator

* fix triton embeddings

* docs triton chat usage
2024-12-20 20:59:40 -08:00


"""
Triton Completion - uses `llm_http_handler.py` to make httpx requests
Request/Response transformation is handled in `transformation.py`
"""