llama-stack

phoenix-oss/llama-stack

forked from phoenix-oss/llama-stack-mirror

Commit graph

Author

SHA1

Message

Date

Celina Hanouti

736092f6bc

[Inference] Use huggingface_hub inference client for TGI adapter (#53)

* Use huggingface_hub inference client for TGI inference

* Update the default value for TGI URL

* Use InferenceClient.text_generation for TGI inference

* Fixes post-review and split TGI adapter into local and Inference Endpoints ones

* Update CLI reference and add typing

* Rename TGI Adapter class

* Use HfApi to get the namespace when not provided in the hf endpoint name

* Remove unnecessary method argument

* Improve TGI adapter initialization condition

* Move helper into impl file + fix merging conflicts

2024-09-12 09:11:35 -07:00

Ashwin Bharambe

21bedc1596

[inference] Add a TGI adapter (#52)

* TGI adapter and some refactoring of other inference adapters

* Use the lower-level `generate_stream()` method for correct tool calling

---------

Co-authored-by: Ashwin Bharambe <ashwin@meta.com>

2024-09-04 22:49:33 -07:00

2 commits