[Inference] Use huggingface_hub inference client for TGI adapter (#53)

* Use huggingface_hub inference client for TGI inference

* Update the default value for TGI URL

* Use InferenceClient.text_generation for TGI inference

* Apply post-review fixes and split the TGI adapter into local and Inference Endpoints variants

* Update CLI reference and add typing

* Rename TGI Adapter class

* Use HfApi to get the namespace when not provided in the HF endpoint name (see the sketch after this list)

* Remove unnecessary method argument

* Improve TGI adapter initialization condition

* Move helper into impl file + fix merging conflicts
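
The bullets above boil down to a small amount of `huggingface_hub` code. The following is a minimal sketch rather than the adapter's actual implementation: the helper names (`resolve_endpoint_url`, `generate`) and the local URL are illustrative assumptions, while `InferenceClient.text_generation`, `get_inference_endpoint`, and `HfApi.whoami` are real `huggingface_hub` APIs.

```python
# Sketch (not the adapter's code) of the pattern described in the commits:
# talk to TGI through huggingface_hub's InferenceClient, and resolve an
# Inference Endpoint name of the form "namespace/endpoint-name" (or just
# "endpoint-name", defaulting the namespace to the token owner via HfApi).
from huggingface_hub import HfApi, InferenceClient, get_inference_endpoint


def resolve_endpoint_url(endpoint_name: str, token: str | None = None) -> str:
    """Return the URL of a Hugging Face Inference Endpoint."""
    if "/" in endpoint_name:
        namespace, name = endpoint_name.split("/", maxsplit=1)
    else:
        # No namespace in the endpoint name: fall back to the authenticated user.
        namespace = HfApi(token=token).whoami()["name"]
        name = endpoint_name
    endpoint = get_inference_endpoint(name, namespace=namespace, token=token)
    # The URL is only populated once the endpoint is running
    # (endpoint.wait() can be used to block until then).
    return endpoint.url


def generate(url: str, prompt: str, token: str | None = None) -> str:
    # InferenceClient accepts either a local TGI URL (e.g. http://localhost:8080)
    # or an Inference Endpoint URL; text_generation uses TGI's generate route.
    client = InferenceClient(model=url, token=token)
    return client.text_generation(prompt, max_new_tokens=128)


if __name__ == "__main__":
    # Local TGI server:
    print(generate("http://localhost:8080", "Hello"))
    # Hosted Inference Endpoint, with or without an explicit namespace:
    # url = resolve_endpoint_url("my-org/my-tgi-endpoint", token="hf_...")
    # print(generate(url, "Hello", token="hf_..."))
```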
Author: Celina Hanouti, 2024-09-12 18:11:35 +02:00 (committed by GitHub)
parent 191cd28831
commit 736092f6bc
6 changed files with 171 additions and 72 deletions

@@ -286,6 +286,13 @@
+--------------------------------+---------------------------------------+----------------------------------------------------------------------+
| | "memory": "meta-reference-faiss" | |
| | } | |
+--------------------------------+---------------------------------------+----------------------------------------------------------------------+
| local-plus-tgi-inference | { | Use TGI (local or with [Hugging Face Inference Endpoints](https:// |
| | "inference": "remote::tgi", | huggingface.co/inference-endpoints/dedicated)) for running LLM |
| | "safety": "meta-reference", | inference. When using HF Inference Endpoints, you must provide the |
| | "agentic_system": "meta-reference", | name of the endpoint. |
| | "memory": "meta-reference-faiss" | |
| | } | |
+--------------------------------+---------------------------------------+----------------------------------------------------------------------+
As you can see above, each “distribution” details the “providers” it is composed of. For example, `local` uses the “meta-reference” provider for inference, while `local-ollama` relies on a different provider (Ollama). Similarly, you can use Fireworks or Together.AI for inference as well.
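
As a concrete illustration of that composition, the provider map for the `local-plus-tgi-inference` distribution from the table above can be read as a plain mapping of API name to provider. The dict rendering below is illustrative, not a literal config file format shipped by the repo:

```python
# Provider map for local-plus-tgi-inference, as listed in the diff above.
local_plus_tgi_inference = {
    "inference": "remote::tgi",            # TGI, local or HF Inference Endpoints
    "safety": "meta-reference",
    "agentic_system": "meta-reference",
    "memory": "meta-reference-faiss",
}
```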