# Llama2 - Huggingface Tutorial
Huggingface is an open-source platform for deploying machine-learning models.
## Call Llama2 with Huggingface Inference Endpoints
LiteLLM makes it easy to call your public, private, or default Huggingface endpoints.

In this tutorial, let's call 3 models:

- `deepset/deberta-v3-large-squad2`: calls the default Huggingface endpoint
- `meta-llama/Llama-2-7b-hf`: calls a public endpoint
- `meta-llama/Llama-2-7b-chat-hf`: calls your private endpoint
## Case 1: Call the default Huggingface endpoint
Here's the complete example:
```python
from litellm import completion

model = "deepset/deberta-v3-large-squad2"
messages = [{"role": "user", "content": "Hey, how's it going?"}] # LiteLLM follows the OpenAI format

### CALLING ENDPOINT
response = completion(model=model, messages=messages, custom_llm_provider="huggingface")
```
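LiteLLM normalizes provider responses to the OpenAI format, so (assuming the call succeeds) you can read the completion the same way regardless of which endpoint served it:

```python
# The response mirrors the OpenAI response shape.
print(response["choices"][0]["message"]["content"])
```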
What's happening?
- `model` - the name of the deployed model on Huggingface
- `messages` - the input. We accept the OpenAI chat format. For Huggingface, by default we iterate through the list and add each message["content"] to the prompt (see the sketch below).
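To make that concrete, here is a minimal sketch of the default prompt construction described above (the helper name `messages_to_prompt` is hypothetical, not LiteLLM's actual internal function):

```python
def messages_to_prompt(messages):
    # Concatenate each message's content into a single prompt string,
    # since Huggingface text-generation endpoints expect raw text.
    return "".join(m["content"] for m in messages)

messages = [{"role": "user", "content": "Hey, how's it going?"}]
print(messages_to_prompt(messages))  # -> "Hey, how's it going?"
```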
## Case 2: Call Llama2 public endpoint
We've deployed `meta-llama/Llama-2-7b-hf` behind a public endpoint: https://ag3dkq4zui5nu8g3.us-east-1.aws.endpoints.huggingface.cloud.
Let's try it out:
```python
from litellm import completion

model = "meta-llama/Llama-2-7b-hf"
messages = [{"role": "user", "content": "Hey, how's it going?"}] # LiteLLM follows the OpenAI format
custom_api_base = "https://ag3dkq4zui5nu8g3.us-east-1.aws.endpoints.huggingface.cloud"

### CALLING ENDPOINT
response = completion(model=model, messages=messages, custom_llm_provider="huggingface", custom_api_base=custom_api_base)
```
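The third model on our list, `meta-llama/Llama-2-7b-chat-hf`, works the same way against a private endpoint; the main difference is authentication. A minimal sketch, assuming LiteLLM picks up your Huggingface token from the `HUGGINGFACE_API_KEY` environment variable and using a hypothetical placeholder endpoint URL:

```python
import os
from litellm import completion

# Assumption: litellm reads the Huggingface token from the environment.
os.environ["HUGGINGFACE_API_KEY"] = "hf_..."  # your Huggingface access token

model = "meta-llama/Llama-2-7b-chat-hf"
messages = [{"role": "user", "content": "Hey, how's it going?"}] # LiteLLM follows the OpenAI format
custom_api_base = "https://my-private-endpoint.endpoints.huggingface.cloud"  # hypothetical URL

### CALLING ENDPOINT
response = completion(model=model, messages=messages, custom_llm_provider="huggingface", custom_api_base=custom_api_base)
```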