Francisco Javier Arceo c8d41d45ec chore: Enabling Milvus for VectorIO CI
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-06-30 11:55:49 -04:00

remote::hf::serverless

Description

Hugging Face Inference API serverless provider for on-demand model inference.

Configuration

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `huggingface_repo` | `<class 'str'>` | No | PydanticUndefined | The model ID of the model on the Hugging Face Hub (e.g. 'meta-llama/Meta-Llama-3.1-70B-Instruct') |
| `api_token` | `pydantic.types.SecretStr \| None` | No | | Your Hugging Face user access token (will default to locally saved token if not provided) |
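
The two fields above can be pictured as a small typed record. The following is a minimal sketch using only the standard library. The class name `HFServerlessConfigSketch` is hypothetical and is not the provider's actual config class, which is Pydantic-based and uses `SecretStr` for the token. Here `repr=False` stands in for that secrecy by keeping the token out of printed output.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class HFServerlessConfigSketch:
    """Hypothetical stand-in for the provider config; not the real class."""

    # Model ID on the Hugging Face Hub.
    huggingface_repo: str
    # User access token. None means "fall back to the locally saved token".
    # repr=False keeps the secret out of printed output and logs.
    api_token: Optional[str] = field(default=None, repr=False)


cfg = HFServerlessConfigSketch(
    huggingface_repo="meta-llama/Meta-Llama-3.1-70B-Instruct",
    api_token="hf_example_token",  # placeholder value, not a real token
)
print(cfg)  # the token does not appear in the printed representation
```

Leaving `api_token` unset in this sketch yields `None`, mirroring the documented fallback to a locally saved token.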

Sample Configuration

huggingface_repo: ${env.INFERENCE_MODEL}
api_token: ${env.HF_API_TOKEN}
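
The `${env.NAME}` placeholders in the sample are filled from environment variables when the configuration is loaded. The sketch below illustrates that idea with the standard library only. The helper `resolve_env` and its regular expression are assumptions made for illustration, not llama-stack's actual resolver, which may accept additional syntax.

```python
import os
import re

# Matches placeholders of the form ${env.NAME} (assumed syntax, taken from the sample).
_ENV_PATTERN = re.compile(r"\$\{env\.([A-Za-z_][A-Za-z0-9_]*)\}")


def resolve_env(value: str, environ=os.environ) -> str:
    """Replace each ${env.NAME} in value with environ[NAME] (hypothetical helper)."""

    def _lookup(match: re.Match) -> str:
        name = match.group(1)
        if name not in environ:
            raise KeyError(f"environment variable {name} is not set")
        return environ[name]

    return _ENV_PATTERN.sub(_lookup, value)


# The sample configuration, written as a plain dict for this sketch.
sample = {
    "huggingface_repo": "${env.INFERENCE_MODEL}",
    "api_token": "${env.HF_API_TOKEN}",
}

# A stand-in environment so the example does not depend on the real one.
fake_env = {
    "INFERENCE_MODEL": "meta-llama/Meta-Llama-3.1-70B-Instruct",
    "HF_API_TOKEN": "hf_example_token",  # placeholder value, not a real token
}

resolved = {key: resolve_env(val, fake_env) for key, val in sample.items()}
print(resolved["huggingface_repo"])
```

In practice you would export `INFERENCE_MODEL` and `HF_API_TOKEN` in the shell before starting the stack, and the placeholders would resolve against the real environment.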