mirror of
https://github.com/meta-llama/llama-stack.git
synced 2025-10-22 00:13:08 +00:00
# What does this PR do?

inference adapters can now configure `refresh_models: bool` to control periodic model listing from their providers

BREAKING CHANGE: together inference adapter default changed. previously always refreshed, now follows config.

addresses "models: refresh" on #3517

## Test Plan

ci w/ new tests
25 lines
767 B
Text
---
description: "Text Generation Inference (TGI) provider for HuggingFace model serving."
sidebar_label: Remote - Tgi
title: remote::tgi
---

# remote::tgi

## Description

Text Generation Inference (TGI) provider for HuggingFace model serving.

## Configuration

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `allowed_models` | `list[str] \| None` | No | | List of models that should be registered with the model registry. If None, all models are allowed. |
| `refresh_models` | `bool` | No | False | Whether to refresh models periodically from the provider. |
| `url` | `str` | No | | The URL for the TGI serving endpoint. |

## Sample Configuration

```yaml
url: ${env.TGI_URL:=}
```
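
The optional fields from the Configuration table can be set alongside `url`. The following is a minimal sketch, assuming the usual `providers.inference` layout of a Llama Stack run configuration; the `provider_id` value and the model name under `allowed_models` are hypothetical placeholders, not values taken from this page.

```yaml
# Sketch only: the surrounding layout, provider_id, and model name are assumptions.
providers:
  inference:
    - provider_id: tgi                    # hypothetical identifier
      provider_type: remote::tgi
      config:
        url: ${env.TGI_URL:=}             # TGI serving endpoint, read from the environment
        refresh_models: true              # opt in to periodic model listing (default: False)
        allowed_models:                   # omit this key to allow all models
          - example-org/example-model     # hypothetical model name
```

Leaving `refresh_models` unset keeps the default of `False`, so models are not refreshed periodically from the provider.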