mirror of https://github.com/meta-llama/llama-stack.git (synced 2025-12-24 06:53:57 +00:00)
feat(vllm): periodically refresh models
parent 68a2dfbad7
commit 1bf710bec0
6 changed files with 95 additions and 13 deletions
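As a rough illustration of what "periodically refresh models" can mean for a provider, here is a minimal asyncio sketch. It is not the code from this commit: `ModelRefresher` and the `list_models` callable are hypothetical names, and only the `refresh_models` / `refresh_models_interval` parameter names and their defaults (False, 300 seconds) are taken from the provider's configuration table.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional

# Hypothetical type: any coroutine function that returns the current model ids.
ListModels = Callable[[], Awaitable[List[str]]]


class ModelRefresher:
    """Illustrative helper (an assumption, not the llama-stack implementation)."""

    def __init__(
        self,
        list_models: ListModels,
        refresh_models: bool = False,
        refresh_models_interval: float = 300,
    ) -> None:
        self._list_models = list_models
        self._enabled = refresh_models
        self._interval = refresh_models_interval
        self._task: Optional[asyncio.Task] = None
        self.models: List[str] = []

    async def start(self) -> None:
        # Load the list once; keep polling only when refresh_models is enabled.
        self.models = await self._list_models()
        if self._enabled:
            self._task = asyncio.create_task(self._loop())

    async def _loop(self) -> None:
        while True:
            await asyncio.sleep(self._interval)
            try:
                self.models = await self._list_models()
            except Exception as exc:
                # Keep the last known list if a single refresh fails.
                print(f"model refresh failed: {exc}")

    async def stop(self) -> None:
        # Cancel the background task, if any, and wait for it to finish.
        if self._task is not None:
            self._task.cancel()
            try:
                await self._task
            except asyncio.CancelledError:
                pass
            self._task = None
```

In a real provider, `list_models` would presumably wrap a request to the vLLM server's model listing endpoint; the sketch leaves that out so it stays self-contained.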
@@ -12,11 +12,13 @@ Remote vLLM inference provider for connecting to vLLM servers.
 | `max_tokens` | `<class 'int'>` | No | 4096 | Maximum number of tokens to generate. |
 | `api_token` | `str \| None` | No | fake | The API token |
 | `tls_verify` | `bool \| str` | No | True | Whether to verify TLS certificates. Can be a boolean or a path to a CA certificate file. |
+| `refresh_models` | `<class 'bool'>` | No | False | Whether to refresh models periodically |
+| `refresh_models_interval` | `<class 'int'>` | No | 300 | Interval in seconds to refresh models |
 
 ## Sample Configuration
 
 ```yaml
-url: ${env.VLLM_URL}
+url: ${env.VLLM_URL:=}
 max_tokens: ${env.VLLM_MAX_TOKENS:=4096}
 api_token: ${env.VLLM_API_TOKEN:=fake}
 tls_verify: ${env.VLLM_TLS_VERIFY:=true}