llama-stack-mirror/llama_stack/providers/remote/inference
Matthew Farrellee 477bcd4d09
feat: allow dynamic model registration for nvidia inference provider (#2726)
# What does this PR do?

lets users register models that are available at
https://integrate.api.nvidia.com/v1/models but aren't already listed in
llama_stack/providers/remote/inference/nvidia/models.py
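The change widens the set of accepted model ids from the static table in `models.py` to anything the remote endpoint advertises. The snippet below is a minimal sketch of that rule, not the provider's actual code: it assumes the endpoint returns an OpenAI-style `{"data": [{"id": ...}]}` body, and `listed_model_ids`, `is_registrable`, and the sample payload are hypothetical, for illustration only.

```python
import json


def listed_model_ids(models_response: str) -> set[str]:
    """Collect model ids from an OpenAI-style /v1/models JSON body (assumed format)."""
    payload = json.loads(models_response)
    # OpenAI-compatible listings wrap entries in a top-level "data" array.
    return {entry["id"] for entry in payload.get("data", [])}


def is_registrable(model_id: str, static_ids: set[str], listed_ids: set[str]) -> bool:
    """Hypothetical check: accept static models, plus any model the endpoint lists."""
    if model_id in static_ids:
        # Models in the provider's static table were already accepted.
        return True
    # Dynamic registration: also accept ids advertised by the remote endpoint.
    return model_id in listed_ids


if __name__ == "__main__":
    # Illustrative payload, not a real response from integrate.api.nvidia.com.
    sample = json.dumps({
        "object": "list",
        "data": [
            {"id": "meta/llama-3.1-8b-instruct", "object": "model"},
            {"id": "nvidia/llama-3.1-nemotron-ultra-253b-v1", "object": "model"},
        ],
    })
    static = {"meta/llama-3.1-8b-instruct"}  # hypothetical subset of models.py
    listed = listed_model_ids(sample)
    print(is_registrable("nvidia/llama-3.1-nemotron-ultra-253b-v1", static, listed))
    print(is_registrable("example/unknown-model", static, listed))
```

Run as a script, this prints `True` for the Nemotron id (listed but not static) and `False` for the unknown id.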

## Test Plan

1. run the nvidia distro
2. register a model from https://integrate.api.nvidia.com/v1/models that
isn't already known; as of this writing,
nvidia/llama-3.1-nemotron-ultra-253b-v1 is a good example
3. perform inference with the model
2025-07-17 12:11:30 -07:00
| Name | Last commit | Date |
|---|---|---|
| anthropic | ci: test safety with starter (#2628) | 2025-07-09 16:53:50 +02:00 |
| bedrock | ci: test safety with starter (#2628) | 2025-07-09 16:53:50 +02:00 |
| cerebras | ci: test safety with starter (#2628) | 2025-07-09 16:53:50 +02:00 |
| cerebras_openai_compat | feat: introduce APIs for retrieving chat completion requests (#2145) | 2025-05-18 21:43:19 -07:00 |
| databricks | ci: test safety with starter (#2628) | 2025-07-09 16:53:50 +02:00 |
| fireworks | ci: test safety with starter (#2628) | 2025-07-09 16:53:50 +02:00 |
| fireworks_openai_compat | feat: introduce APIs for retrieving chat completion requests (#2145) | 2025-05-18 21:43:19 -07:00 |
| gemini | ci: test safety with starter (#2628) | 2025-07-09 16:53:50 +02:00 |
| groq | fix: Don't cache clients for passthrough auth providers (#2728) | 2025-07-11 13:38:27 -07:00 |
| groq_openai_compat | feat: introduce APIs for retrieving chat completion requests (#2145) | 2025-05-18 21:43:19 -07:00 |
| llama_openai_compat | feat: create dynamic model registration for OpenAI and Llama compat remote inference providers (#2745) | 2025-07-16 12:49:38 -04:00 |
| nvidia | feat: allow dynamic model registration for nvidia inference provider (#2726) | 2025-07-17 12:11:30 -07:00 |
| ollama | fix: Safety in starter (#2731) | 2025-07-14 15:07:40 -07:00 |
| openai | feat: create dynamic model registration for OpenAI and Llama compat remote inference providers (#2745) | 2025-07-16 12:49:38 -04:00 |
| passthrough | feat: consolidate most distros into "starter" (#2516) | 2025-07-04 15:58:03 +02:00 |
| runpod | ci: test safety with starter (#2628) | 2025-07-09 16:53:50 +02:00 |
| sambanova | fix: sambanova shields and model validation (#2693) | 2025-07-11 16:29:15 -04:00 |
| sambanova_openai_compat | feat: introduce APIs for retrieving chat completion requests (#2145) | 2025-05-18 21:43:19 -07:00 |
| tgi | feat: consolidate most distros into "starter" (#2516) | 2025-07-04 15:58:03 +02:00 |
| together | fix: Don't cache clients for passthrough auth providers (#2728) | 2025-07-11 13:38:27 -07:00 |
| together_openai_compat | feat: introduce APIs for retrieving chat completion requests (#2145) | 2025-05-18 21:43:19 -07:00 |
| vllm | refactor(env)!: enhanced environment variable substitution (#2490) | 2025-06-26 08:20:08 +05:30 |
| watsonx | fix: allow default empty vars for conditionals (#2570) | 2025-07-01 14:42:05 +02:00 |
| \_\_init\_\_.py | impls -> inline, adapters -> remote (#381) | 2024-11-06 14:54:05 -08:00 |