llama-stack-mirror/llama_stack
Ben Browning a5827f7cb3 Nvidia provider support for OpenAI API endpoints
This wires up the openai_completion and openai_chat_completion API
methods for the remote Nvidia inference provider and adds that provider
to the chat completions part of the OpenAI test suite.

The hosted Nvidia service doesn't actually host any Llama models with a
functioning completions endpoint, so for now the test suite only
activates the nvidia provider for chat completions.

Signed-off-by: Ben Browning <bbrownin@redhat.com>
2025-04-10 13:43:28 -04:00
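
As a rough illustration of what that wiring amounts to (a hedged sketch, not the provider's actual code), the snippet below forwards an OpenAI-style chat completion request to an OpenAI-compatible NVIDIA endpoint using the openai Python client. The base URL, environment variable names, and model ID are illustrative assumptions, not values taken from the provider's configuration.

    # Hypothetical sketch, not the provider's actual implementation: forward an
    # OpenAI-style chat completion to an OpenAI-compatible NVIDIA endpoint.
    import os

    from openai import AsyncOpenAI  # assumes the `openai` Python package is installed


    async def nvidia_openai_chat_completion(messages: list[dict]) -> dict:
        # Base URL, env var names, and model ID are illustrative assumptions;
        # the real adapter would read equivalents from its provider configuration.
        client = AsyncOpenAI(
            base_url=os.environ.get("NVIDIA_BASE_URL", "https://integrate.api.nvidia.com/v1"),
            api_key=os.environ["NVIDIA_API_KEY"],
        )
        response = await client.chat.completions.create(
            model="meta/llama-3.1-8b-instruct",
            messages=messages,
        )
        return response.model_dump()

The openai_completion path would follow the same pattern via client.completions.create.
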
apis            Add prompt_logprobs and guided_choice to OpenAI completions  2025-04-09 15:47:02 -04:00
cli             refactor: move all llama code to models/llama out of meta reference (#1887)  2025-04-07 15:03:58 -07:00
distribution    Add prompt_logprobs and guided_choice to OpenAI completions  2025-04-09 15:47:02 -04:00
models          fix: Mirror llama4 rope scaling fixes, small model simplify (#1917)  2025-04-09 11:28:45 -07:00
providers       Nvidia provider support for OpenAI API endpoints  2025-04-10 13:43:28 -04:00
strong_typing   chore: more mypy checks (ollama, vllm, ...) (#1777)  2025-04-01 17:12:39 +02:00
templates       docs: Update remote-vllm.md with AMD GPU vLLM server supported. (#1858)  2025-04-08 21:35:32 -07:00
__init__.py     export LibraryClient  2024-12-13 12:08:00 -08:00
env.py          refactor(test): move tools, evals, datasetio, scoring and post training tests (#1401)  2025-03-04 14:53:47 -08:00
log.py          chore: Remove style tags from log formatter (#1808)  2025-03-27 10:18:21 -04:00
schema_utils.py chore: make mypy happy with webmethod (#1758)  2025-03-22 08:17:23 -07:00