llama-stack

forked from phoenix-oss/llama-stack-mirror


Matthew Farrellee 64f8d4c3ad feat: use openai-python for openai inference provider (#2193)	2025-05-16 12:57:56 -07:00

# What does this PR do?

Fixes #2121.

This implementation splits responsibility between the litellm and openai libraries:

| Inference Method | Implementation Source |
|------------------------|-----------------------|
| completion | LiteLLMOpenAIMixin |
| chat_completion | LiteLLMOpenAIMixin |
| embedding | LiteLLMOpenAIMixin |
| batch_completion | LiteLLMOpenAIMixin |
| batch_chat_completion | LiteLLMOpenAIMixin |
| openai_completion | AsyncOpenAI |
| openai_chat_completion | AsyncOpenAI |

## Test Plan

Smoke test with:

```
$ OPENAI_API_KEY=$LLAMA_API_KEY OPENAI_BASE_URL=https://api.llama.com/compat/v1 llama stack build --image-type conda --image-name openai --providers inference=remote::openai --run
$ llama-stack-client models register Llama-4-Scout-17B-16E-Instruct-FP8
$ curl "http://localhost:8321/v1/openai/v1/chat/completions" -H "Content-Type: application/json" \
  -d '{
    "model": "Llama-4-Scout-17B-16E-Instruct-FP8",
    "messages": [
      {"role": "user", "content": "Hello Llama! Can you give me a quick intro?"}
    ]
  }'
{"id":"AmPwrrkc5JgVjejPdIPrpT2","choices":[{"finish_reason":"stop","index":0,"logprobs":{"content":null,"refusal":null},"message":{"content":"Hello! I'm Llama, a Meta-designed model that adapts to your conversational style. Whether you need quick answers, deep dives into ideas, or just want to vent, joke, or brainstorm—I'm here for it. What’s on your mind?","refusal":"","role":"assistant","annotations":null,"audio":null,"function_call":null,"tool_calls":null,"id":"AmPwrrkc5JgVjejPdIPrpT2"}}],"created":1747410061,"model":"Llama-4-Scout-17B-16E-Instruct-FP8","object":"chat.completions","service_tier":null,"system_fingerprint":null,"usage":{"completion_tokens":54,"prompt_tokens":22,"total_tokens":76,"completion_tokens_details":null,"prompt_tokens_details":null}}
```

Also ran the full test suite.
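The chat-completion call from the test plan can also be scripted instead of issued with curl. The following is a minimal sketch, assuming a stack is already running locally on port 8321 and the model has been registered as in the steps above. The helper names `build_chat_request` and `send` are hypothetical. The sketch uses only the standard-library `urllib.request` module, not the openai-python client that this PR adopts inside the provider.

```python
import json
import urllib.request

# Endpoint and model name taken from the smoke test; assumes a local stack on port 8321.
BASE_URL = "http://localhost:8321/v1/openai/v1"
MODEL = "Llama-4-Scout-17B-16E-Instruct-FP8"


def build_chat_request(
    prompt: str, model: str = MODEL, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build (but do not send) the same POST that the curl command issues."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON chat-completion response body."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Against a running stack, `send(build_chat_request("Hello Llama! Can you give me a quick intro?"))["choices"][0]["message"]["content"]` should return the assistant's reply, in the same shape as the JSON response shown above. Separating request construction from sending keeps the payload inspectable without a live server.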
anthropic	chore: enable pyupgrade fixes (#1806)	2025-05-01 14:23:50 -07:00
bedrock	chore: enable pyupgrade fixes (#1806)	2025-05-01 14:23:50 -07:00
cerebras	chore: enable pyupgrade fixes (#1806)	2025-05-01 14:23:50 -07:00
cerebras_openai_compat	chore: enable pyupgrade fixes (#1806)	2025-05-01 14:23:50 -07:00
databricks	chore: enable pyupgrade fixes (#1806)	2025-05-01 14:23:50 -07:00
fireworks	chore: enable pyupgrade fixes (#1806)	2025-05-01 14:23:50 -07:00
fireworks_openai_compat	chore: enable pyupgrade fixes (#1806)	2025-05-01 14:23:50 -07:00
gemini	chore: enable pyupgrade fixes (#1806)	2025-05-01 14:23:50 -07:00
groq	chore: enable pyupgrade fixes (#1806)	2025-05-01 14:23:50 -07:00
groq_openai_compat	chore: enable pyupgrade fixes (#1806)	2025-05-01 14:23:50 -07:00
llama_openai_compat	chore: enable pyupgrade fixes (#1806)	2025-05-01 14:23:50 -07:00
nvidia	chore: enable pyupgrade fixes (#1806)	2025-05-01 14:23:50 -07:00
ollama	fix: ollama openai completion and chat completion params (#2125)	2025-05-12 10:57:53 -07:00
openai	feat: use openai-python for openai inference provider (#2193)	2025-05-16 12:57:56 -07:00
passthrough	chore: enable pyupgrade fixes (#1806)	2025-05-01 14:23:50 -07:00
runpod	chore: enable pyupgrade fixes (#1806)	2025-05-01 14:23:50 -07:00
sambanova	feat(providers): sambanova updated to use LiteLLM openai-compat (#1596)	2025-05-06 16:50:22 -07:00
sambanova_openai_compat	chore: enable pyupgrade fixes (#1806)	2025-05-01 14:23:50 -07:00
tgi	chore: enable pyupgrade fixes (#1806)	2025-05-01 14:23:50 -07:00
together	fix: revert "feat(provider): adding llama4 support in together inference provider (#2123)" (#2124)	2025-05-08 15:18:16 -07:00
together_openai_compat	chore: enable pyupgrade fixes (#1806)	2025-05-01 14:23:50 -07:00
vllm	fix: multiple tool calls in remote-vllm chat_completion (#2161)	2025-05-15 11:23:29 -07:00
watsonx	chore: enable pyupgrade fixes (#1806)	2025-05-01 14:23:50 -07:00
__init__.py	`impls` -> `inline`, `adapters` -> `remote` (#381)	2024-11-06 14:54:05 -08:00