# What does this PR do?

Fixes #2121.

This implementation splits responsibility between the litellm and openai libraries:

| Inference Method        | Implementation Source |
|-------------------------|-----------------------|
| completion              | LiteLLMOpenAIMixin    |
| chat_completion         | LiteLLMOpenAIMixin    |
| embedding               | LiteLLMOpenAIMixin    |
| batch_completion        | LiteLLMOpenAIMixin    |
| batch_chat_completion   | LiteLLMOpenAIMixin    |
| openai_completion       | AsyncOpenAI           |
| openai_chat_completion  | AsyncOpenAI           |

## Test Plan

Smoke test with:

```
$ OPENAI_API_KEY=$LLAMA_API_KEY OPENAI_BASE_URL=https://api.llama.com/compat/v1 llama stack build --image-type conda --image-name openai --providers inference=remote::openai --run

$ llama-stack-client models register Llama-4-Scout-17B-16E-Instruct-FP8

$ curl "http://localhost:8321/v1/openai/v1/chat/completions" -H "Content-Type: application/json" \
  -d '{
    "model": "Llama-4-Scout-17B-16E-Instruct-FP8",
    "messages": [
      {"role": "user", "content": "Hello Llama! Can you give me a quick intro?"}
    ]
  }'
{"id":"AmPwrrkc5JgVjejPdIPrpT2","choices":[{"finish_reason":"stop","index":0,"logprobs":{"content":null,"refusal":null},"message":{"content":"Hello! I'm Llama, a Meta-designed model that adapts to your conversational style. Whether you need quick answers, deep dives into ideas, or just want to vent, joke, or brainstorm—I'm here for it. What’s on your mind?","refusal":"","role":"assistant","annotations":null,"audio":null,"function_call":null,"tool_calls":null,"id":"AmPwrrkc5JgVjejPdIPrpT2"}}],"created":1747410061,"model":"Llama-4-Scout-17B-16E-Instruct-FP8","object":"chat.completions","service_tier":null,"system_fingerprint":null,"usage":{"completion_tokens":54,"prompt_tokens":22,"total_tokens":76,"completion_tokens_details":null,"prompt_tokens_details":null}}
```

and run the full test suite.
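For readers unfamiliar with the pattern, here is a minimal sketch of the split described in the table above. This is not the merged adapter code; the class name `OpenAIInferenceAdapter`, the constructor shape, and the mixin's `__init__` arguments are assumptions for illustration:

```python
from openai import AsyncOpenAI

# Import path assumed from the llama-stack providers/utils tree.
from llama_stack.providers.utils.inference.litellm_openai_mixin import (
    LiteLLMOpenAIMixin,
)


class OpenAIInferenceAdapter(LiteLLMOpenAIMixin):
    """completion, chat_completion, embedding, and the batch_* variants are
    inherited from LiteLLMOpenAIMixin; only the openai_* methods are
    overridden to pass through the official AsyncOpenAI client."""

    def __init__(self, api_key: str, base_url: str | None = None, **mixin_kwargs):
        # Mixin wiring elided; the real constructor arguments are an assumption.
        super().__init__(**mixin_kwargs)
        self._openai_client = AsyncOpenAI(api_key=api_key, base_url=base_url)

    async def openai_completion(self, model: str, prompt: str, **params):
        # Forward directly to OpenAI's completions endpoint.
        return await self._openai_client.completions.create(
            model=model, prompt=prompt, **params
        )

    async def openai_chat_completion(self, model: str, messages: list, **params):
        # Forward directly to OpenAI's chat completions endpoint.
        return await self._openai_client.chat.completions.create(
            model=model, messages=messages, **params
        )
```

The upshot of this layout is that the Llama Stack inference surface keeps its existing litellm-backed behavior, while the OpenAI-compatible `/v1/openai/v1/...` routes get byte-for-byte OpenAI client semantics.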