llama-stack-mirror/llama_stack
Matthew Farrellee 99bd39cc30 feat: use openai-python for openai inference provider
fixes #2121

this implementation splits responsibility between the litellm and openai libraries (a rough sketch of the split follows the table) -

 | Inference Method           | Implementation Source    |
 |----------------------------|--------------------------|
 | completion                 | LiteLLMOpenAIMixin       |
 | chat_completion            | LiteLLMOpenAIMixin       |
 | embedding                  | LiteLLMOpenAIMixin       |
 | batch_completion           | LiteLLMOpenAIMixin       |
 | batch_chat_completion      | LiteLLMOpenAIMixin       |
 | openai_completion          | AsyncOpenAI              |
 | openai_chat_completion     | AsyncOpenAI              |

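roughly, the openai_* half of the split looks like this - the class name, constructor, and method signatures below are illustrative placeholders, not the provider's actual interface; the other methods stay on LiteLLMOpenAIMixin exactly as in the table above -

```python
# illustrative sketch only - the class name and signatures are assumptions,
# not the provider's real interface. in the actual adapter the litellm-backed
# methods (completion, chat_completion, embedding, batch_*) are inherited from
# LiteLLMOpenAIMixin; only the openai_* paths go through openai-python.
from openai import AsyncOpenAI


class OpenAIAdapterSketch:  # hypothetical name
    def __init__(self, api_key: str | None = None, base_url: str | None = None):
        # AsyncOpenAI falls back to OPENAI_API_KEY / OPENAI_BASE_URL when these
        # are None, which is what the test invocation below relies on.
        self._client = AsyncOpenAI(api_key=api_key, base_url=base_url)

    async def openai_completion(self, **params):
        # pass the request straight through to the OpenAI-compatible endpoint
        return await self._client.completions.create(**params)

    async def openai_chat_completion(self, **params):
        return await self._client.chat.completions.create(**params)
```
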
test with -

$ OPENAI_API_KEY=$LLAMA_API_KEY OPENAI_BASE_URL=https://api.llama.com/compat/v1 llama stack build --image-type conda --image-name openai --providers inference=remote::openai --run

$ llama-stack-client models register Llama-4-Scout-17B-16E-Instruct-FP8

$ curl "http://localhost:8321/v1/openai/v1/chat/completions" -H "Content-Type: application/json" \
-d '{
      "model": "Llama-4-Scout-17B-16E-Instruct-FP8",
      "messages": [
        {"role": "user", "content": "Hello Llama! Can you give me a quick intro?"}
      ]
}'
{"id":"AmPwrrkc5JgVjejPdIPrpT2","choices":[{"finish_reason":"stop","index":0,"logprobs":{"content":null,"refusal":null},"message":{"content":"Hello! I'm Llama, a Meta-designed model that adapts to your conversational style. Whether you need quick answers, deep dives into ideas, or just want to vent, joke, or brainstorm—I'm here for it. What’s on your mind?","refusal":"","role":"assistant","annotations":null,"audio":null,"function_call":null,"tool_calls":null,"id":"AmPwrrkc5JgVjejPdIPrpT2"}}],"created":1747410061,"model":"Llama-4-Scout-17B-16E-Instruct-FP8","object":"chat.completions","service_tier":null,"system_fingerprint":null,"usage":{"completion_tokens":54,"prompt_tokens":22,"total_tokens":76,"completion_tokens_details":null,"prompt_tokens_details":null}}
2025-05-16 11:47:02 -04:00
| Path            | Last commit                                                                            | Date                      |
|-----------------|----------------------------------------------------------------------------------------|---------------------------|
| apis            | chore: more API validators (#2165)                                                      | 2025-05-15 11:22:51 -07:00 |
| cli             | feat: refactor external providers dir (#2049)                                           | 2025-05-15 20:17:03 +02:00 |
| distribution    | feat: refactor external providers dir (#2049)                                           | 2025-05-15 20:17:03 +02:00 |
| models          | fix: llama4 tool use prompt fix (#2103)                                                 | 2025-05-06 22:18:31 -07:00 |
| providers       | feat: use openai-python for openai inference provider                                   | 2025-05-16 11:47:02 -04:00 |
| strong_typing   | chore: enable pyupgrade fixes (#1806)                                                   | 2025-05-01 14:23:50 -07:00 |
| templates       | refactor: rename dev distro as starter (#2181)                                          | 2025-05-15 12:52:34 -07:00 |
| ui              | fix: misc UI changes (#2175)                                                            | 2025-05-15 13:03:05 -07:00 |
| __init__.py     | export LibraryClient                                                                    | 2024-12-13 12:08:00 -08:00 |
| env.py          | refactor(test): move tools, evals, datasetio, scoring and post training tests (#1401)   | 2025-03-04 14:53:47 -08:00 |
| log.py          | chore: enable pyupgrade fixes (#1806)                                                   | 2025-05-01 14:23:50 -07:00 |
| schema_utils.py | chore: enable pyupgrade fixes (#1806)                                                   | 2025-05-01 14:23:50 -07:00 |