llama-stack-mirror/llama_stack/providers/remote/inference
Matthew Farrellee 99bd39cc30 feat: use openai-python for openai inference provider
fixes #2121

this implementation splits responsibility between the litellm and openai libraries (a minimal sketch of the split follows the table) -

 | Inference Method           | Implementation Source    |
 |----------------------------|--------------------------|
 | completion                 | LiteLLMOpenAIMixin       |
 | chat_completion            | LiteLLMOpenAIMixin       |
 | embedding                  | LiteLLMOpenAIMixin       |
 | batch_completion           | LiteLLMOpenAIMixin       |
 | batch_chat_completion      | LiteLLMOpenAIMixin       |
 | openai_completion          | AsyncOpenAI              |
 | openai_chat_completion     | AsyncOpenAI              |
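
In the adapter that split looks roughly like this (a minimal sketch, not the actual code: the `LiteLLMOpenAIMixin` import path and the simplified method signature are assumptions, while `AsyncOpenAI` and `chat.completions.create` are real openai-python APIs):

```python
# Sketch only: the inherited methods come from LiteLLMOpenAIMixin, while the
# openai_* methods go straight through the official openai-python client.
from openai import AsyncOpenAI

# assumed import path for the mixin inside llama-stack
from llama_stack.providers.utils.inference.litellm_openai_mixin import LiteLLMOpenAIMixin


class OpenAIInferenceAdapter(LiteLLMOpenAIMixin):
    """completion, chat_completion, embedding and the batch_* methods are
    inherited from LiteLLMOpenAIMixin; only the openai_* methods talk to
    AsyncOpenAI directly."""

    async def openai_chat_completion(self, model: str, messages: list[dict], **kwargs):
        # AsyncOpenAI() picks up OPENAI_API_KEY / OPENAI_BASE_URL from the env
        client = AsyncOpenAI()
        return await client.chat.completions.create(model=model, messages=messages, **kwargs)
```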

test with -

$ OPENAI_API_KEY=$LLAMA_API_KEY OPENAI_BASE_URL=https://api.llama.com/compat/v1 llama stack build --image-type conda --image-name openai --providers inference=remote::openai --run

$ llama-stack-client models register Llama-4-Scout-17B-16E-Instruct-FP8
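
The registration can also be done from Python via the llama-stack-client SDK (a sketch; the `models.register(model_id=...)` keyword is an assumption based on the SDK's documented surface):

```python
# Register the model programmatically instead of via the CLI (sketch).
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")
client.models.register(model_id="Llama-4-Scout-17B-16E-Instruct-FP8")
```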

$ curl "http://localhost:8321/v1/openai/v1/chat/completions" -H "Content-Type: application/json" \
-d '{
      "model": "Llama-4-Scout-17B-16E-Instruct-FP8",
      "messages": [
        {"role": "user", "content": "Hello Llama! Can you give me a quick intro?"}
      ]
}'
{"id":"AmPwrrkc5JgVjejPdIPrpT2","choices":[{"finish_reason":"stop","index":0,"logprobs":{"content":null,"refusal":null},"message":{"content":"Hello! I'm Llama, a Meta-designed model that adapts to your conversational style. Whether you need quick answers, deep dives into ideas, or just want to vent, joke, or brainstorm—I'm here for it. What’s on your mind?","refusal":"","role":"assistant","annotations":null,"audio":null,"function_call":null,"tool_calls":null,"id":"AmPwrrkc5JgVjejPdIPrpT2"}}],"created":1747410061,"model":"Llama-4-Scout-17B-16E-Instruct-FP8","object":"chat.completions","service_tier":null,"system_fingerprint":null,"usage":{"completion_tokens":54,"prompt_tokens":22,"total_tokens":76,"completion_tokens_details":null,"prompt_tokens_details":null}}
2025-05-16 11:47:02 -04:00
anthropic chore: enable pyupgrade fixes (#1806) 2025-05-01 14:23:50 -07:00
bedrock chore: enable pyupgrade fixes (#1806) 2025-05-01 14:23:50 -07:00
cerebras chore: enable pyupgrade fixes (#1806) 2025-05-01 14:23:50 -07:00
cerebras_openai_compat chore: enable pyupgrade fixes (#1806) 2025-05-01 14:23:50 -07:00
databricks chore: enable pyupgrade fixes (#1806) 2025-05-01 14:23:50 -07:00
fireworks chore: enable pyupgrade fixes (#1806) 2025-05-01 14:23:50 -07:00
fireworks_openai_compat chore: enable pyupgrade fixes (#1806) 2025-05-01 14:23:50 -07:00
gemini chore: enable pyupgrade fixes (#1806) 2025-05-01 14:23:50 -07:00
groq chore: enable pyupgrade fixes (#1806) 2025-05-01 14:23:50 -07:00
groq_openai_compat chore: enable pyupgrade fixes (#1806) 2025-05-01 14:23:50 -07:00
llama_openai_compat chore: enable pyupgrade fixes (#1806) 2025-05-01 14:23:50 -07:00
nvidia chore: enable pyupgrade fixes (#1806) 2025-05-01 14:23:50 -07:00
ollama fix: ollama openai completion and chat completion params (#2125) 2025-05-12 10:57:53 -07:00
openai feat: use openai-python for openai inference provider 2025-05-16 11:47:02 -04:00
passthrough chore: enable pyupgrade fixes (#1806) 2025-05-01 14:23:50 -07:00
runpod chore: enable pyupgrade fixes (#1806) 2025-05-01 14:23:50 -07:00
sambanova feat(providers): sambanova updated to use LiteLLM openai-compat (#1596) 2025-05-06 16:50:22 -07:00
sambanova_openai_compat chore: enable pyupgrade fixes (#1806) 2025-05-01 14:23:50 -07:00
tgi chore: enable pyupgrade fixes (#1806) 2025-05-01 14:23:50 -07:00
together fix: revert "feat(provider): adding llama4 support in together inference provider (#2123)" (#2124) 2025-05-08 15:18:16 -07:00
together_openai_compat chore: enable pyupgrade fixes (#1806) 2025-05-01 14:23:50 -07:00
vllm fix: multiple tool calls in remote-vllm chat_completion (#2161) 2025-05-15 11:23:29 -07:00
watsonx chore: enable pyupgrade fixes (#1806) 2025-05-01 14:23:50 -07:00
__init__.py impls -> inline, adapters -> remote (#381) 2024-11-06 14:54:05 -08:00