llama-stack/llama_stack/providers/remote/inference
Matthew Farrellee 64f8d4c3ad
feat: use openai-python for openai inference provider (#2193)
# What does this PR do?

fixes #2121

this implementation splits responsibility between the litellm and openai
libraries -

 | Inference Method           | Implementation Source    |
 |----------------------------|--------------------------|
 | completion                 | LiteLLMOpenAIMixin       |
 | chat_completion            | LiteLLMOpenAIMixin       |
 | embedding                  | LiteLLMOpenAIMixin       |
 | batch_completion           | LiteLLMOpenAIMixin       |
 | batch_chat_completion      | LiteLLMOpenAIMixin       |
 | openai_completion          | AsyncOpenAI              |
 | openai_chat_completion     | AsyncOpenAI              |
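
As a rough illustration of the split in the table above, the sketch below routes `chat_completion` through a LiteLLM-style mixin and `openai_chat_completion` through an async OpenAI-style client. `LiteLLMMixinStub`, `AsyncOpenAIStub` and `OpenAIAdapterSketch` are hypothetical stand-ins written for this example, not the classes touched by this PR; the only detail borrowed from openai-python is the `client.chat.completions.create(...)` call shape.

```python
import asyncio


class LiteLLMMixinStub:
    """Hypothetical stand-in for LiteLLMOpenAIMixin (native inference methods)."""

    async def chat_completion(self, model, messages):
        # The real mixin would call litellm here; this stub only tags the route.
        return {"via": "litellm", "model": model, "messages": messages}


class AsyncOpenAIStub:
    """Hypothetical stand-in mimicking the call shape of openai.AsyncOpenAI."""

    class _Completions:
        async def create(self, **params):
            # The real client would send an HTTP request; this stub echoes params.
            return {"via": "openai-python", **params}

    class _Chat:
        def __init__(self):
            self.completions = AsyncOpenAIStub._Completions()

    def __init__(self):
        self.chat = AsyncOpenAIStub._Chat()


class OpenAIAdapterSketch(LiteLLMMixinStub):
    """Assumed adapter layout: native methods inherited, openai_* methods delegated."""

    def __init__(self, client):
        self._client = client

    async def openai_chat_completion(self, model, messages, **kwargs):
        # Pass the OpenAI-shaped parameters straight through to the client.
        return await self._client.chat.completions.create(
            model=model, messages=messages, **kwargs
        )


adapter = OpenAIAdapterSketch(AsyncOpenAIStub())
messages = [{"role": "user", "content": "Hello"}]
native = asyncio.run(adapter.chat_completion("some-model", messages))
compat = asyncio.run(adapter.openai_chat_completion("some-model", messages))
```

With this layout, swapping `AsyncOpenAIStub()` for a real async client would leave the mixin-backed methods unchanged, which is presumably the appeal of splitting the two paths.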

## Test Plan

smoke test with -
```
$ OPENAI_API_KEY=$LLAMA_API_KEY OPENAI_BASE_URL=https://api.llama.com/compat/v1 llama stack build --image-type conda --image-name openai --providers inference=remote::openai --run

$ llama-stack-client models register Llama-4-Scout-17B-16E-Instruct-FP8

$ curl "http://localhost:8321/v1/openai/v1/chat/completions" \
    -H "Content-Type: application/json" \
    -d '{
      "model": "Llama-4-Scout-17B-16E-Instruct-FP8",
      "messages": [
        {"role": "user", "content": "Hello Llama! Can you give me a quick intro?"}
      ]
}'
{"id":"AmPwrrkc5JgVjejPdIPrpT2","choices":[{"finish_reason":"stop","index":0,"logprobs":{"content":null,"refusal":null},"message":{"content":"Hello! I'm Llama, a Meta-designed model that adapts to your conversational style. Whether you need quick answers, deep dives into ideas, or just want to vent, joke, or brainstorm—I'm here for it. What’s on your mind?","refusal":"","role":"assistant","annotations":null,"audio":null,"function_call":null,"tool_calls":null,"id":"AmPwrrkc5JgVjejPdIPrpT2"}}],"created":1747410061,"model":"Llama-4-Scout-17B-16E-Instruct-FP8","object":"chat.completions","service_tier":null,"system_fingerprint":null,"usage":{"completion_tokens":54,"prompt_tokens":22,"total_tokens":76,"completion_tokens_details":null,"prompt_tokens_details":null}}
```

and run full test suite.
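
For anyone scripting the smoke test, the curl request above can be approximated in Python with only the standard library. This is a minimal sketch, assuming the stack is listening on `localhost:8321` as in the test plan; `build_chat_request` and `send` are hypothetical helper names introduced here, not part of llama-stack.

```python
import json
import urllib.request

# Endpoint taken from the curl command in the test plan; adjust if the port differs.
BASE_URL = "http://localhost:8321/v1/openai/v1"


def build_chat_request(model, prompt, base_url=BASE_URL):
    """Build (but do not send) a POST request for the chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request and decode the JSON body; needs a running server."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


request = build_chat_request(
    "Llama-4-Scout-17B-16E-Instruct-FP8",
    "Hello Llama! Can you give me a quick intro?",
)
# With the server up: send(request)["choices"][0]["message"]["content"]
```

Keeping request construction separate from sending makes the payload easy to inspect without a live server.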
2025-05-16 12:57:56 -07:00
anthropic chore: enable pyupgrade fixes (#1806) 2025-05-01 14:23:50 -07:00
bedrock chore: enable pyupgrade fixes (#1806) 2025-05-01 14:23:50 -07:00
cerebras chore: enable pyupgrade fixes (#1806) 2025-05-01 14:23:50 -07:00
cerebras_openai_compat chore: enable pyupgrade fixes (#1806) 2025-05-01 14:23:50 -07:00
databricks chore: enable pyupgrade fixes (#1806) 2025-05-01 14:23:50 -07:00
fireworks chore: enable pyupgrade fixes (#1806) 2025-05-01 14:23:50 -07:00
fireworks_openai_compat chore: enable pyupgrade fixes (#1806) 2025-05-01 14:23:50 -07:00
gemini chore: enable pyupgrade fixes (#1806) 2025-05-01 14:23:50 -07:00
groq chore: enable pyupgrade fixes (#1806) 2025-05-01 14:23:50 -07:00
groq_openai_compat chore: enable pyupgrade fixes (#1806) 2025-05-01 14:23:50 -07:00
llama_openai_compat chore: enable pyupgrade fixes (#1806) 2025-05-01 14:23:50 -07:00
nvidia chore: enable pyupgrade fixes (#1806) 2025-05-01 14:23:50 -07:00
ollama fix: ollama openai completion and chat completion params (#2125) 2025-05-12 10:57:53 -07:00
openai feat: use openai-python for openai inference provider (#2193) 2025-05-16 12:57:56 -07:00
passthrough chore: enable pyupgrade fixes (#1806) 2025-05-01 14:23:50 -07:00
runpod chore: enable pyupgrade fixes (#1806) 2025-05-01 14:23:50 -07:00
sambanova feat(providers): sambanova updated to use LiteLLM openai-compat (#1596) 2025-05-06 16:50:22 -07:00
sambanova_openai_compat chore: enable pyupgrade fixes (#1806) 2025-05-01 14:23:50 -07:00
tgi chore: enable pyupgrade fixes (#1806) 2025-05-01 14:23:50 -07:00
together fix: revert "feat(provider): adding llama4 support in together inference provider (#2123)" (#2124) 2025-05-08 15:18:16 -07:00
together_openai_compat chore: enable pyupgrade fixes (#1806) 2025-05-01 14:23:50 -07:00
vllm fix: multiple tool calls in remote-vllm chat_completion (#2161) 2025-05-15 11:23:29 -07:00
watsonx chore: enable pyupgrade fixes (#1806) 2025-05-01 14:23:50 -07:00
__init__.py impls -> inline, adapters -> remote (#381) 2024-11-06 14:54:05 -08:00