# Inference Providers
This section contains documentation for all available providers for the inference API. A brief usage sketch follows the provider list.
- inline::meta-reference
- inline::sentence-transformers
- remote::anthropic
- remote::bedrock
- remote::cerebras
- remote::cerebras-openai-compat
- remote::databricks
- remote::fireworks
- remote::fireworks-openai-compat
- remote::gemini
- remote::groq
- remote::groq-openai-compat
- remote::hf::endpoint
- remote::hf::serverless
- remote::llama-openai-compat
- remote::nvidia
- remote::ollama
- remote::openai
- remote::passthrough
- remote::runpod
- remote::sambanova
- remote::sambanova-openai-compat
- remote::tgi
- remote::together
- remote::together-openai-compat
- remote::vllm
- remote::watsonx
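
Each entry above names a provider type that a distribution can register to back the inference API. As a rough illustration, here is a minimal sketch of calling that API from the `llama_stack_client` Python package against a running Llama Stack server; the port, model id, and choice of `remote::ollama` as the backing provider are assumptions for the example, not requirements, and the exact client method names may differ between releases.

```python
# Minimal sketch: exercise the inference API of a running Llama Stack server.
# Assumption: the server was started with a distribution that registers an
# inference provider (e.g. remote::ollama) and serves the model used below.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")  # assumed default port

# List registered models to confirm the inference provider is serving something.
for model in client.models.list():
    print(model.identifier)

# Run a chat completion through whichever inference provider backs this model.
response = client.inference.chat_completion(
    model_id="meta-llama/Llama-3.2-3B-Instruct",  # assumed model id
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.completion_message.content)
```

The client code is the same regardless of which provider from the list is configured; only the server-side provider registration changes.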