# Inference Providers
This section contains documentation for all available providers for the inference API. A brief usage sketch follows the provider list.
- inline::meta-reference
- inline::sentence-transformers
- remote::anthropic
- remote::bedrock
- remote::cerebras
- remote::cerebras-openai-compat
- remote::databricks
- remote::fireworks
- remote::fireworks-openai-compat
- remote::gemini
- remote::groq
- remote::groq-openai-compat
- remote::hf::endpoint
- remote::hf::serverless
- remote::llama-openai-compat
- remote::nvidia
- remote::ollama
- remote::openai
- remote::passthrough
- remote::runpod
- remote::sambanova
- remote::sambanova-openai-compat
- remote::tgi
- remote::together
- remote::together-openai-compat
- remote::vllm
- remote::watsonx
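
Each entry above names a provider type that a distribution can register to back the inference API. As a rough illustration, here is a minimal sketch of calling that API from the `llama_stack_client` Python package against a running Llama Stack server; the port, model id, and choice of `remote::ollama` as the backing provider are assumptions for the example, not requirements, and the exact client method names may differ between releases.

```python
# Minimal sketch: exercise the inference API of a running Llama Stack server.
# Assumption: the server was started with a distribution that registers an
# inference provider (e.g. remote::ollama) and serves the model used below.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")  # assumed default port

# List registered models to confirm the inference provider is serving something.
for model in client.models.list():
    print(model.identifier)

# Run a chat completion through whichever inference provider backs this model.
response = client.inference.chat_completion(
    model_id="meta-llama/Llama-3.2-3B-Instruct",  # assumed model id
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.completion_message.content)
```

The client code is the same regardless of which provider from the list is configured; only the server-side provider registration changes.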