mirror of
https://github.com/meta-llama/llama-stack.git
synced 2025-12-04 18:13:44 +00:00
docs: concepts and building_applications migration (#3534)
# What does this PR do?

- Migrates the remaining documentation sections to the new documentation format

## Test Plan

- Partial migration
This commit is contained in: parent 05ff4c4420, commit c71ce8df61

82 changed files with 2535 additions and 1237 deletions
232 docs/docs/distributions/self_hosted_distro/starter.md Normal file

@@ -0,0 +1,232 @@
---
orphan: true
---
<!-- This file was auto-generated by distro_codegen.py, please edit source -->
# Starter Distribution

```{toctree}
:maxdepth: 2
:hidden:

self
```

The `llamastack/distribution-starter` distribution is a comprehensive, multi-provider distribution that includes most of the available inference providers in Llama Stack. It's designed to be a one-stop solution for developers who want to experiment with different AI providers without having to configure each one individually.

## Provider Composition

The starter distribution consists of the following provider configurations:

| API | Provider(s) |
|-----|-------------|
| agents | `inline::meta-reference` |
| datasetio | `remote::huggingface`, `inline::localfs` |
| eval | `inline::meta-reference` |
| files | `inline::localfs` |
| inference | `remote::openai`, `remote::fireworks`, `remote::together`, `remote::ollama`, `remote::anthropic`, `remote::gemini`, `remote::groq`, `remote::sambanova`, `remote::vllm`, `remote::tgi`, `remote::cerebras`, `remote::llama-openai-compat`, `remote::nvidia`, `remote::hf::serverless`, `remote::hf::endpoint`, `inline::sentence-transformers` |
| safety | `inline::llama-guard` |
| scoring | `inline::basic`, `inline::llm-as-judge`, `inline::braintrust` |
| telemetry | `inline::meta-reference` |
| tool_runtime | `remote::brave-search`, `remote::tavily-search`, `inline::rag-runtime`, `remote::model-context-protocol` |
| vector_io | `inline::faiss`, `inline::sqlite-vec`, `inline::milvus`, `remote::chromadb`, `remote::pgvector` |

## Inference Providers

The starter distribution includes a comprehensive set of inference providers:

### Hosted Providers
- **[OpenAI](https://openai.com/api/)**: GPT-4, GPT-3.5, O1, O3, O4 models and text embeddings - provider ID: `openai` - reference documentation: [openai](../../providers/inference/remote_openai.md)
- **[Fireworks](https://fireworks.ai/)**: Llama 3.1, 3.2, 3.3, 4 Scout, 4 Maverick models and embeddings - provider ID: `fireworks` - reference documentation: [fireworks](../../providers/inference/remote_fireworks.md)
- **[Together](https://together.ai/)**: Llama 3.1, 3.2, 3.3, 4 Scout, 4 Maverick models and embeddings - provider ID: `together` - reference documentation: [together](../../providers/inference/remote_together.md)
- **[Anthropic](https://www.anthropic.com/)**: Claude 3.5 Sonnet, Claude 3.7 Sonnet, Claude 3.5 Haiku, and Voyage embeddings - provider ID: `anthropic` - reference documentation: [anthropic](../../providers/inference/remote_anthropic.md)
- **[Gemini](https://gemini.google.com/)**: Gemini 1.5, 2.0, 2.5 models and text embeddings - provider ID: `gemini` - reference documentation: [gemini](../../providers/inference/remote_gemini.md)
- **[Groq](https://groq.com/)**: Fast Llama models (3.1, 3.2, 3.3, 4 Scout, 4 Maverick) - provider ID: `groq` - reference documentation: [groq](../../providers/inference/remote_groq.md)
- **[SambaNova](https://www.sambanova.ai/)**: Llama 3.1, 3.2, 3.3, 4 Scout, 4 Maverick models - provider ID: `sambanova` - reference documentation: [sambanova](../../providers/inference/remote_sambanova.md)
- **[Cerebras](https://www.cerebras.ai/)**: Cerebras AI models - provider ID: `cerebras` - reference documentation: [cerebras](../../providers/inference/remote_cerebras.md)
- **[NVIDIA](https://www.nvidia.com/)**: NVIDIA NIM - provider ID: `nvidia` - reference documentation: [nvidia](../../providers/inference/remote_nvidia.md)
- **[HuggingFace](https://huggingface.co/)**: Serverless and endpoint models - provider IDs: `hf::serverless` and `hf::endpoint` - reference documentation: [huggingface-serverless](../../providers/inference/remote_hf_serverless.md) and [huggingface-endpoint](../../providers/inference/remote_hf_endpoint.md)
- **[Bedrock](https://aws.amazon.com/bedrock/)**: AWS Bedrock models - provider ID: `bedrock` - reference documentation: [bedrock](../../providers/inference/remote_bedrock.md)

### Local/Remote Providers
- **[Ollama](https://ollama.ai/)**: Local Ollama models - provider ID: `ollama` - reference documentation: [ollama](../../providers/inference/remote_ollama.md)
- **[vLLM](https://docs.vllm.ai/en/latest/)**: Local or remote vLLM server - provider ID: `vllm` - reference documentation: [vllm](../../providers/inference/remote_vllm.md)
- **[TGI](https://github.com/huggingface/text-generation-inference)**: Text Generation Inference server, including Dell Enterprise Hub's custom TGI container (set `DEH_URL`) - provider ID: `tgi` - reference documentation: [tgi](../../providers/inference/remote_tgi.md)
- **[Sentence Transformers](https://www.sbert.net/)**: Local embedding models - provider ID: `sentence-transformers` - reference documentation: [sentence-transformers](../../providers/inference/inline_sentence-transformers.md)

All providers are disabled by default, so you need to enable them by setting the appropriate environment variables; see [Enabling Providers](#enabling-providers) below.

## Vector IO

The starter distribution includes a comprehensive set of vector IO providers:

- **[FAISS](https://github.com/facebookresearch/faiss)**: Local FAISS vector store - enabled by default - provider ID: `faiss`
- **[SQLite](https://www.sqlite.org/index.html)**: Local SQLite vector store - disabled by default - provider ID: `sqlite-vec`
- **[ChromaDB](https://www.trychroma.com/)**: Remote ChromaDB vector store - disabled by default - provider ID: `chromadb`
- **[PGVector](https://github.com/pgvector/pgvector)**: PostgreSQL vector store - disabled by default - provider ID: `pgvector`
- **[Milvus](https://milvus.io/)**: Milvus vector store - disabled by default - provider ID: `milvus`

## Environment Variables

The following environment variables can be configured:

### Server Configuration
- `LLAMA_STACK_PORT`: Port for the Llama Stack distribution server (default: `8321`)

### API Keys for Hosted Providers
- `OPENAI_API_KEY`: OpenAI API key
- `FIREWORKS_API_KEY`: Fireworks API key
- `TOGETHER_API_KEY`: Together API key
- `ANTHROPIC_API_KEY`: Anthropic API key
- `GEMINI_API_KEY`: Google Gemini API key
- `GROQ_API_KEY`: Groq API key
- `SAMBANOVA_API_KEY`: SambaNova API key
- `CEREBRAS_API_KEY`: Cerebras API key
- `LLAMA_API_KEY`: Llama API key
- `NVIDIA_API_KEY`: NVIDIA API key
- `HF_API_TOKEN`: HuggingFace API token

### Local Provider Configuration
- `OLLAMA_URL`: Ollama server URL (default: `http://localhost:11434`)
- `VLLM_URL`: vLLM server URL (default: `http://localhost:8000/v1`)
- `VLLM_MAX_TOKENS`: vLLM max tokens (default: `4096`)
- `VLLM_API_TOKEN`: vLLM API token (default: `fake`)
- `VLLM_TLS_VERIFY`: vLLM TLS verification (default: `true`)
- `TGI_URL`: TGI server URL

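For example, pointing the starter distribution at a self-hosted vLLM server might look like the sketch below; the URL and token values are placeholders, not part of the distribution itself:

```bash
# Placeholder values - substitute your own vLLM deployment details
export VLLM_URL=http://localhost:8000/v1
export VLLM_API_TOKEN=fake        # the documented default; use a real token if your server requires one
export VLLM_MAX_TOKENS=4096
export VLLM_TLS_VERIFY=true
```
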
### Model Configuration
- `INFERENCE_MODEL`: HuggingFace model for serverless inference
- `INFERENCE_ENDPOINT_NAME`: HuggingFace endpoint name

### Vector Database Configuration
- `SQLITE_STORE_DIR`: SQLite store directory (default: `~/.llama/distributions/starter`)
- `ENABLE_SQLITE_VEC`: Enable SQLite vector provider
- `ENABLE_CHROMADB`: Enable ChromaDB provider
- `ENABLE_PGVECTOR`: Enable PGVector provider
- `CHROMADB_URL`: ChromaDB server URL
- `PGVECTOR_HOST`: PGVector host (default: `localhost`)
- `PGVECTOR_PORT`: PGVector port (default: `5432`)
- `PGVECTOR_DB`: PGVector database name
- `PGVECTOR_USER`: PGVector username
- `PGVECTOR_PASSWORD`: PGVector password

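As an illustration, a PGVector-backed setup could be configured roughly like this; host, database, and credential values are placeholders, and your PostgreSQL instance needs the pgvector extension installed:

```bash
# Placeholder values - substitute your own PostgreSQL/pgvector details
export PGVECTOR_HOST=localhost
export PGVECTOR_PORT=5432
export PGVECTOR_DB=llama_stack_db
export PGVECTOR_USER=llama
export PGVECTOR_PASSWORD=change_me
```
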
### Tool Configuration
- `BRAVE_SEARCH_API_KEY`: Brave Search API key
- `TAVILY_SEARCH_API_KEY`: Tavily Search API key

### Telemetry Configuration
- `OTEL_SERVICE_NAME`: OpenTelemetry service name
- `TELEMETRY_SINKS`: Telemetry sinks (default: `console,sqlite`)

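For example, to tag traces with a custom service name and write telemetry only to the console, you could set the following; the values are illustrative:

```bash
# Illustrative values; console,sqlite is the documented default for TELEMETRY_SINKS
export OTEL_SERVICE_NAME=my-llama-stack
export TELEMETRY_SINKS=console
```
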
## Enabling Providers

You can enable specific providers by setting the appropriate environment variables. For example:

```bash
# self-hosted
export OLLAMA_URL=http://localhost:11434   # enables the Ollama inference provider
export VLLM_URL=http://localhost:8000/v1   # enables the vLLM inference provider
export TGI_URL=http://localhost:8000/v1    # enables the TGI inference provider

# cloud-hosted, requiring API key configuration on the server
export CEREBRAS_API_KEY=your_cerebras_api_key   # enables the Cerebras inference provider
export NVIDIA_API_KEY=your_nvidia_api_key       # enables the NVIDIA inference provider

# vector providers
export MILVUS_URL=http://localhost:19530       # enables the Milvus vector provider
export CHROMADB_URL=http://localhost:8000/v1   # enables the ChromaDB vector provider
export PGVECTOR_DB=llama_stack_db              # enables the PGVector vector provider
```

This distribution comes with a default "llama-guard" shield that can be enabled by setting the `SAFETY_MODEL` environment variable to point to an appropriate Llama Guard model ID. Use `llama-stack-client models list` to see the list of available models.

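A minimal sketch of enabling the shield follows; the model ID shown is an assumption for illustration, so confirm the identifiers your deployment exposes with `llama-stack-client models list`:

```bash
# Hypothetical model ID - confirm with `llama-stack-client models list`
export SAFETY_MODEL=meta-llama/Llama-Guard-3-8B

# After restarting the server, inspect the registered models
llama-stack-client --endpoint http://localhost:8321 models list
```
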
## Running the Distribution

You can run the starter distribution via Docker or venv.

### Via Docker

This method allows you to get started quickly without having to build the distribution code.

```bash
LLAMA_STACK_PORT=8321
docker run \
  -it \
  --pull always \
  -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
  -e OPENAI_API_KEY=your_openai_key \
  -e FIREWORKS_API_KEY=your_fireworks_key \
  -e TOGETHER_API_KEY=your_together_key \
  llamastack/distribution-starter \
  --port $LLAMA_STACK_PORT
```

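Once the container is up, a quick way to confirm the server is reachable is to query its health endpoint; the `/v1/health` path is an assumption here and may differ between Llama Stack versions:

```bash
# Assumes the server exposes a health endpoint at /v1/health
curl http://localhost:8321/v1/health
```
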
### Via venv

Ensure you have configured the starter distribution using the environment variables explained above.

```bash
uv run --with llama-stack llama stack build --distro starter --image-type venv --run
```

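After the server is up (via either method), you can point the CLI at it and inspect what is available. This is a sketch; the `configure` and `providers list` subcommands match recent `llama-stack-client` releases but may differ in yours:

```bash
# Point the client at the local server, then list registered models and providers
llama-stack-client configure --endpoint http://localhost:8321
llama-stack-client models list
llama-stack-client providers list
```
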
## Example Usage

Once the distribution is running, you can use any of the available models. Here are some examples:

### Using OpenAI Models
```bash
llama-stack-client --endpoint http://localhost:8321 \
  inference chat-completion \
  --model-id openai/gpt-4o \
  --message "Hello, how are you?"
```

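If you prefer standard OpenAI-style tooling, Llama Stack also exposes an OpenAI-compatible API. The `curl` sketch below assumes the compatibility endpoint is served under `/v1/openai/v1`, which matches recent releases but may differ in yours:

```bash
# Assumes the OpenAI-compatible endpoint is served under /v1/openai/v1
curl http://localhost:8321/v1/openai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [{"role": "user", "content": "Hello, how are you?"}]
  }'
```
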
### Using Fireworks Models
```bash
llama-stack-client --endpoint http://localhost:8321 \
  inference chat-completion \
  --model-id fireworks/meta-llama/Llama-3.2-3B-Instruct \
  --message "Write a short story about a robot."
```

### Using Local Ollama Models
```bash
# First, make sure Ollama is running and you have a model
ollama run llama3.2:3b

# Then use it through Llama Stack
export OLLAMA_INFERENCE_MODEL=llama3.2:3b
llama-stack-client --endpoint http://localhost:8321 \
  inference chat-completion \
  --model-id ollama/llama3.2:3b \
  --message "Explain quantum computing in simple terms."
```

## Storage

The starter distribution uses SQLite for local storage of various components:

- **Metadata store**: `~/.llama/distributions/starter/registry.db`
- **Inference store**: `~/.llama/distributions/starter/inference_store.db`
- **FAISS store**: `~/.llama/distributions/starter/faiss_store.db`
- **SQLite vector store**: `~/.llama/distributions/starter/sqlite_vec.db`
- **Files metadata**: `~/.llama/distributions/starter/files_metadata.db`
- **Agents store**: `~/.llama/distributions/starter/agents_store.db`
- **Responses store**: `~/.llama/distributions/starter/responses_store.db`
- **Trace store**: `~/.llama/distributions/starter/trace_store.db`
- **Evaluation store**: `~/.llama/distributions/starter/meta_reference_eval.db`
- **Dataset I/O stores**: Various HuggingFace and local filesystem stores

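Because these are ordinary SQLite files, you can inspect or back them up with standard tools. A minimal sketch, assuming the default `SQLITE_STORE_DIR` and a locally installed `sqlite3` CLI:

```bash
# List the SQLite databases used by the starter distribution
ls -lh ~/.llama/distributions/starter/

# Peek at the tables in the registry store
sqlite3 ~/.llama/distributions/starter/registry.db ".tables"
```
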
## Benefits of the Starter Distribution

1. **Comprehensive Coverage**: Includes most popular AI providers in one distribution
2. **Flexible Configuration**: Easy to enable/disable providers based on your needs
3. **No Local GPU Required**: Most providers are cloud-based, making it accessible to developers without high-end hardware
4. **Easy Migration**: Start with hosted providers and gradually move to local ones as needed
5. **Production Ready**: Includes safety, evaluation, and telemetry components
6. **Tool Integration**: Comes with web search, RAG, and Model Context Protocol tools

The starter distribution is ideal for developers who want to experiment with different AI providers, build prototypes quickly, or create applications that can work with multiple AI backends.