mirror of
https://github.com/meta-llama/llama-stack.git
synced 2025-12-23 18:13:56 +00:00
Merge branch 'main' into add-llamacpp-provider
This commit is contained in:
commit
dd00bff776
51 changed files with 1362 additions and 921 deletions
13
docs/_static/llama-stack-spec.html
vendored
13
docs/_static/llama-stack-spec.html
vendored
|
|
@ -14796,7 +14796,8 @@
|
|||
"description": "Template for formatting each retrieved chunk in the context. Available placeholders: {index} (1-based chunk ordinal), {chunk.content} (chunk content string), {metadata} (chunk metadata dict). Default: \"Result {index}\\nContent: {chunk.content}\\nMetadata: {metadata}\\n\""
|
||||
},
|
||||
"mode": {
|
||||
"type": "string",
|
||||
"$ref": "#/components/schemas/RAGSearchMode",
|
||||
"default": "vector",
|
||||
"description": "Search mode for retrieval—either \"vector\", \"keyword\", or \"hybrid\". Default \"vector\"."
|
||||
},
|
||||
"ranker": {
|
||||
|
|
@ -14831,6 +14832,16 @@
|
|||
}
|
||||
}
|
||||
},
|
||||
"RAGSearchMode": {
|
||||
"type": "string",
|
||||
"enum": [
|
||||
"vector",
|
||||
"keyword",
|
||||
"hybrid"
|
||||
],
|
||||
"title": "RAGSearchMode",
|
||||
"description": "Search modes for RAG query retrieval: - VECTOR: Uses vector similarity search for semantic matching - KEYWORD: Uses keyword-based search for exact matching - HYBRID: Combines both vector and keyword search for better results"
|
||||
},
|
||||
"RRFRanker": {
|
||||
"type": "object",
|
||||
"properties": {
|
||||
|
|
|
|||
14
docs/_static/llama-stack-spec.yaml
vendored
14
docs/_static/llama-stack-spec.yaml
vendored
|
|
@ -10346,7 +10346,8 @@ components:
|
|||
content string), {metadata} (chunk metadata dict). Default: "Result {index}\nContent:
|
||||
{chunk.content}\nMetadata: {metadata}\n"
|
||||
mode:
|
||||
type: string
|
||||
$ref: '#/components/schemas/RAGSearchMode'
|
||||
default: vector
|
||||
description: >-
|
||||
Search mode for retrieval—either "vector", "keyword", or "hybrid". Default
|
||||
"vector".
|
||||
|
|
@ -10373,6 +10374,17 @@ components:
|
|||
mapping:
|
||||
default: '#/components/schemas/DefaultRAGQueryGeneratorConfig'
|
||||
llm: '#/components/schemas/LLMRAGQueryGeneratorConfig'
|
||||
RAGSearchMode:
|
||||
type: string
|
||||
enum:
|
||||
- vector
|
||||
- keyword
|
||||
- hybrid
|
||||
title: RAGSearchMode
|
||||
description: >-
|
||||
Search modes for RAG query retrieval: - VECTOR: Uses vector similarity search
|
||||
for semantic matching - KEYWORD: Uses keyword-based search for exact matching
|
||||
- HYBRID: Combines both vector and keyword search for better results
|
||||
RRFRanker:
|
||||
type: object
|
||||
properties:
|
||||
|
|
|
|||
|
|
@ -145,6 +145,10 @@ $ llama stack build --template starter
|
|||
...
|
||||
You can now edit ~/.llama/distributions/llamastack-starter/starter-run.yaml and run `llama stack run ~/.llama/distributions/llamastack-starter/starter-run.yaml`
|
||||
```
|
||||
|
||||
```{tip}
|
||||
The generated `run.yaml` file is a starting point for your configuration. For comprehensive guidance on customizing it for your specific needs, infrastructure, and deployment scenarios, see [Customizing Your run.yaml Configuration](customizing_run_yaml.md).
|
||||
```
|
||||
:::
|
||||
:::{tab-item} Building from Scratch
|
||||
|
||||
|
|
|
|||
|
|
@ -2,6 +2,10 @@
|
|||
|
||||
The Llama Stack runtime configuration is specified as a YAML file. Here is a simplified version of an example configuration file for the Ollama distribution:
|
||||
|
||||
```{note}
|
||||
The default `run.yaml` files generated by templates are starting points for your configuration. For guidance on customizing these files for your specific needs, see [Customizing Your run.yaml Configuration](customizing_run_yaml.md).
|
||||
```
|
||||
|
||||
```{dropdown} 👋 Click here for a Sample Configuration File
|
||||
|
||||
```yaml
|
||||
|
|
|
|||
40
docs/source/distributions/customizing_run_yaml.md
Normal file
40
docs/source/distributions/customizing_run_yaml.md
Normal file
|
|
@ -0,0 +1,40 @@
|
|||
# Customizing run.yaml Files
|
||||
|
||||
The `run.yaml` files generated by Llama Stack templates are **starting points** designed to be customized for your specific needs. They are not meant to be used as-is in production environments.
|
||||
|
||||
## Key Points
|
||||
|
||||
- **Templates are starting points**: Generated `run.yaml` files contain defaults for development/testing
|
||||
- **Customization expected**: Update URLs, credentials, models, and settings for your environment
|
||||
- **Version control separately**: Keep customized configs in your own repository
|
||||
- **Environment-specific**: Create different configurations for dev, staging, production
|
||||
|
||||
## What You Can Customize
|
||||
|
||||
You can customize:
|
||||
- **Provider endpoints**: Change `http://localhost:8000` to your actual servers
|
||||
- **Swap providers**: Replace default providers (e.g., swap Tavily with Brave for search)
|
||||
- **Storage paths**: Move from `/tmp/` to production directories
|
||||
- **Authentication**: Add API keys, SSL, timeouts
|
||||
- **Models**: Different model sizes for dev vs prod
|
||||
- **Database settings**: Switch from SQLite to PostgreSQL
|
||||
- **Tool configurations**: Add custom tools and integrations
|
||||
|
||||
## Best Practices
|
||||
|
||||
- Use environment variables for secrets and environment-specific values
|
||||
- Create separate `run.yaml` files for different environments (dev, staging, prod)
|
||||
- Document your changes with comments
|
||||
- Test configurations before deployment
|
||||
- Keep your customized configs in version control
|
||||
|
||||
Example structure:
|
||||
```
|
||||
your-project/
|
||||
├── configs/
|
||||
│ ├── dev-run.yaml
|
||||
│ ├── prod-run.yaml
|
||||
└── README.md
|
||||
```
|
||||
|
||||
The goal is to take the generated template and adapt it to your specific infrastructure and operational needs.
|
||||
|
|
@ -9,6 +9,7 @@ This section provides an overview of the distributions available in Llama Stack.
|
|||
|
||||
importing_as_library
|
||||
configuration
|
||||
customizing_run_yaml
|
||||
list_of_distributions
|
||||
kubernetes_deployment
|
||||
building_distro
|
||||
|
|
|
|||
|
|
@ -54,7 +54,7 @@ Llama Stack is a server that exposes multiple APIs, you connect with it using th
|
|||
You can use Python to build and run the Llama Stack server, which is useful for testing and development.
|
||||
|
||||
Llama Stack uses a [YAML configuration file](../distributions/configuration.md) to specify the stack setup,
|
||||
which defines the providers and their settings.
|
||||
which defines the providers and their settings. The generated configuration serves as a starting point that you can [customize for your specific needs](../distributions/customizing_run_yaml.md).
|
||||
Now let's build and run the Llama Stack config for Ollama.
|
||||
We use `starter` as template. By default all providers are disabled, this requires enable ollama by passing environment variables.
|
||||
|
||||
|
|
@ -77,7 +77,7 @@ ENABLE_OLLAMA=ollama INFERENCE_MODEL="llama3.2:3b" llama stack build --template
|
|||
You can use a container image to run the Llama Stack server. We provide several container images for the server
|
||||
component that works with different inference providers out of the box. For this guide, we will use
|
||||
`llamastack/distribution-starter` as the container image. If you'd like to build your own image or customize the
|
||||
configurations, please check out [this guide](../references/index.md).
|
||||
configurations, please check out [this guide](../distributions/building_distro.md).
|
||||
First lets setup some environment variables and create a local directory to mount into the container’s file system.
|
||||
```bash
|
||||
export INFERENCE_MODEL="llama3.2:3b"
|
||||
|
|
|
|||
|
|
@ -114,7 +114,7 @@ For more details on TLS configuration, refer to the [TLS setup guide](https://mi
|
|||
| `uri` | `<class 'str'>` | No | PydanticUndefined | The URI of the Milvus server |
|
||||
| `token` | `str \| None` | No | PydanticUndefined | The token of the Milvus server |
|
||||
| `consistency_level` | `<class 'str'>` | No | Strong | The consistency level of the Milvus server |
|
||||
| `kvstore` | `utils.kvstore.config.RedisKVStoreConfig \| utils.kvstore.config.SqliteKVStoreConfig \| utils.kvstore.config.PostgresKVStoreConfig \| utils.kvstore.config.MongoDBKVStoreConfig, annotation=NoneType, required=False, default='sqlite', discriminator='type'` | No | | Config for KV store backend (SQLite only for now) |
|
||||
| `kvstore` | `utils.kvstore.config.RedisKVStoreConfig \| utils.kvstore.config.SqliteKVStoreConfig \| utils.kvstore.config.PostgresKVStoreConfig \| utils.kvstore.config.MongoDBKVStoreConfig` | No | sqlite | Config for KV store backend |
|
||||
| `config` | `dict` | No | {} | This configuration allows additional fields to be passed through to the underlying Milvus client. See the [Milvus](https://milvus.io/docs/install-overview.md) documentation for more details about Milvus in general. |
|
||||
|
||||
> **Note**: This configuration class accepts additional fields beyond those listed above. You can pass any additional configuration options that will be forwarded to the underlying provider.
|
||||
|
|
@ -124,6 +124,9 @@ For more details on TLS configuration, refer to the [TLS setup guide](https://mi
|
|||
```yaml
|
||||
uri: ${env.MILVUS_ENDPOINT}
|
||||
token: ${env.MILVUS_TOKEN}
|
||||
kvstore:
|
||||
type: sqlite
|
||||
db_path: ${env.SQLITE_STORE_DIR:=~/.llama/dummy}/milvus_remote_registry.db
|
||||
|
||||
```
|
||||
|
||||
|
|
|
|||
|
|
@ -40,6 +40,7 @@ See [PGVector's documentation](https://github.com/pgvector/pgvector) for more de
|
|||
| `db` | `str \| None` | No | postgres | |
|
||||
| `user` | `str \| None` | No | postgres | |
|
||||
| `password` | `str \| None` | No | mysecretpassword | |
|
||||
| `kvstore` | `utils.kvstore.config.RedisKVStoreConfig \| utils.kvstore.config.SqliteKVStoreConfig \| utils.kvstore.config.PostgresKVStoreConfig \| utils.kvstore.config.MongoDBKVStoreConfig, annotation=NoneType, required=False, default='sqlite', discriminator='type'` | No | | Config for KV store backend (SQLite only for now) |
|
||||
|
||||
## Sample Configuration
|
||||
|
||||
|
|
@ -49,6 +50,9 @@ port: ${env.PGVECTOR_PORT:=5432}
|
|||
db: ${env.PGVECTOR_DB}
|
||||
user: ${env.PGVECTOR_USER}
|
||||
password: ${env.PGVECTOR_PASSWORD}
|
||||
kvstore:
|
||||
type: sqlite
|
||||
db_path: ${env.SQLITE_STORE_DIR:=~/.llama/dummy}/pgvector_registry.db
|
||||
|
||||
```
|
||||
|
||||
|
|
|
|||
|
|
@ -36,7 +36,9 @@ See [Weaviate's documentation](https://weaviate.io/developers/weaviate) for more
|
|||
## Sample Configuration
|
||||
|
||||
```yaml
|
||||
{}
|
||||
kvstore:
|
||||
type: sqlite
|
||||
db_path: ${env.SQLITE_STORE_DIR:=~/.llama/dummy}/weaviate_registry.db
|
||||
|
||||
```
|
||||
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue