mirror of
https://github.com/meta-llama/llama-stack.git
synced 2025-12-12 20:12:33 +00:00
26 commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
feabcdd67b
|
docs: add documentation on how to use custom run yaml in docker (#3949)
as title
test plan:
```yaml
# custom-ollama-run.yaml
version: 2
image_name: starter
external_providers_dir: /.llama/providers.d
apis:
- inference
- vector_io
- files
- safety
- tool_runtime
- agents
providers:
inference:
# Single Ollama provider for all models
- provider_id: ollama
provider_type: remote::ollama
config:
url: ${env.OLLAMA_URL:=http://localhost:11434}
vector_io:
- provider_id: faiss
provider_type: inline::faiss
config:
persistence:
namespace: vector_io::faiss
backend: kv_default
files:
- provider_id: meta-reference-files
provider_type: inline::localfs
config:
storage_dir: /.llama/files
metadata_store:
table_name: files_metadata
backend: sql_default
safety:
- provider_id: llama-guard
provider_type: inline::llama-guard
config:
excluded_categories: []
tool_runtime:
- provider_id: rag-runtime
provider_type: inline::rag-runtime
agents:
- provider_id: meta-reference
provider_type: inline::meta-reference
config:
persistence:
agent_state:
namespace: agents
backend: kv_default
responses:
table_name: responses
backend: sql_default
max_write_queue_size: 10000
num_writers: 4
storage:
backends:
kv_default:
type: kv_sqlite
db_path: /.llama/kvstore.db
sql_default:
type: sql_sqlite
db_path: /.llama/sql_store.db
stores:
metadata:
namespace: registry
backend: kv_default
inference:
table_name: inference_store
backend: sql_default
max_write_queue_size: 10000
num_writers: 4
conversations:
table_name: openai_conversations
backend: sql_default
registered_resources:
models:
# All models use the same 'ollama' provider
- model_id: llama3.2-vision:latest
provider_id: ollama
provider_model_id: llama3.2-vision:latest
model_type: llm
- model_id: llama3.2:3b
provider_id: ollama
provider_model_id: llama3.2:3b
model_type: llm
# Embedding models
- model_id: nomic-embed-text-v2-moe
provider_id: ollama
provider_model_id: toshk0/nomic-embed-text-v2-moe:Q6_K
model_type: embedding
metadata:
embedding_dimension: 768
shields: []
vector_dbs: []
datasets: []
scoring_fns: []
benchmarks: []
tool_groups: []
server:
port: 8321
telemetry:
enabled: true
vector_stores:
default_provider_id: faiss
default_embedding_model:
provider_id: ollama
model_id: toshk0/nomic-embed-text-v2-moe:Q6_K
```
```bash
docker run
-it
--pull always
-p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT
-v ~/.llama:/root/.llama
-v $CUSTOM_RUN_CONFIG:/app/custom-run.yaml
-e RUN_CONFIG_PATH=/app/custom-run.yaml
-e OLLAMA_URL=http://host.docker.internal:11434/
llamastack/distribution-starter:0.3.0
--port $LLAMA_STACK_PORT
```
|
||
|
|
98a5047f9d
|
feat(prompts): attach prompts to storage stores in run configs (#3893)
# What does this PR do? <!-- Provide a short summary of what this PR does and why. Link to relevant issues if applicable. --> This PR is responsible for attaching prompts to storage stores in run configs. It allows to specify prompts as stores in different distributions. The need of this functionality was initiated in #3514 > Note, #3514 is divided on three separate PRs. Current PR is the first of three. <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> ## Test Plan <!-- Describe the tests you ran to verify your changes with result summaries. *Provide clear instructions so the plan can be easily re-executed.* --> Manual testing and updated CI unit tests Prerequisites: 1. `uv run --with llama-stack llama stack list-deps starter | xargs -L1 uv pip install` 2. `llama stack run starter ` ``` INFO 2025-10-23 15:36:17,387 llama_stack.cli.stack.run:100 cli: Using run configuration: /Users/ianmiller/llama-stack/llama_stack/distributions/starter/run.yaml INFO 2025-10-23 15:36:17,423 llama_stack.cli.stack.run:157 cli: HTTPS enabled with certificates: Key: None Cert: None INFO 2025-10-23 15:36:17,424 llama_stack.cli.stack.run:159 cli: Listening on ['::', '0.0.0.0']:8321 INFO 2025-10-23 15:36:17,749 llama_stack.core.server.server:521 core::server: Run configuration: INFO 2025-10-23 15:36:17,756 llama_stack.core.server.server:524 core::server: apis: - agents - batches - datasetio - eval - files - inference - post_training - safety - scoring - tool_runtime - vector_io image_name: starter providers: agents: - config: persistence: agent_state: backend: kv_default namespace: agents responses: backend: sql_default max_write_queue_size: 10000 num_writers: 4 table_name: responses provider_id: meta-reference provider_type: inline::meta-reference batches: - config: kvstore: backend: kv_default namespace: batches provider_id: reference provider_type: inline::reference datasetio: - config: kvstore: backend: kv_default namespace: datasetio::huggingface provider_id: huggingface provider_type: remote::huggingface - config: kvstore: backend: kv_default namespace: datasetio::localfs provider_id: localfs provider_type: inline::localfs eval: - config: kvstore: backend: kv_default namespace: eval provider_id: meta-reference provider_type: inline::meta-reference files: - config: metadata_store: backend: sql_default table_name: files_metadata storage_dir: /Users/ianmiller/.llama/distributions/starter/files provider_id: meta-reference-files provider_type: inline::localfs inference: - config: api_key: '********' url: https://api.fireworks.ai/inference/v1 provider_id: fireworks provider_type: remote::fireworks - config: api_key: '********' url: https://api.together.xyz/v1 provider_id: together provider_type: remote::together - config: {} provider_id: bedrock provider_type: remote::bedrock - config: api_key: '********' base_url: https://api.openai.com/v1 provider_id: openai provider_type: remote::openai - config: api_key: '********' provider_id: anthropic provider_type: remote::anthropic - config: api_key: '********' provider_id: gemini provider_type: remote::gemini - config: api_key: '********' url: https://api.groq.com provider_id: groq provider_type: remote::groq - config: api_key: '********' url: https://api.sambanova.ai/v1 provider_id: sambanova provider_type: remote::sambanova - config: {} provider_id: sentence-transformers provider_type: inline::sentence-transformers post_training: - config: checkpoint_format: meta provider_id: torchtune-cpu provider_type: inline::torchtune-cpu safety: - config: excluded_categories: [] provider_id: llama-guard provider_type: inline::llama-guard - config: {} provider_id: code-scanner provider_type: inline::code-scanner scoring: - config: {} provider_id: basic provider_type: inline::basic - config: {} provider_id: llm-as-judge provider_type: inline::llm-as-judge - config: openai_api_key: '********' provider_id: braintrust provider_type: inline::braintrust tool_runtime: - config: api_key: '********' max_results: 3 provider_id: brave-search provider_type: remote::brave-search - config: api_key: '********' max_results: 3 provider_id: tavily-search provider_type: remote::tavily-search - config: {} provider_id: rag-runtime provider_type: inline::rag-runtime - config: {} provider_id: model-context-protocol provider_type: remote::model-context-protocol vector_io: - config: persistence: backend: kv_default namespace: vector_io::faiss provider_id: faiss provider_type: inline::faiss - config: db_path: /Users/ianmiller/.llama/distributions/starter/sqlite_vec.db persistence: backend: kv_default namespace: vector_io::sqlite_vec provider_id: sqlite-vec provider_type: inline::sqlite-vec registered_resources: benchmarks: [] datasets: [] models: [] scoring_fns: [] shields: [] tool_groups: - provider_id: tavily-search toolgroup_id: builtin::websearch - provider_id: rag-runtime toolgroup_id: builtin::rag vector_stores: [] server: port: 8321 storage: backends: kv_default: db_path: /Users/ianmiller/.llama/distributions/starter/kvstore.db type: kv_sqlite sql_default: db_path: /Users/ianmiller/.llama/distributions/starter/sql_store.db type: sql_sqlite stores: conversations: backend: sql_default table_name: openai_conversations inference: backend: sql_default max_write_queue_size: 10000 num_writers: 4 table_name: inference_store metadata: backend: kv_default namespace: registry prompts: backend: kv_default namespace: prompts telemetry: enabled: true vector_stores: default_embedding_model: model_id: nomic-ai/nomic-embed-text-v1.5 provider_id: sentence-transformers default_provider_id: faiss version: 2 INFO 2025-10-23 15:36:20,032 llama_stack.providers.utils.inference.inference_store:74 inference: Write queue disabled for SQLite to avoid concurrency issues WARNING 2025-10-23 15:36:20,422 llama_stack.providers.inline.telemetry.meta_reference.telemetry:84 telemetry: OTEL_EXPORTER_OTLP_ENDPOINT is not set, skipping telemetry INFO 2025-10-23 15:36:22,379 llama_stack.providers.utils.inference.openai_mixin:436 providers::utils: OpenAIInferenceAdapter.list_provider_model_ids() returned 105 models INFO 2025-10-23 15:36:22,703 uvicorn.error:84 uncategorized: Started server process [17328] INFO 2025-10-23 15:36:22,704 uvicorn.error:48 uncategorized: Waiting for application startup. INFO 2025-10-23 15:36:22,706 llama_stack.core.server.server:179 core::server: Starting up Llama Stack server (version: 0.3.0) INFO 2025-10-23 15:36:22,707 llama_stack.core.stack:470 core: starting registry refresh task INFO 2025-10-23 15:36:22,708 uvicorn.error:62 uncategorized: Application startup complete. INFO 2025-10-23 15:36:22,708 uvicorn.error:216 uncategorized: Uvicorn running on http://['::', '0.0.0.0']:8321 (Press CTRL+C to quit) ``` As you can see, prompts are attached to stores in config Testing: 1. Create prompt: ``` curl -X POST http://localhost:8321/v1/prompts \ -H "Content-Type: application/json" \ -d '{ "prompt": "Hello {{name}}! You are working at {{company}}. Your role is {{role}} at {{company}}. Remember, {{name}}, to be {{tone}}.", "variables": ["name", "company", "role", "tone"] }' ``` `{"prompt":"Hello {{name}}! You are working at {{company}}. Your role is {{role}} at {{company}}. Remember, {{name}}, to be {{tone}}.","version":1,"prompt_id":"pmpt_a90e09e67acfe23776f2778c603eb6c17e139dab5f6e163f","variables":["name","company","role","tone"],"is_default":false}% ` 2. Get prompt: `curl -X GET http://localhost:8321/v1/prompts/pmpt_a90e09e67acfe23776f2778c603eb6c17e139dab5f6e163f` `{"prompt":"Hello {{name}}! You are working at {{company}}. Your role is {{role}} at {{company}}. Remember, {{name}}, to be {{tone}}.","version":1,"prompt_id":"pmpt_a90e09e67acfe23776f2778c603eb6c17e139dab5f6e163f","variables":["name","company","role","tone"],"is_default":false}% ` 3. Query sqlite KV storage to check created prompt: ``` sqlite> .mode column sqlite> .headers on sqlite> SELECT * FROM kvstore WHERE key LIKE 'prompts:v1:%'; key value expiration ------------------------------------------------------------ ------------------------------------------------------------ ---------- prompts:v1:pmpt_a90e09e67acfe23776f2778c603eb6c17e139dab5f6e {"prompt_id": "pmpt_a90e09e67acfe23776f2778c603eb6c17e139dab 163f:1 5f6e163f", "prompt": "Hello {{name}}! You are working at {{c ompany}}. Your role is {{role}} at {{company}}. Remember, {{ name}}, to be {{tone}}.", "version": 1, "variables": ["name" , "company", "role", "tone"], "is_default": false} prompts:v1:pmpt_a90e09e67acfe23776f2778c603eb6c17e139dab5f6e 1 163f:default sqlite> ``` |
||
|
|
509676641a
|
chore: update run configs (#3902)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 2s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.12) (push) Failing after 0s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
Python Package Build Test / build (3.13) (push) Failing after 1s
Test External API and Providers / test-external (venv) (push) Failing after 4s
API Conformance Tests / check-schema-compatibility (push) Successful in 12s
UI Tests / ui-tests (22) (push) Successful in 39s
Pre-commit / pre-commit (push) Successful in 1m34s
Vector IO Integration Tests / test-matrix (push) Failing after 5s
Unit Tests / unit-tests (3.12) (push) Failing after 3s
Unit Tests / unit-tests (3.13) (push) Failing after 4s
# What does this PR do? telemetry was deprecated ## Test Plan |
||
|
|
2a1a813308
|
chore: update docs for telemetry api removal (#3900)
# What does this PR do? Telemetry is no longer an API/provider. ## Test Plan |
||
|
|
658fb2c777 |
refactor(k8s): update run configs to v2 storage and registered_resources structure
Some checks failed
Python Package Build Test / build (3.13) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 1s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Test Llama Stack Build / generate-matrix (push) Successful in 3s
Test Llama Stack Build / build-single-provider (push) Failing after 3s
Python Package Build Test / build (3.12) (push) Failing after 3s
Vector IO Integration Tests / test-matrix (push) Failing after 5s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 4s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 4s
Test Llama Stack Build / build (push) Failing after 3s
Unit Tests / unit-tests (3.13) (push) Failing after 3s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Unit Tests / unit-tests (3.12) (push) Failing after 5s
API Conformance Tests / check-schema-compatibility (push) Successful in 12s
UI Tests / ui-tests (22) (push) Successful in 42s
Pre-commit / pre-commit (push) Successful in 1m30s
Migrates k8s run configs to match the updated run configs - Replace storage.references with storage.stores - Wrap resources under registered_resources section - Update provider configs to use persistence with namespace/backend - Add telemetry and vector_stores top-level sections - Simplify agent/files metadata store configuration |
||
|
|
4c718523fa
|
docs: fix the building distro file (#3880)
# What does this PR do? * Fixes the doc server build (which expects a blank line after imports) ## Test Plan * `cd docs && npm run build` |
||
|
|
bd3c473208
|
revert: "chore(cleanup)!: remove tool_runtime.rag_tool" (#3877)
Reverts llamastack/llama-stack#3871 This PR broke RAG (even from Responses -- there _is_ a dependency) |
||
|
|
0e96279bee
|
chore(cleanup)!: remove tool_runtime.rag_tool (#3871)
Kill the `builtin::rag` tool group completely since it is no longer targeted. We use the Responses implementation for knowledge_search which uses the `openai_vector_stores` pathway. --------- Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> |
||
|
|
2c43285e22
|
feat(stores)!: use backend storage references instead of configs (#3697)
**This PR changes configurations in a backward incompatible way.**
Run configs today repeat full SQLite/Postgres snippets everywhere a
store is needed, which means duplicated credentials, extra connection
pools, and lots of drift between files. This PR introduces named storage
backends so the stack and providers can share a single catalog and
reference those backends by name.
## Key Changes
- Add `storage.backends` to `StackRunConfig`, register each KV/SQL
backend once at startup, and validate that references point to the right
family.
- Move server stores under `storage.stores` with lightweight references
(backend + namespace/table) instead of full configs.
- Update every provider/config/doc to use the new reference style;
docs/codegen now surface the simplified YAML.
## Migration
Before:
```yaml
metadata_store:
type: sqlite
db_path: ~/.llama/distributions/foo/registry.db
inference_store:
type: postgres
host: ${env.POSTGRES_HOST}
port: ${env.POSTGRES_PORT}
db: ${env.POSTGRES_DB}
user: ${env.POSTGRES_USER}
password: ${env.POSTGRES_PASSWORD}
conversations_store:
type: postgres
host: ${env.POSTGRES_HOST}
port: ${env.POSTGRES_PORT}
db: ${env.POSTGRES_DB}
user: ${env.POSTGRES_USER}
password: ${env.POSTGRES_PASSWORD}
```
After:
```yaml
storage:
backends:
kv_default:
type: kv_sqlite
db_path: ~/.llama/distributions/foo/kvstore.db
sql_default:
type: sql_postgres
host: ${env.POSTGRES_HOST}
port: ${env.POSTGRES_PORT}
db: ${env.POSTGRES_DB}
user: ${env.POSTGRES_USER}
password: ${env.POSTGRES_PASSWORD}
stores:
metadata:
backend: kv_default
namespace: registry
inference:
backend: sql_default
table_name: inference_store
max_write_queue_size: 10000
num_writers: 4
conversations:
backend: sql_default
table_name: openai_conversations
```
Provider configs follow the same pattern—for example, a Chroma vector
adapter switches from:
```yaml
providers:
vector_io:
- provider_id: chromadb
provider_type: remote::chromadb
config:
url: ${env.CHROMADB_URL}
kvstore:
type: sqlite
db_path: ~/.llama/distributions/foo/chroma.db
```
to:
```yaml
providers:
vector_io:
- provider_id: chromadb
provider_type: remote::chromadb
config:
url: ${env.CHROMADB_URL}
persistence:
backend: kv_default
namespace: vector_io::chroma_remote
```
Once the backends are declared, everything else just points at them, so
rotating credentials or swapping to Postgres happens in one place and
the stack reuses a single connection pool.
|
||
|
|
359df3a37c
|
chore: update doc (#3857)
# What does this PR do? follows https://github.com/llamastack/llama-stack/pull/3839 ## Test Plan |
||
|
|
21772de5d3
|
chore: use dockerfile for building containers (#3839)
# What does this PR do? relates to #2878 We introduce a Containerfile which is used to replaced the `llama stack build` command (removal in a separate PR). ``` llama stack build --distro starter --image-type venv --run ``` is replaced by ``` llama stack list-deps starter | xargs -L1 uv pip install llama stack run starter ``` - See the updated workflow files for e2e workflow. ## Test Plan CI ``` ❯ docker build . -f docker/Dockerfile --build-arg DISTRO_NAME=starter --build-arg INSTALL_MODE=editable --tag test_starter ❯ docker run -p 8321:8321 test_starter ❯ curl http://localhost:8321/v1/chat/completions \ -H "Content-Type: application/json" \ -d '{ "model": "gpt-4o-mini", "messages": [ { "role": "user", "content": "Hello!" } ] }' ``` --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/llamastack/llama-stack/pull/3839). * #3855 * __->__ #3839 |
||
|
|
b11bcfde11
|
refactor(build): rework CLI commands and build process (1/2) (#2974)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
Test Llama Stack Build / generate-matrix (push) Successful in 22s
Test llama stack list-deps / show-single-provider (push) Failing after 53s
Test Llama Stack Build / build-single-provider (push) Failing after 3s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.12) (push) Failing after 18s
Python Package Build Test / build (3.13) (push) Failing after 24s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 26s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 27s
Unit Tests / unit-tests (3.12) (push) Failing after 26s
Vector IO Integration Tests / test-matrix (push) Failing after 44s
API Conformance Tests / check-schema-compatibility (push) Successful in 52s
Test llama stack list-deps / generate-matrix (push) Successful in 52s
Test Llama Stack Build / build (push) Failing after 29s
Test External API and Providers / test-external (venv) (push) Failing after 53s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1m2s
Unit Tests / unit-tests (3.13) (push) Failing after 1m30s
Test llama stack list-deps / list-deps-from-config (push) Failing after 1m59s
Test llama stack list-deps / list-deps (push) Failing after 1m10s
UI Tests / ui-tests (22) (push) Successful in 2m26s
Pre-commit / pre-commit (push) Successful in 3m8s
# What does this PR do? This PR does a few things outlined in #2878 namely: 1. adds `llama stack list-deps` a command which simply takes the build logic and instead of executing one of the `build_...` scripts, it displays all of the providers' dependencies using the `module` and `uv`. 2. deprecated `llama stack build` in favor of `llama stack list-deps` 3. updates all tests to use `list-deps` alongside `build`. PR 2/2 will migrate `llama stack run`'s default behavior to be `llama stack build --run` and use the new `list-deps` command under the hood before running the server. examples of `llama stack list-deps starter` ``` llama stack list-deps starter --format json { "name": "starter", "description": "Quick start template for running Llama Stack with several popular providers. This distribution is intended for CPU-only environments.", "apis": [ { "api": "inference", "provider": "remote::cerebras" }, { "api": "inference", "provider": "remote::ollama" }, { "api": "inference", "provider": "remote::vllm" }, { "api": "inference", "provider": "remote::tgi" }, { "api": "inference", "provider": "remote::fireworks" }, { "api": "inference", "provider": "remote::together" }, { "api": "inference", "provider": "remote::bedrock" }, { "api": "inference", "provider": "remote::nvidia" }, { "api": "inference", "provider": "remote::openai" }, { "api": "inference", "provider": "remote::anthropic" }, { "api": "inference", "provider": "remote::gemini" }, { "api": "inference", "provider": "remote::vertexai" }, { "api": "inference", "provider": "remote::groq" }, { "api": "inference", "provider": "remote::sambanova" }, { "api": "inference", "provider": "remote::azure" }, { "api": "inference", "provider": "inline::sentence-transformers" }, { "api": "vector_io", "provider": "inline::faiss" }, { "api": "vector_io", "provider": "inline::sqlite-vec" }, { "api": "vector_io", "provider": "inline::milvus" }, { "api": "vector_io", "provider": "remote::chromadb" }, { "api": "vector_io", "provider": "remote::pgvector" }, { "api": "files", "provider": "inline::localfs" }, { "api": "safety", "provider": "inline::llama-guard" }, { "api": "safety", "provider": "inline::code-scanner" }, { "api": "agents", "provider": "inline::meta-reference" }, { "api": "telemetry", "provider": "inline::meta-reference" }, { "api": "post_training", "provider": "inline::torchtune-cpu" }, { "api": "eval", "provider": "inline::meta-reference" }, { "api": "datasetio", "provider": "remote::huggingface" }, { "api": "datasetio", "provider": "inline::localfs" }, { "api": "scoring", "provider": "inline::basic" }, { "api": "scoring", "provider": "inline::llm-as-judge" }, { "api": "scoring", "provider": "inline::braintrust" }, { "api": "tool_runtime", "provider": "remote::brave-search" }, { "api": "tool_runtime", "provider": "remote::tavily-search" }, { "api": "tool_runtime", "provider": "inline::rag-runtime" }, { "api": "tool_runtime", "provider": "remote::model-context-protocol" }, { "api": "batches", "provider": "inline::reference" } ], "pip_dependencies": [ "pandas", "opentelemetry-exporter-otlp-proto-http", "matplotlib", "opentelemetry-sdk", "sentence-transformers", "datasets", "pymilvus[milvus-lite]>=2.4.10", "codeshield", "scipy", "torchvision", "tree_sitter", "h11>=0.16.0", "aiohttp", "pymongo", "tqdm", "pythainlp", "pillow", "torch", "emoji", "grpcio>=1.67.1,<1.71.0", "fireworks-ai", "langdetect", "psycopg2-binary", "asyncpg", "redis", "together", "torchao>=0.12.0", "openai", "sentencepiece", "aiosqlite", "google-cloud-aiplatform", "faiss-cpu", "numpy", "sqlite-vec", "nltk", "scikit-learn", "mcp>=1.8.1", "transformers", "boto3", "huggingface_hub", "ollama", "autoevals", "sqlalchemy[asyncio]", "torchtune>=0.5.0", "chromadb-client", "pypdf", "requests", "anthropic", "chardet", "aiosqlite", "fastapi", "fire", "httpx", "uvicorn", "opentelemetry-sdk", "opentelemetry-exporter-otlp-proto-http" ] } ``` <img width="1500" height="420" alt="Screenshot 2025-10-16 at 5 53 03 PM" src="https://github.com/user-attachments/assets/765929fb-93e2-44d7-9c3d-8918b70fc721" /> --------- Signed-off-by: Charlie Doern <cdoern@redhat.com> |
||
|
|
f22aaef42f
|
chore!: remove telemetry API usage (#3815)
# What does this PR do? remove telemetry as a providable API from the codebase. This includes removing it from generated distributions but also the provider registry, the router, etc since `setup_logger` is tied pretty strictly to `Api.telemetry` being in impls we still need an "instantiated provider" in our implementations. However it should not be auto-routed or provided. So in validate_and_prepare_providers (called from resolve_impls) I made it so that if run_config.telemetry.enabled, we set up the meta-reference "provider" internally to be used so that log_event will work when called. This is the neatest way I think we can remove telemetry from the provider configs but also not need to rip apart the whole "telemetry is a provider" logic just yet, but we can do it internally later without disrupting users. so telemetry is removed from the registry such that if a user puts `telemetry:` as an API in their build/run config it will err out, but can still be used by us internally as we go through this transition. relates to #3806 Signed-off-by: Charlie Doern <cdoern@redhat.com> |
||
|
|
6ba9db3929
|
chore!: BREAKING CHANGE: remove sqlite from telemetry config (#3808)
# What does this PR do? - Removed sqlite sink from telemetry config. - Removed related code - Updated doc related to telemetry ## Test Plan CI |
||
|
|
007efa6eb5
|
refactor: replace default all-MiniLM-L6-v2 embedding model by nomic-embed-text-v1.5 in Llama Stack (#3183)
# What does this PR do? <!-- Provide a short summary of what this PR does and why. Link to relevant issues if applicable. --> The purpose of this PR is to replace the Llama Stack's default embedding model by nomic-embed-text-v1.5. These are the key reasons why Llama Stack community decided to switch from all-MiniLM-L6-v2 to nomic-embed-text-v1.5: 1. The training data for [all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2#training-data) includes a lot of data sets with various licensing terms, so it is tricky to know when/whether it is appropriate to use this model for commercial applications. 2. The model is not particularly competitive on major benchmarks. For example, if you look at the [MTEB Leaderboard](https://huggingface.co/spaces/mteb/leaderboard) and click on Miscellaneous/BEIR to see English information retrieval accuracy, you see that the top of the leaderboard is dominated by enormous models but also that there are many, many models of relatively modest size whith much higher Retrieval scores. If you want to look closely at the data, I recommend clicking "Download Table" because it is easier to browse that way. More discussion info can be founded [here](https://github.com/llamastack/llama-stack/issues/2418) <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> Closes #2418 ## Test Plan <!-- Describe the tests you ran to verify your changes with result summaries. *Provide clear instructions so the plan can be easily re-executed.* --> 1. Run `./scripts/unit-tests.sh` 2. Integration tests via CI wokrflow --------- Signed-off-by: Sébastien Han <seb@redhat.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com> Co-authored-by: Sébastien Han <seb@redhat.com> |
||
|
|
7ee0ee7843
|
chore!: remove model mgmt from CLI for Hugging Face CLI (#3700)
This change removes the `llama model` and `llama download` subcommands from the CLI, replacing them with recommendations to use the Hugging Face CLI instead. Rationale for this change: - The model management functionality was largely duplicating what Hugging Face CLI already provides, leading to unnecessary maintenance overhead (except the download source from Meta?) - Maintaining our own implementation required fixing bugs and keeping up with changes in model repositories and download mechanisms - The Hugging Face CLI is more mature, widely adopted, and better maintained - This allows us to focus on the core Llama Stack functionality rather than reimplementing model management tools Changes made: - Removed all model-related CLI commands and their implementations - Updated documentation to recommend using `huggingface-cli` for model downloads - Removed Meta-specific download logic and statements - Simplified the CLI to focus solely on stack management operations Users should now use: - `huggingface-cli download` for downloading models - `huggingface-cli scan-cache` for listing downloaded models This is a breaking change as it removes previously available CLI commands. Signed-off-by: Sébastien Han <seb@redhat.com> |
||
|
|
a3f5072776
|
chore!: remove --env from llama stack run (#3711)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 1s
Installer CI / lint (push) Failing after 2s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Installer CI / smoke-test-on-dev (push) Failing after 2s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
Test Llama Stack Build / generate-matrix (push) Successful in 3s
Vector IO Integration Tests / test-matrix (push) Failing after 4s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 2s
Test Llama Stack Build / build-single-provider (push) Failing after 4s
Python Package Build Test / build (3.12) (push) Failing after 2s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 3s
Python Package Build Test / build (3.13) (push) Failing after 1s
API Conformance Tests / check-schema-compatibility (push) Successful in 10s
Unit Tests / unit-tests (3.12) (push) Failing after 3s
Test Llama Stack Build / build (push) Failing after 3s
Test External API and Providers / test-external (venv) (push) Failing after 3s
Unit Tests / unit-tests (3.13) (push) Failing after 3s
UI Tests / ui-tests (22) (push) Successful in 40s
Pre-commit / pre-commit (push) Successful in 1m18s
# What does this PR do? user can simply set env vars in the beginning of the command.`FOO=BAR llama stack run ...` ## Test Plan Run TELEMETRY_SINKS=coneol uv run --with llama-stack llama stack build --distro=starter --image-type=venv --run --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/llamastack/llama-stack/pull/3711). * #3714 * __->__ #3711 |
||
|
|
426cac078b
|
chore: use uvicorn to start llama stack server everywhere (#3625)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 0s
Test Llama Stack Build / generate-matrix (push) Successful in 3s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 4s
Vector IO Integration Tests / test-matrix (push) Failing after 3s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Test Llama Stack Build / build-single-provider (push) Failing after 3s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 6s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 3s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 3s
Python Package Build Test / build (3.12) (push) Failing after 3s
Python Package Build Test / build (3.13) (push) Failing after 2s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Unit Tests / unit-tests (3.13) (push) Failing after 3s
API Conformance Tests / check-schema-compatibility (push) Successful in 11s
Test Llama Stack Build / build (push) Failing after 3s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
UI Tests / ui-tests (22) (push) Successful in 44s
Pre-commit / pre-commit (push) Successful in 1m24s
# What does this PR do? https://github.com/llamastack/llama-stack/pull/3462 allows using uvicorn to start llama stack server which supports spawning multiple workers. This PR enables us to launch >1 workers from `llama stack run` (will add the parameter in a follow-up PR, keeping this PR on simplifying) by removing the old way of launching stack server and consolidates launching via uvicorn.run only. ## Test Plan ran `llama stack run starter` CI |
||
|
|
4dfbe46954
|
fix(docs): Correct indentation in documented example for access_policy (#3652)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Vector IO Integration Tests / test-matrix (push) Failing after 5s
Test External API and Providers / test-external (venv) (push) Failing after 3s
API Conformance Tests / check-schema-compatibility (push) Successful in 9s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 17s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 18s
Python Package Build Test / build (3.13) (push) Failing after 15s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 17s
Python Package Build Test / build (3.12) (push) Failing after 17s
Unit Tests / unit-tests (3.13) (push) Failing after 16s
Unit Tests / unit-tests (3.12) (push) Failing after 18s
UI Tests / ui-tests (22) (push) Successful in 44s
Pre-commit / pre-commit (push) Successful in 1m21s
`access_policy` needs to be inside the `auth` section in config; this PR corrects indentation in a documented example of configuring that section. |
||
|
|
426dc54883
|
docs: Fix Dell distro documentation code snippets (#3640)
# What does this PR do? * Updates code snippets for Dell distribution, fixing specific user home directory in code (replacing with $HOME) and updates docker instructions to use `docker` instead of `podman`. ## Test Plan N.A. Co-authored-by: Connor Hack <connorhack@fb.com> |
||
|
|
267f658968
|
docs: fix broken links (#3647)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.12) (push) Failing after 1s
Python Package Build Test / build (3.13) (push) Failing after 0s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
Vector IO Integration Tests / test-matrix (push) Failing after 4s
Unit Tests / unit-tests (3.12) (push) Failing after 3s
Unit Tests / unit-tests (3.13) (push) Failing after 3s
Test External API and Providers / test-external (venv) (push) Failing after 5s
API Conformance Tests / check-schema-compatibility (push) Successful in 9s
UI Tests / ui-tests (22) (push) Successful in 43s
Pre-commit / pre-commit (push) Successful in 2m0s
# What does this PR do? * Fixes numerous broken links in the new documentation ## Test Plan * Server builds |
||
|
|
b67aef2fc4
|
feat: add static embedding metadata to dynamic model listings for providers using OpenAIMixin (#3547)
# What does this PR do? - remove auto-download of ollama embedding models - add embedding model metadata to dynamic listing w/ unit test - add support and tests for allowed_models - removed inference provider models.py files where dynamic listing is enabled - store embedding metadata in embedding_model_metadata field on inference providers - make model_entries optional on ModelRegistryHelper and LiteLLMOpenAIMixin - make OpenAIMixin a ModelRegistryHelper - skip base64 embedding test for remote::ollama, always returns floats - only use OpenAI client for ollama model listing - remove unused build_model_entry function - remove unused get_huggingface_repo function ## Test Plan ci w/ new tests |
||
|
|
6101c8e015
|
docs: fix broken links (#3540)
# What does this PR do? <!-- Provide a short summary of what this PR does and why. Link to relevant issues if applicable. --> <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> - Fixes broken links and Docusaurus search Closes #3518 ## Test Plan The following should produce a clean build with no warnings and search enabled: ``` npm install npm run gen-api-docs all npm run build npm run serve ``` <!-- Describe the tests you ran to verify your changes with result summaries. *Provide clear instructions so the plan can be easily re-executed.* --> |
||
|
|
8537ada11b
|
docs: MDX leftover fixes (#3536)
# What does this PR do? - Fixes Docusaurus build errors <!-- Provide a short summary of what this PR does and why. Link to relevant issues if applicable. --> <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> ## Test Plan - `npm run build` compiles the build properly - Broken links expected and will be fixed in a follow-on PR <!-- Describe the tests you ran to verify your changes with result summaries. *Provide clear instructions so the plan can be easily re-executed.* --> |
||
|
|
c71ce8df61
|
docs: concepts and building_applications migration (#3534)
# What does this PR do? - Migrates the remaining documentation sections to the new documentation format <!-- Provide a short summary of what this PR does and why. Link to relevant issues if applicable. --> <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> ## Test Plan - Partial migration <!-- Describe the tests you ran to verify your changes with result summaries. *Provide clear instructions so the plan can be easily re-executed.* --> |
||
|
|
d23865757f
|
docs: provider and distro codegen migration (#3531)
# What does this PR do? <!-- Provide a short summary of what this PR does and why. Link to relevant issues if applicable. --> <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> - Updates provider and distro codegen to handle the new format - Migrates provider and distro files to the new format ## Test Plan - Manual testing <!-- Describe the tests you ran to verify your changes with result summaries. *Provide clear instructions so the plan can be easily re-executed.* --> |