For self-hosted providers like Ollama (or vLLM), the backing server is
running a set of models. That server should be treated as the source of
truth and the Stack registry should just be a cache for those models. Of
course, in production environments, you may not want this (because you
know statically what model you are running), hence there's a config
boolean to control this behavior.
_This is part of a series of PRs aimed at removing the requirement to
set `INFERENCE_MODEL` environment variables for running the Llama Stack
server._
## Test Plan
Copy and modify the starter.yaml template / config and enable
`refresh_models: true, refresh_models_interval: 10` for the ollama
provider. Then, run:
```
LLAMA_STACK_LOGGING=all=debug \
ENABLE_OLLAMA=ollama uv run llama stack run --image-type venv /tmp/starter.yaml
```
See a gargantuan amount of logs, but verify that the provider is
periodically refreshing models. Stop and prune a model from the ollama
server, restart the server, and verify that the model goes away when you
call `uv run llama-stack-client models list`.
# What does this PR do?
This PR fixes the `DPOAlignmentConfig` schema to use the correct Direct
Preference Optimization (DPO) parameters.
The current schema incorrectly uses PPO-inspired parameters
(`reward_scale`, `reward_clip`, `epsilon`, `gamma`) that are not part of
the DPO algorithm. This PR updates it to use the standard DPO
parameters:
- `beta`: The KL divergence coefficient that controls deviation from the
reference model
- `loss_type`: The type of DPO loss function (sigmoid, hinge, ipo,
kto_pair)
These parameters align with standard DPO implementations like
HuggingFace's TRL library.
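For illustration, a minimal sketch of what a schema with these fields could look like (assuming a pydantic model; the field names come from this description, while the defaults, constraints, and enum layout are illustrative rather than the actual Llama Stack definition):
```python
from enum import Enum

from pydantic import BaseModel, Field


class DPOLossType(str, Enum):
    sigmoid = "sigmoid"
    hinge = "hinge"
    ipo = "ipo"
    kto_pair = "kto_pair"


class DPOAlignmentConfig(BaseModel):
    # KL-divergence coefficient controlling deviation from the reference model
    beta: float = Field(0.1, gt=0)
    # DPO loss variant to optimize
    loss_type: DPOLossType = DPOLossType.sigmoid
```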
---------
Co-authored-by: Ubuntu <ubuntu@ip-172-31-43-83.ec2.internal>
When we call `construct_stack()`, providers are instantiated and
`initialize()` is called. This call can end up doing _anything_ at all
-- specifically, providers are free to create long running background
tasks as part of this. If we wrap this within an `asyncio.run()` call, as
the current code does, these tasks get canceled when stack construction
finishes. This is not correct. The PR addresses the issue by creating a
persistent event loop which is used for both the stack as well as for
running the uvicorn server. In other words, the lifetime of the
providers (and downstream async code) is now the same as the lifetime of
the uvicorn server.
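A rough sketch of the pattern (not the actual server code; `construct_stack()` is named in this description, while the stub below and the uvicorn wiring are illustrative):
```python
import asyncio

import uvicorn


async def construct_stack():
    """Stand-in for the real construct_stack(); providers may spawn background tasks in initialize()."""
    asyncio.create_task(asyncio.sleep(3600))  # e.g. a provider's long-running refresh loop


def main() -> None:
    # One persistent loop owns both stack construction and the HTTP server, so tasks
    # created during construction are not cancelled when construction finishes --
    # they live exactly as long as the uvicorn server does.
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    loop.run_until_complete(construct_stack())

    server = uvicorn.Server(uvicorn.Config("myapp:app", host="0.0.0.0", port=8321))  # placeholder ASGI app
    try:
        loop.run_until_complete(server.serve())  # same loop -> provider tasks keep running
    finally:
        loop.close()


if __name__ == "__main__":
    main()
```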
## Test Plan
This should not affect any current code since we don't have background
tasks created right now. However,
https://github.com/meta-llama/llama-stack/pull/2805 will start using
this functionality.
# What does this PR do?
The 'build' command didn't take into account ENABLE flags for the
starter distro. For some reason, I was having issues with HuggingFace
access for the embedding model, so I added a tip for that as well.
Closes #2779
## Test Plan
I ran the described steps manually, but it would be nice if someone else
could try them and verify this still works.
We might consider having a CI job to ensure the QSG remains functional -
it's not a great experience for new users if they try Llama Stack for
the first time and it doesn't work as we describe.
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
# What does this PR do?
- Added coverage badge to README. - [See my
fork](https://github.com/ChristianZaccaria/llama-stack)
- Added a GitHub Actions workflow that runs the tests and updates the
coverage badge. - [See
run](4574811323)
- Documented steps in `testing.md` for running the tests locally, and
viewing the `html` report.
- Excluded non-essential files from coverage reporting to provide a more
accurate measurement.
Automatically created PR to update coverage badge:
https://github.com/ChristianZaccaria/llama-stack/pull/9
# Note for reviewers
1. Currently the coverage report shows 45% coverage. Wondering if
there are other files or directories that should also be excluded from
the report to increase the percentage. The directories with the least
test coverage are `llama_stack/cli`, `llama_stack/models`, and
`llama_stack/ui`. - Should we exclude these?
2. **[Required]** The `GITHUB_TOKEN` should have write permissions to
open a PR to update the coverage badge.
# GitHub Issue
Closes #2355
## Test Plan
The `testing.md` file describes how to run the unit tests locally.
# What does this PR do?
trigger integration tests on ALL changes to `tests/` to catch failures
before they merge into main
Signed-off-by: Charlie Doern <cdoern@redhat.com>
# What does this PR do?
This PR adds static type coverage to `llama-stack`
Part of https://github.com/meta-llama/llama-stack/issues/2647
## Test Plan
Signed-off-by: Mustafa Elbehery <melbeher@redhat.com>
# What does this PR do?
This PR adds static type coverage to `llama-stack`
Part of https://github.com/meta-llama/llama-stack/issues/2647
## Test Plan
Signed-off-by: Mustafa Elbehery <melbeher@redhat.com>
# What does this PR do?
Some async test markers in the codebase are causing pre-commit to fail
due to #2744. Remove these pytest fixtures.
## Test Plan
pre-commit passes
Signed-off-by: Charlie Doern <cdoern@redhat.com>
If I am running `uv run llama stack run --image-type venv`, it should not
be telling me "Conda detected" because I am pretty clearly saying I need
venv. The root cause is the offending line.
# What does this PR do?
## Test Plan
```
ENABLE_OLLAMA=ollama LLAMA_STACK_CONFIG=starter uv run pytest tests/integration/telemetry --text-model="ollama/llama3.2:3b-instruct-fp16"
```
# What does this PR do?
Lets users register models available at
https://integrate.api.nvidia.com/v1/models that aren't already in
llama_stack/providers/remote/inference/nvidia/models.py.
## Test Plan
1. run the nvidia distro
2. register a model from https://integrate.api.nvidia.com/v1/models that
isn't already known; as of this writing,
nvidia/llama-3.1-nemotron-ultra-253b-v1 is a good example
3. perform inference w/ the model
- POST /v1/models accepts an optional `provider_model_id`
- The ModelsRoutingTable.register_model handler ensures it is non-None,
providing a default
As a result, usage of `Model.provider_model_id` will no longer need to
detect None.
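To illustrate step 2 of the test plan above, here is a hedged sketch using the Python client (assuming the standard `llama_stack_client` models API; exact keyword arguments may differ):
```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# Register a model served at integrate.api.nvidia.com that is not in the static list.
client.models.register(
    model_id="nvidia/llama-3.1-nemotron-ultra-253b-v1",
    provider_id="nvidia",
    # provider_model_id is optional; the routing table now fills in a default
)

print([m.identifier for m in client.models.list()])
```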
Move sentence-transformers to be the first embedding model in the list of
models. This ensures it will always be the default and is more
consistent than having the default change based on which env variables
are available.
Closes: #2702
## Test Plan
Manually verified
Signed-off-by: Derek Higgins <derekh@redhat.com>
# What does this PR do?
Currently each disabled provider is printed as a warning; switch to
debug. This level of verbosity isn't necessary, especially if, over time,
we intend to grow the list of providers that can be in a single run yaml.
## Test Plan
before:
<img width="1144" height="667" alt="Screenshot 2025-07-16 at 12 37
18 PM"
src="https://github.com/user-attachments/assets/d14dbf76-6e40-4996-8a27-111e6a987d71"
/>
after:
<img width="925" height="141" alt="Screenshot 2025-07-16 at 12 37 42 PM"
src="https://github.com/user-attachments/assets/81efdbe1-923c-4c5f-9731-f89729043920"
/>
Signed-off-by: Charlie Doern <cdoern@redhat.com>
# What does this PR do?
Resolves https://github.com/meta-llama/llama-stack/issues/2770. It
replaces characters in SQLite table names that are not alphanumeric or
underscores with underscores and quotes the table names with square
brackets in SQL statements.
Closes #2770
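A minimal sketch of the sanitization idea described above (illustrative only, not the actual provider code):
```python
import re


def sanitize_table_name(bank_id: str) -> str:
    """Replace every character that is not alphanumeric or an underscore with an underscore."""
    return re.sub(r"[^a-zA-Z0-9_]", "_", bank_id)


table = sanitize_table_name("test_bank.123")  # -> "test_bank_123"
# Table names are additionally bracket-quoted in the SQL statements:
sql = f"CREATE TABLE IF NOT EXISTS [{table}] (id TEXT PRIMARY KEY, chunk TEXT)"
```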
## Test Plan
I added a ".123" suffix to the bank_id on the following line
```
index = await SQLiteVecIndex.create(dimension=embedding_dimension, db_path=db_path, bank_id="test_bank.123")
```
in tests/unit/providers/vector_io/test_sqlite_vec.py, which, without the
fix in place, demonstrates the issue.
# What does this PR do?
This was causing an unnecessary logger warning.
## Test Plan
Run `LLAMA_STACK_DIR=. ENABLE_OLLAMA=ollama
OLLAMA_INFERENCE_MODEL=llama3.2:3b llama stack build --template starter
--image-type venv --run` and then `Ctrl-C` to shut down.
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
This was introduced in
https://github.com/meta-llama/llama-stack/pull/523 but as far as I can
tell has never been used. It's been over six months so it feels fair to
remove it at this point.
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
The vision models are now available at the standard URL, so the
workaround code has been removed. This also simplifies the codebase by
eliminating the need for per-model client caching.
- Remove special URL handling for meta/llama-3.2-11b/90b-vision-instruct
models
- Convert the `_get_client` method to a `_client` property for a cleaner API
- Remove the unnecessary `lru_cache` decorator and `functools` import
- Simplify client creation logic to use a single base URL for all models
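A rough before/after illustration of the simplification (the `_get_client` / `_client` names come from the bullets above; the client class, URLs, and attributes here are placeholders rather than the adapter's real code):
```python
from functools import lru_cache

DEFAULT_URL = "https://example.com/v1"          # placeholder: the single standard base URL
SPECIAL_VISION_URL = "https://example.com/vlm"  # placeholder: the old per-model workaround URL


class FakeClient:  # placeholder for the real OpenAI-compatible client
    def __init__(self, base_url: str) -> None:
        self.base_url = base_url


class Before:
    @lru_cache  # one cached client per model
    def _get_client(self, model: str) -> FakeClient:
        url = SPECIAL_VISION_URL if "vision" in model else DEFAULT_URL
        return FakeClient(base_url=url)


class After:
    @property
    def _client(self) -> FakeClient:
        # a single base URL now serves every model, so no per-model caching is needed
        return FakeClient(base_url=DEFAULT_URL)
```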
# What does this PR do?
Adding OpenAI Vector Stores Files API compatibility for PGVector
## Test Plan
Updated CI to include PGVector
---------
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
# What does this PR do?
Adds new documentation that was missing for the Llama Stack Python
Client and updates old/outdated docs.
# What does this PR do?
https://github.com/meta-llama/llama-stack/pull/2490 introduced a new
function for type conversion of strings.
However, a side effect of this is that it will cast any string that can
be parsed as an integer, which for something like `image_name` is not
desired since we only accept strings for this field in the
`StackRunConfig`.
This PR introduces logic to ensure that `image_name` remains a string.
Closes #2749
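A hypothetical sketch of that kind of guard (the real `StackRunConfig` validation may be implemented differently; the class and validator below are illustrative):
```python
from pydantic import BaseModel, field_validator


class StackRunConfigSketch(BaseModel):
    image_name: str

    @field_validator("image_name", mode="before")
    @classmethod
    def keep_image_name_as_string(cls, v):
        # Even if upstream env-var substitution turned "2745" into the integer 2745,
        # coerce it back so the field always stays a plain string.
        return str(v)


print(StackRunConfigSketch(image_name=2745).image_name)  # -> "2745"
```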
## Test Plan
You can run the original reproduction step from the bug to verify this
manually:
```bash
OPENAI_API_KEY=bogus llama stack build --image-type venv --image-name 2745 --providers inference=remote::openai --run
```
I have also added an additional unit test to prevent any future
regression here
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
# What does this PR do?
Resolves https://github.com/meta-llama/llama-stack/issues/2735
Currently, if you test against OpenAI's Vector Stores API the
`client.vector_stores.search` call fails with an invalid vector_db
during routing (see the script referenced in the clickable item under
the Test Plan section).
This PR ensures that `client.vector_stores.search()` is compatible with
OpenAI's Vector Stores API.
The two biggest changes:
1. The `name`, which was previously used as the `vector_db_id`, has been
changed to be consistent with OpenAI's `vs_{uuid}` format.
2. The vector store has to be referenced by its ID; the name is not
reliable, as every `client.vector_stores.create` call results in a new
vector store.
NOTE: I believe this is a breaking change for end users as they'll need
to update their VectorDB identifiers.
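For example, a minimal sketch with the OpenAI-compatible client (mirroring the larger script below; the base URL is the local Llama Stack endpoint used elsewhere in this test plan):
```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1/openai/v1", api_key="none")

vs = client.vector_stores.create(name="Support FAQ")
print(vs.id)  # e.g. "vs_..." -- reference the store by this ID, not by its name

results = client.vector_stores.search(
    vector_store_id=vs.id,
    query="What is the return policy?",
    max_num_results=5,
)
```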
## Test Plan
Unit tests:
```bash
./scripts/unit-tests.sh tests/unit/providers/vector_io/ -v
```
Integration tests:
```bash
ENABLE_MILVUS=milvus llama stack run /Users/farceo/dev/llama-stack/llama_stack/templates/starter/run.yaml --image-type venv
LLAMA_STACK_CONFIG=http://localhost:8321 pytest -sv tests/integration/vector_io/test_openai_vector_stores.py --embedding-model=all-MiniLM-L6-v2 -vv
```
Unit tests and test script below 👇
<details>
<summary>Click here for script used to test OpenAI and Llama Stack
Vector Store implementation</summary>
```python
import json
import argparse
from openai import OpenAI, pagination
import logging
from colorama import Fore, Style, init
import traceback
import os
# Initialize colorama for color support in terminal
init(autoreset=True)
# Setup basic logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
DEMO_VECTOR_STORE_NAME = "Support FAQ FJA"
global DEMO_VECTOR_STORE_ID
global DEMO_VECTOR_STORE_ID2
def colored_print(color, text):
"""Prints text to the console with the specified color."""
print(f"{color}{text}{Style.RESET_ALL}")
def log_and_print(color, message, level=logging.INFO):
"""Logs a message and prints it to the console with the specified color."""
logging.log(level, message)
colored_print(color, message)
def run_tests(client, prefix="openai"):
"""
Runs all tests using the provided OpenAI client and saves the output
to JSON files with the given prefix.
"""
# Create the directory if it doesn't exist
os.makedirs('openai_testing', exist_ok=True)
# Default values in case tests fail
global DEMO_VECTOR_STORE_ID, DEMO_VECTOR_STORE_ID2
DEMO_VECTOR_STORE_ID = None
DEMO_VECTOR_STORE_ID2 = None
def test_idempotent_vector_store_creation():
"""
Test that creating a vector store with the same name is idempotent.
"""
log_and_print(Fore.BLUE, "Starting vector store creation test...")
try:
vector_store = client.vector_stores.create(
name=DEMO_VECTOR_STORE_NAME,
)
# Attempt to create the same vector store again
vector_store2 = client.vector_stores.create(
name=DEMO_VECTOR_STORE_NAME,
)
# Check instead of assert
if vector_store2.id != vector_store.id:
log_and_print(Fore.YELLOW, f"FAILED IDEMPOTENCY: the same VectorStore name for {prefix.upper()} does not return the same ID",
level=logging.WARNING)
else:
log_and_print(Fore.GREEN, f"PASSED IDEMPOTENCY: f{vector_store2.id} == {vector_store.id} the same VectorStore name for {prefix.upper()} returns the same ID")
vector_store_data = vector_store.to_dict()
log_and_print(Fore.WHITE, f"vector_stores.create = {json.dumps(vector_store_data, indent=2)}")
with open(f'openai_testing/{prefix}_vector_store_create.json', 'w') as f:
json.dump(vector_store_data, f, indent=2)
global DEMO_VECTOR_STORE_ID, DEMO_VECTOR_STORE_ID2
DEMO_VECTOR_STORE_ID = vector_store.id
DEMO_VECTOR_STORE_ID2 = vector_store2.id
return DEMO_VECTOR_STORE_ID, DEMO_VECTOR_STORE_ID2
except Exception as e:
log_and_print(Fore.RED, f"Idempotent vector store creation test failed: {e}", level=logging.ERROR)
logging.error(traceback.format_exc())
# Create a fallback vector store ID if needed
if 'vector_store' in locals() and vector_store:
DEMO_VECTOR_STORE_ID = vector_store.id
return DEMO_VECTOR_STORE_ID, DEMO_VECTOR_STORE_ID2
def test_vector_store_list():
"""
Test listing vector stores.
"""
log_and_print(Fore.BLUE, "Starting vector store list test...")
try:
vector_stores = client.vector_stores.list()
# Check instead of assert
if not isinstance(vector_stores, pagination.SyncCursorPage):
log_and_print(Fore.YELLOW, f"FAILED: Expected a list of vector stores, got {type(vector_stores)}",
level=logging.WARNING)
else:
log_and_print(Fore.GREEN, "Vector store list test passed!")
vector_stores_data = vector_stores.to_dict()
log_and_print(Fore.WHITE, f"vector_stores.list = {json.dumps(vector_stores_data, indent=2)}")
with open(f'openai_testing/{prefix}_vector_store_list.json', 'w') as f:
json.dump(vector_stores_data, f, indent=2)
except Exception as e:
log_and_print(Fore.RED, f"Vector store list test failed: {e}", level=logging.ERROR)
logging.error(traceback.format_exc())
def test_retrieve_vector_store():
"""
Test retrieving a specific vector store.
"""
log_and_print(Fore.BLUE, "Starting retrieve vector store test...")
if not DEMO_VECTOR_STORE_ID:
log_and_print(Fore.YELLOW, "Skipping retrieve vector store test - no vector store ID available",
level=logging.WARNING)
return
try:
vector_store = client.vector_stores.retrieve(
vector_store_id=DEMO_VECTOR_STORE_ID,
)
# Check instead of assert
if vector_store.id != DEMO_VECTOR_STORE_ID:
log_and_print(Fore.YELLOW, "FAILED: Retrieved vector store ID does not match", level=logging.WARNING)
else:
log_and_print(Fore.GREEN, "Retrieve vector store test passed!")
vector_store_data = vector_store.to_dict()
log_and_print(Fore.WHITE, f"vector_stores.retrieve = {json.dumps(vector_store_data, indent=2)}")
with open(f'openai_testing/{prefix}_vector_store_retrieve.json', 'w') as f:
json.dump(vector_store_data, f, indent=2)
except Exception as e:
log_and_print(Fore.RED, f"Retrieve vector store test failed: {e}", level=logging.ERROR)
logging.error(traceback.format_exc())
def test_modify_vector_store():
"""
Test modifying a vector store.
"""
log_and_print(Fore.BLUE, "Starting modify vector store test...")
if not DEMO_VECTOR_STORE_ID:
log_and_print(Fore.YELLOW, "Skipping modify vector store test - no vector store ID available",
level=logging.WARNING)
return
try:
updated_vector_store = client.vector_stores.update(
vector_store_id=DEMO_VECTOR_STORE_ID,
name="Updated Support FAQ FJA",
)
# Check instead of assert
if updated_vector_store.name != "Updated Support FAQ FJA":
log_and_print(Fore.YELLOW, "FAILED: Vector store name was not updated correctly", level=logging.WARNING)
else:
log_and_print(Fore.GREEN, "Modify vector store test passed!")
updated_vector_store_data = updated_vector_store.to_dict()
log_and_print(Fore.WHITE, f"vector_stores.modify = {json.dumps(updated_vector_store_data, indent=2)}")
with open(f'openai_testing/{prefix}_vector_store_modify.json', 'w') as f:
json.dump(updated_vector_store_data, f, indent=2)
except Exception as e:
log_and_print(Fore.RED, f"Modify vector store test failed: {e}", level=logging.ERROR)
logging.error(traceback.format_exc())
def test_delete_vector_store():
"""
Test deleting a vector store.
"""
log_and_print(Fore.BLUE, "Starting delete vector store test...")
if not DEMO_VECTOR_STORE_ID2:
log_and_print(Fore.YELLOW, "Skipping delete vector store test - no second vector store ID available",
level=logging.WARNING)
return
try:
response = client.vector_stores.delete(
vector_store_id=DEMO_VECTOR_STORE_ID2,
)
log_and_print(Fore.GREEN, "Delete vector store test passed!")
response_data = response.to_dict()
log_and_print(Fore.WHITE, f"Vector store delete response = {json.dumps(response_data, indent=2)}")
with open(f'openai_testing/{prefix}_vector_store_delete.json', 'w') as f:
json.dump(response_data, f, indent=2)
except Exception as e:
log_and_print(Fore.RED, f"Delete vector store test failed: {e}", level=logging.ERROR)
logging.error(traceback.format_exc())
def test_create_vector_store_file():
log_and_print(Fore.BLUE, "Starting create vector store file test...")
if not DEMO_VECTOR_STORE_ID:
log_and_print(Fore.YELLOW, "Skipping create vector store file test - no vector store ID available",
level=logging.WARNING)
return
try:
# create jsonl of files as an example
with open("mydata.jsonl", "w") as f:
f.write('{"text": "What is the return policy?", "metadata": {"category": "support"}}\n')
f.write('{"text": "How do I reset my password?", "metadata": {"category": "support"}}\n')
f.write('{"text": "Where can I find my order history?", "metadata": {"category": "support"}}\n')
f.write('{"text": "What are the shipping options?", "metadata": {"category": "support"}}\n')
f.write('{"text": "What is your favorite banana?", "metadata": {"category": "support"}}\n')
# Create a simple text file if my_data_small.txt doesn't exist
if not os.path.exists("my_data_small.txt"):
with open("my_data_small.txt", "w") as f:
f.write("This is a test file for vector store testing.\n")
created_file = client.files.create(
file=open("my_data_small.txt", "rb"),
purpose="assistants",
)
created_file_data = created_file.to_dict()
log_and_print(Fore.WHITE, f"Created file {json.dumps(created_file_data, indent=2)}")
with open(f'openai_testing/{prefix}_file_create.json', 'w') as f:
json.dump(created_file_data, f, indent=2)
retrieved_files = client.files.retrieve(created_file.id)
retrieved_files_data = retrieved_files.to_dict()
log_and_print(Fore.WHITE, f"Retrieved file {json.dumps(retrieved_files_data, indent=2)}")
with open(f'openai_testing/{prefix}_file_retrieve.json', 'w') as f:
json.dump(retrieved_files_data, f, indent=2)
vector_store_file = client.vector_stores.files.create(
vector_store_id=DEMO_VECTOR_STORE_ID,
file_id=created_file.id,
)
log_and_print(Fore.GREEN, "Create vector store file test passed!")
except Exception as e:
log_and_print(Fore.RED, f"Create vector store file test failed: {e}", level=logging.ERROR)
logging.error(traceback.format_exc())
def test_search_vector_store():
"""
Test searching a vector store.
"""
log_and_print(Fore.BLUE, "Starting search vector store test...")
if not DEMO_VECTOR_STORE_ID:
log_and_print(Fore.YELLOW, "Skipping search vector store test - no vector store ID available",
level=logging.WARNING)
return
try:
query = "What is the banana policy?"
search_results = client.vector_stores.search(
vector_store_id=DEMO_VECTOR_STORE_ID,
query=query,
max_num_results=10,
ranking_options={
'ranker': 'default-2024-11-15',
'score_threshold': 0.0,
},
rewrite_query=False,
)
# Check instead of assert
if not isinstance(search_results, pagination.SyncPage):
log_and_print(Fore.YELLOW, f"FAILED: Expected a list of search results, got {type(search_results)}",
level=logging.WARNING)
else:
log_and_print(Fore.GREEN, "Search vector store test passed!")
search_results_dict = search_results.to_dict()
log_and_print(Fore.WHITE, f"Search results = {search_results_dict}")
with open(f'openai_testing/{prefix}_vector_store_search.json', 'w') as f:
json.dump(search_results_dict, f, indent=2)
log_and_print(Fore.WHITE, f"vector_stores.search = {search_results.to_json()}")
except Exception as e:
log_and_print(Fore.RED, f"Search vector store test failed: {e}", level=logging.ERROR)
logging.error(traceback.format_exc())
# Run all tests in sequence, even if some fail
test_results = []
try:
result = test_idempotent_vector_store_creation()
if result and len(result) == 2:
DEMO_VECTOR_STORE_ID, DEMO_VECTOR_STORE_ID2 = result
test_results.append(True)
except Exception as e:
log_and_print(Fore.RED, f"Vector store creation test failed: {e}", level=logging.ERROR)
logging.error(traceback.format_exc())
test_results.append(False)
for test_func in [
test_vector_store_list,
test_retrieve_vector_store,
test_modify_vector_store,
test_delete_vector_store,
test_create_vector_store_file,
test_search_vector_store
]:
try:
test_func()
test_results.append(True)
except Exception as e:
log_and_print(Fore.RED, f"{test_func.__name__} failed: {e}", level=logging.ERROR)
logging.error(traceback.format_exc())
test_results.append(False)
if all(test_results):
log_and_print(Fore.GREEN, f"All {prefix} tests completed successfully!")
else:
failed_count = test_results.count(False)
log_and_print(Fore.YELLOW, f"{failed_count} {prefix} test(s) failed, but script completed.")
if __name__ == "__main__":
parser = argparse.ArgumentParser(description="Run OpenAI and/or LlamaStack tests.")
parser.add_argument(
"--provider",
type=str,
default="llama",
choices=["openai", "llama", "both"],
help="Specify which environment to test: openai, llama, or both. Default is both.",
)
args = parser.parse_args()
try:
if args.provider in ("openai", "both"):
openai_client = OpenAI()
run_tests(openai_client, prefix="openai")
if args.provider in ("llama", "both"):
llama_client = OpenAI(base_url="http://localhost:8321/v1/openai/v1", api_key="none")
run_tests(llama_client, prefix="llama")
log_and_print(Fore.GREEN, "All tests completed!")
except Exception as e:
log_and_print(Fore.RED, f"Tests failed to complete: {e}", level=logging.ERROR)
logging.error(traceback.format_exc())
```
</details>
---------
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
# What does this PR do?
This PR adds the keyword search implementation for Milvus. Along with
the implementation for remote Milvus, the tests require us to start a
Milvus container locally.
In order to verify the implementation, run:
```
pytest tests/unit/providers/vector_io/remote/test_milvus.py -v -s --tb=short --disable-warnings --asyncio-mode=auto
```
You can also test the changes using the below script:
```
#!/usr/bin/env python3
import asyncio
import os
import uuid
from typing import List
from llama_stack_client import (
Agent,
AgentEventLogger,
LlamaStackClient,
RAGDocument
)
class MilvusRAGDemo:
def __init__(self, base_url: str = "http://localhost:8321/"):
self.client = LlamaStackClient(base_url=base_url)
self.vector_db_id = f"milvus_rag_demo_{uuid.uuid4().hex[:8]}"
self.model_id = None
self.embedding_model_id = None
self.embedding_dimension = None
def setup_models(self):
"""Get available models and select appropriate ones for LLM and embeddings."""
models = self.client.models.list()
# Select embedding model
embedding_models = [m for m in models if m.model_type == "embedding"]
if not embedding_models:
raise ValueError("No embedding models found")
self.embedding_model_id = embedding_models[0].identifier
self.embedding_dimension = embedding_models[0].metadata["embedding_dimension"]
def register_vector_db(self):
print(f"Registering Milvus vector database: {self.vector_db_id}")
response = self.client.vector_dbs.register(
vector_db_id=self.vector_db_id,
embedding_model=self.embedding_model_id,
embedding_dimension=self.embedding_dimension,
provider_id="milvus-remote", # Use remote Milvus
)
print(f"Vector database registered successfully")
return response
def insert_documents(self):
"""Insert sample documents into the vector database."""
print("\nInserting sample documents...")
# Sample documents about different topics
documents = [
RAGDocument(
document_id="ai_ml_basics",
content="""
Artificial Intelligence (AI) and Machine Learning (ML) are transforming the world.
AI refers to the simulation of human intelligence in machines, while ML is a subset
of AI that enables computers to learn and improve from experience without being
explicitly programmed. Deep learning, a subset of ML, uses neural networks with
multiple layers to process complex patterns in data.
Key concepts in AI/ML include:
- Supervised Learning: Training with labeled data
- Unsupervised Learning: Finding patterns in unlabeled data
- Reinforcement Learning: Learning through trial and error
- Neural Networks: Computing systems inspired by biological brains
""",
mime_type="text/plain",
metadata={"topic": "technology", "category": "ai_ml"},
),
]
# Insert documents with chunking
self.client.tool_runtime.rag_tool.insert(
documents=documents,
vector_db_id=self.vector_db_id,
chunk_size_in_tokens=200, # Smaller chunks for better granularity
)
print(f"Inserted {len(documents)} documents with chunking")
def test_keyword_search(self):
"""Test keyword-based search using BM25."""
queries = [
"neural networks",
"Python frameworks",
"data cleaning",
]
for query in queries:
response = self.client.vector_io.query(
vector_db_id=self.vector_db_id,
query=query,
params={
"mode": "keyword", # Keyword search
"max_chunks": 3,
"score_threshold": 0.0,
}
)
for i, (chunk, score) in enumerate(zip(response.chunks, response.scores)):
print(f" {i+1}. Score: {score:.4f}")
print(f" Content: {chunk.content[:100]}...")
print(f" Metadata: {chunk.metadata}")
def run_demo(self):
try:
self.setup_models()
self.register_vector_db()
self.insert_documents()
self.test_keyword_search()
except Exception as e:
print(f"Error during demo: {e}")
raise
def main():
"""Main function to run the demo."""
# Check if Llama Stack server is running
demo = MilvusRAGDemo()
try:
demo.run_demo()
except Exception as e:
print(f"Demo failed: {e}")
if __name__ == "__main__":
main()
```
---------
Signed-off-by: Varsha Prasad Narsing <varshaprasad96@gmail.com>
- fireworks and together do not support the Llama Guard 3 8B model anymore
- Need to default to ollama
- The current safety shields logic was not correct since the shield_id was
the provider (which had duplicates)
- Followed similar logic to models
Note: this seems a bit over-engineered, but it can now be extended to
other providers and fits into the overall mechanism of how env vars are
used to manage starter.
### How to test
```
ENABLE_OLLAMA=ollama ENABLE_FIREWORKS=fireworks SAFETY_MODEL=llama-guard3:1b pytest -s -v tests/integration/ --stack-config starter -k 'not(supervised_fine_tune or builtin_tool_code or safety_with_image or code_interpreter_for or rag_and_code or truncation or register_and_unregister)' --text-model fireworks/meta-llama/Llama-3.3-70B-Instruct --vision-model fireworks/meta-llama/Llama-4-Scout-17B-16E-Instruct --safety-shield llama-guard3:1b --embedding-model all-MiniLM-L6-v2
```
### Related but not obvious in this PR
In the llama-stack-ops repo, we run tests before publishing packages and
docker containers.
The actions in that repo were using the fireworks / together distros
(which are non-existent), so we need to update them to run with `starter`
and use `ollama` specifically for safety.
# What does this PR do?
Adds a template for technical debt. Currently we don't support blank
issues, so everything filed has to be a bug or a feature.
This would allow maintainers as well as community members to track
things we might want to merge to expose the functionality but should be
addressed later. Such things can also be "good first issues" for new
contributors.
## Example of what we constitute as technical debt
Inelegant code solutions, tests we intend to temporarily disable but
would like to restore, CI hacks around infrastructure or installation,
etc.
Signed-off-by: Nathan Weinberg <nweinber@redhat.com>
# What does this PR do?
inference providers each have a static list of supported / known models.
some also have access to a dynamic list of currently available models.
this change gives prodivers using the ModelRegistryHelper the ability to
combine their static and dynamic lists.
for instance, OpenAIInferenceAdapter can implement
```
def query_available_models(self) -> list[str]:
return [entry.model for entry in self.openai_client.models.list()]
```
to augment its static list w/ a current list from openai.
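A hedged sketch of what combining the two lists might look like (the function and parameter names below are hypothetical, not the actual ModelRegistryHelper API):
```python
def combined_model_ids(static_ids: list[str], query_available_models) -> list[str]:
    """Merge a provider's static model list with whatever the backing service reports right now."""
    try:
        dynamic_ids = query_available_models()
    except Exception:
        dynamic_ids = []  # fall back to the static list if the remote call fails
    return sorted(set(static_ids) | set(dynamic_ids))
```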
## Test Plan
scripts/unit-test.sh