llama-stack-mirror

mirror of https://github.com/meta-llama/llama-stack.git synced 2025-12-05 02:17:31 +00:00

Author	SHA1	Message	Date
Ashwin Bharambe	cb40da210f	fix: update tests for OpenAI-style models endpoint (#4053 ) The llama-stack-client now uses /`v1/openai/v1/models` which returns OpenAI-compatible model objects with 'id' and 'custom_metadata' fields instead of the Resource-style 'identifier' field. Updated api_recorder to handle the new endpoint and modified tests to access model metadata appropriately. Deleted stale model recordings for re-recording. NOTE: CI will be red on this one since it is dependent on https://github.com/llamastack/llama-stack-client-python/pull/291/files landing. I verified locally that it is green.	2025-11-03 17:30:08 -08:00
Francisco Arceo	4566eebe05	feat: Add static file import system for docs (#3882 ) # What does this PR do? Add static file import system for docs - Use `remark-code-import` plugin to embed code at build time - Support importing Python code with syntax highlighting using `raw-loader` + `ReactMarkdown` One caveat is that currently when embedding markdown with code used the syntax highlighting isn't behaving but I'll investigate that in a follow up. ## Test Plan Python Example: <img width="1372" height="995" alt="Screenshot 2025-10-23 at 9 22 18 PM" src="https://github.com/user-attachments/assets/656d2c78-4d9b-45a4-bd5e-3f8490352b85" /> Markdown example: <img width="1496" height="1070" alt="Screenshot 2025-10-23 at 9 22 38 PM" src="https://github.com/user-attachments/assets/6c0a07ec-ff7c-45aa-b05f-8c46acd4445c" /> --------- Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>	2025-10-24 14:01:33 -04:00
Francisco Arceo	53c20f6113	feat: Adding Demo script (#3870 ) Some checks failed Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s Details SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 2s Details Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped Details Vector IO Integration Tests / test-matrix (push) Failing after 5s Details Test External API and Providers / test-external (venv) (push) Failing after 5s Details Unit Tests / unit-tests (3.12) (push) Failing after 4s Details Unit Tests / unit-tests (3.13) (push) Failing after 6s Details Python Package Build Test / build (3.13) (push) Failing after 10s Details SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 16s Details Python Package Build Test / build (3.12) (push) Failing after 15s Details Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 15s Details API Conformance Tests / check-schema-compatibility (push) Successful in 24s Details UI Tests / ui-tests (22) (push) Successful in 50s Details Pre-commit / pre-commit (push) Successful in 1m26s Details # What does this PR do? Updated quickstart `demo_script.py` to use OpenAI APIs, which is simply: ```python import io, requests from openai import OpenAI url="https://www.paulgraham.com/greatwork.html" client = OpenAI(base_url="http://localhost:8321/v1/", api_key="none") vs = client.vector_stores.create() response = requests.get(url) pseudo_file = io.BytesIO(str(response.content).encode('utf-8')) uploaded_file = client.files.create(file=(url, pseudo_file, "text/html"), purpose="assistants") client.vector_stores.files.create(vector_store_id=vs.id, file_id=uploaded_file.id) resp = client.responses.create( model="openai/gpt-4o", input="How do you do great work? Use the existing knowledge_search tool.", tools=[{"type": "file_search", "vector_store_ids": [vs.id]}], include=["file_search_call.results"], ) print(resp) ``` <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> ## Test Plan <!-- Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed. --> --------- Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>	2025-10-21 21:31:21 -04:00
Charlie Doern	573e783ff0	docs: fix sidebar of `Detailed Tutorial` (#3856 ) # What does this PR do? the sidebar currently has an extra `ii. Run the Script` because its incorrectly put into the doc as an H3 not an H4 (like the other ones) <img width="239" height="218" alt="Screenshot 2025-10-20 at 1 04 54 PM" src="https://github.com/user-attachments/assets/eb8cb26e-7ea9-4b61-9101-d64965b39647" /> Fix this which will update the sidebar Signed-off-by: Charlie Doern <cdoern@redhat.com>	2025-10-20 13:10:50 -04:00
Charlie Doern	b11bcfde11	refactor(build): rework CLI commands and build process (1/2) (#2974 ) Some checks failed SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s Details SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s Details Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s Details Test Llama Stack Build / generate-matrix (push) Successful in 22s Details Test llama stack list-deps / show-single-provider (push) Failing after 53s Details Test Llama Stack Build / build-single-provider (push) Failing after 3s Details Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped Details Python Package Build Test / build (3.12) (push) Failing after 18s Details Python Package Build Test / build (3.13) (push) Failing after 24s Details Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 26s Details Test Llama Stack Build / build-custom-container-distribution (push) Failing after 27s Details Unit Tests / unit-tests (3.12) (push) Failing after 26s Details Vector IO Integration Tests / test-matrix (push) Failing after 44s Details API Conformance Tests / check-schema-compatibility (push) Successful in 52s Details Test llama stack list-deps / generate-matrix (push) Successful in 52s Details Test Llama Stack Build / build (push) Failing after 29s Details Test External API and Providers / test-external (venv) (push) Failing after 53s Details Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1m2s Details Unit Tests / unit-tests (3.13) (push) Failing after 1m30s Details Test llama stack list-deps / list-deps-from-config (push) Failing after 1m59s Details Test llama stack list-deps / list-deps (push) Failing after 1m10s Details UI Tests / ui-tests (22) (push) Successful in 2m26s Details Pre-commit / pre-commit (push) Successful in 3m8s Details # What does this PR do? This PR does a few things outlined in #2878 namely: 1. adds `llama stack list-deps` a command which simply takes the build logic and instead of executing one of the `build_...` scripts, it displays all of the providers' dependencies using the `module` and `uv`. 2. deprecated `llama stack build` in favor of `llama stack list-deps` 3. updates all tests to use `list-deps` alongside `build`. PR 2/2 will migrate `llama stack run`'s default behavior to be `llama stack build --run` and use the new `list-deps` command under the hood before running the server. examples of `llama stack list-deps starter` ``` llama stack list-deps starter --format json { "name": "starter", "description": "Quick start template for running Llama Stack with several popular providers. This distribution is intended for CPU-only environments.", "apis": [ { "api": "inference", "provider": "remote::cerebras" }, { "api": "inference", "provider": "remote::ollama" }, { "api": "inference", "provider": "remote::vllm" }, { "api": "inference", "provider": "remote::tgi" }, { "api": "inference", "provider": "remote::fireworks" }, { "api": "inference", "provider": "remote::together" }, { "api": "inference", "provider": "remote::bedrock" }, { "api": "inference", "provider": "remote::nvidia" }, { "api": "inference", "provider": "remote::openai" }, { "api": "inference", "provider": "remote::anthropic" }, { "api": "inference", "provider": "remote::gemini" }, { "api": "inference", "provider": "remote::vertexai" }, { "api": "inference", "provider": "remote::groq" }, { "api": "inference", "provider": "remote::sambanova" }, { "api": "inference", "provider": "remote::azure" }, { "api": "inference", "provider": "inline::sentence-transformers" }, { "api": "vector_io", "provider": "inline::faiss" }, { "api": "vector_io", "provider": "inline::sqlite-vec" }, { "api": "vector_io", "provider": "inline::milvus" }, { "api": "vector_io", "provider": "remote::chromadb" }, { "api": "vector_io", "provider": "remote::pgvector" }, { "api": "files", "provider": "inline::localfs" }, { "api": "safety", "provider": "inline::llama-guard" }, { "api": "safety", "provider": "inline::code-scanner" }, { "api": "agents", "provider": "inline::meta-reference" }, { "api": "telemetry", "provider": "inline::meta-reference" }, { "api": "post_training", "provider": "inline::torchtune-cpu" }, { "api": "eval", "provider": "inline::meta-reference" }, { "api": "datasetio", "provider": "remote::huggingface" }, { "api": "datasetio", "provider": "inline::localfs" }, { "api": "scoring", "provider": "inline::basic" }, { "api": "scoring", "provider": "inline::llm-as-judge" }, { "api": "scoring", "provider": "inline::braintrust" }, { "api": "tool_runtime", "provider": "remote::brave-search" }, { "api": "tool_runtime", "provider": "remote::tavily-search" }, { "api": "tool_runtime", "provider": "inline::rag-runtime" }, { "api": "tool_runtime", "provider": "remote::model-context-protocol" }, { "api": "batches", "provider": "inline::reference" } ], "pip_dependencies": [ "pandas", "opentelemetry-exporter-otlp-proto-http", "matplotlib", "opentelemetry-sdk", "sentence-transformers", "datasets", "pymilvus[milvus-lite]>=2.4.10", "codeshield", "scipy", "torchvision", "tree_sitter", "h11>=0.16.0", "aiohttp", "pymongo", "tqdm", "pythainlp", "pillow", "torch", "emoji", "grpcio>=1.67.1,<1.71.0", "fireworks-ai", "langdetect", "psycopg2-binary", "asyncpg", "redis", "together", "torchao>=0.12.0", "openai", "sentencepiece", "aiosqlite", "google-cloud-aiplatform", "faiss-cpu", "numpy", "sqlite-vec", "nltk", "scikit-learn", "mcp>=1.8.1", "transformers", "boto3", "huggingface_hub", "ollama", "autoevals", "sqlalchemy[asyncio]", "torchtune>=0.5.0", "chromadb-client", "pypdf", "requests", "anthropic", "chardet", "aiosqlite", "fastapi", "fire", "httpx", "uvicorn", "opentelemetry-sdk", "opentelemetry-exporter-otlp-proto-http" ] } ``` <img width="1500" height="420" alt="Screenshot 2025-10-16 at 5 53 03 PM" src="https://github.com/user-attachments/assets/765929fb-93e2-44d7-9c3d-8918b70fc721" /> --------- Signed-off-by: Charlie Doern <cdoern@redhat.com>	2025-10-17 19:52:14 -07:00
IAN MILLER	007efa6eb5	refactor: replace default all-MiniLM-L6-v2 embedding model by nomic-embed-text-v1.5 in Llama Stack (#3183 ) # What does this PR do? <!-- Provide a short summary of what this PR does and why. Link to relevant issues if applicable. --> The purpose of this PR is to replace the Llama Stack's default embedding model by nomic-embed-text-v1.5. These are the key reasons why Llama Stack community decided to switch from all-MiniLM-L6-v2 to nomic-embed-text-v1.5: 1. The training data for [all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2#training-data) includes a lot of data sets with various licensing terms, so it is tricky to know when/whether it is appropriate to use this model for commercial applications. 2. The model is not particularly competitive on major benchmarks. For example, if you look at the [MTEB Leaderboard](https://huggingface.co/spaces/mteb/leaderboard) and click on Miscellaneous/BEIR to see English information retrieval accuracy, you see that the top of the leaderboard is dominated by enormous models but also that there are many, many models of relatively modest size whith much higher Retrieval scores. If you want to look closely at the data, I recommend clicking "Download Table" because it is easier to browse that way. More discussion info can be founded [here](https://github.com/llamastack/llama-stack/issues/2418) <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> Closes #2418 ## Test Plan <!-- Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed. --> 1. Run `./scripts/unit-tests.sh` 2. Integration tests via CI wokrflow --------- Signed-off-by: Sébastien Han <seb@redhat.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com> Co-authored-by: Sébastien Han <seb@redhat.com>	2025-10-14 10:44:20 -04:00
ehhuang	a3f5072776	chore!: remove --env from `llama stack run` (#3711 ) Some checks failed Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s Details SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 1s Details Installer CI / lint (push) Failing after 2s Details SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s Details Installer CI / smoke-test-on-dev (push) Failing after 2s Details Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s Details Test Llama Stack Build / generate-matrix (push) Successful in 3s Details Vector IO Integration Tests / test-matrix (push) Failing after 4s Details Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped Details Test Llama Stack Build / build-custom-container-distribution (push) Failing after 2s Details Test Llama Stack Build / build-single-provider (push) Failing after 4s Details Python Package Build Test / build (3.12) (push) Failing after 2s Details Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 3s Details Python Package Build Test / build (3.13) (push) Failing after 1s Details API Conformance Tests / check-schema-compatibility (push) Successful in 10s Details Unit Tests / unit-tests (3.12) (push) Failing after 3s Details Test Llama Stack Build / build (push) Failing after 3s Details Test External API and Providers / test-external (venv) (push) Failing after 3s Details Unit Tests / unit-tests (3.13) (push) Failing after 3s Details UI Tests / ui-tests (22) (push) Successful in 40s Details Pre-commit / pre-commit (push) Successful in 1m18s Details # What does this PR do? user can simply set env vars in the beginning of the command.`FOO=BAR llama stack run ...` ## Test Plan Run TELEMETRY_SINKS=coneol uv run --with llama-stack llama stack build --distro=starter --image-type=venv --run --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/llamastack/llama-stack/pull/3711). * #3714 * __->__ #3711	2025-10-07 20:58:15 -07:00
Alexey Rybak	6101c8e015	docs: fix broken links (#3540 ) # What does this PR do? <!-- Provide a short summary of what this PR does and why. Link to relevant issues if applicable. --> <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> - Fixes broken links and Docusaurus search Closes #3518 ## Test Plan The following should produce a clean build with no warnings and search enabled: ``` npm install npm run gen-api-docs all npm run build npm run serve ``` <!-- Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed. -->	2025-09-24 14:16:31 -07:00
Alexey Rybak	8537ada11b	docs: MDX leftover fixes (#3536 ) # What does this PR do? - Fixes Docusaurus build errors <!-- Provide a short summary of what this PR does and why. Link to relevant issues if applicable. --> <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> ## Test Plan - `npm run build` compiles the build properly - Broken links expected and will be fixed in a follow-on PR <!-- Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed. -->	2025-09-24 14:14:32 -07:00
Alexey Rybak	c71ce8df61	docs: concepts and building_applications migration (#3534 ) # What does this PR do? - Migrates the remaining documentation sections to the new documentation format <!-- Provide a short summary of what this PR does and why. Link to relevant issues if applicable. --> <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> ## Test Plan - Partial migration <!-- Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed. -->	2025-09-24 14:05:30 -07:00

10 commits