llama-stack-mirror

mirror of https://github.com/meta-llama/llama-stack.git synced 2025-12-17 10:02:36 +00:00

Author	SHA1	Message	Date
Francisco Javier Arceo	95b2948d11	feat: Add support for query rewrite in vector_store.search (#4171 ) Some checks failed SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s Details SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s Details Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s Details Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped Details Integration Tests (Replay) / generate-matrix (push) Successful in 3s Details API Conformance Tests / check-schema-compatibility (push) Successful in 11s Details Python Package Build Test / build (3.12) (push) Successful in 15s Details Python Package Build Test / build (3.13) (push) Successful in 20s Details Test External API and Providers / test-external (venv) (push) Failing after 41s Details Vector IO Integration Tests / test-matrix (push) Failing after 49s Details UI Tests / ui-tests (22) (push) Successful in 51s Details Unit Tests / unit-tests (3.13) (push) Failing after 1m27s Details Unit Tests / unit-tests (3.12) (push) Failing after 1m45s Details Pre-commit / pre-commit (22) (push) Failing after 2m30s Details Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 4m22s Details # What does this PR do? Actualize query rewrite in search API, add `default_query_expansion_model` and `query_expansion_prompt` in `VectorStoresConfig`. Makes `rewrite_query` parameter functional in vector store search. - `rewrite_query=false` (default): Use original query - `rewrite_query=true`: Expand query via LLM, or fail gracefully if no LLM available Adds 4 parameters to`VectorStoresConfig`: - `default_query_expansion_model`: LLM model for query expansion (optional) - `query_expansion_prompt`: Custom prompt template (optional, uses built-in default) - `query_expansion_max_tokens`: Configurable token limit (default: 100) - `query_expansion_temperature`: Configurable temperature (default: 0.3) Enabled `run.yaml`: ```yaml vector_stores: rewrite_query_params: model: provider_id: "ollama" model_id: "llama3.2:3b-instruct-fp16" # prompt defaults to built-in # max_tokens defaults to 100 # temperature defaults to 0.3 ``` Fully customized `run.yaml`: ```yaml vector_stores: default_provider_id: faiss default_embedding_model: provider_id: sentence-transformers model_id: nomic-ai/nomic-embed-text-v1.5 rewrite_query_params: model: provider_id: ollama model_id: llama3.2:3b-instruct-fp16 prompt: "Rewrite this search query to improve retrieval results by expanding it with relevant synonyms and related terms: {query}" max_tokens: 100 temperature: 0.3 ``` ## Test Plan Added test and recording Example script as well: ```python import asyncio from llama_stack_client import LlamaStackClient from io import BytesIO def gen_file(client, text: str=""): file_buffer = BytesIO(text.encode('utf-8')) file_buffer.name = "my_file.txt" uploaded_file = client.files.create( file=file_buffer, purpose="assistants" ) return uploaded_file async def test_query_rewriting(): client = LlamaStackClient(base_url="http://0.0.0.0:8321/") uploaded_file = gen_file(client, "banana banana apple") uploaded_file2 = gen_file(client, "orange orange kiwi") vs = client.vector_stores.create() xf_vs = client.vector_stores.files.create(vector_store_id=vs.id, file_id=uploaded_file.id) xf_vs1 = client.vector_stores.files.create(vector_store_id=vs.id, file_id=uploaded_file2.id) response1 = client.vector_stores.search( vector_store_id=vs.id, query="apple", max_num_results=3, rewrite_query=False ) response2 = client.vector_stores.search( vector_store_id=vs.id, query="kiwi", max_num_results=3, rewrite_query=True, ) print(f"\n🔵 Response 1 (rewrite_query=False):\n\033[94m{response1}\033[0m") print(f"\n🟢 Response 2 (rewrite_query=True):\n\033[92m{response2}\033[0m") for f in [uploaded_file.id, uploaded_file2.id]: client.files.delete(file_id=f) client.vector_stores.delete(vector_store_id=vs.id) if __name__ == "__main__": asyncio.run(test_query_rewriting()) ``` And see the screen shot of the server logs showing it worked. <img width="1111" height="826" alt="Screenshot 2025-11-19 at 1 16 03 PM" src="https://github.com/user-attachments/assets/2d188b44-1fef-4df5-b465-2d6728ca49ce" /> Notice the log: ```bash Query rewritten: 'kiwi' → 'kiwi, a small brown or green fruit native to New Zealand, or a person having a fuzzy brown outer skin similar in appearance.' ``` So `kiwi` was expanded. --------- Signed-off-by: Francisco Javier Arceo <farceo@redhat.com> Co-authored-by: Matthew Farrellee <matt@cs.wisc.edu>	2025-12-10 10:06:19 -05:00
Charlie Doern	a078f089d9	fix: rename llama_stack_api dir (#4155 ) Some checks failed Integration Tests (Replay) / generate-matrix (push) Successful in 3s Details SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s Details Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s Details SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s Details Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped Details Test Llama Stack Build / generate-matrix (push) Successful in 5s Details Python Package Build Test / build (3.12) (push) Failing after 4s Details API Conformance Tests / check-schema-compatibility (push) Successful in 12s Details Test llama stack list-deps / generate-matrix (push) Successful in 29s Details Test Llama Stack Build / build-single-provider (push) Successful in 33s Details Test llama stack list-deps / list-deps-from-config (push) Successful in 32s Details UI Tests / ui-tests (22) (push) Successful in 39s Details Test Llama Stack Build / build (push) Successful in 39s Details Test llama stack list-deps / show-single-provider (push) Successful in 46s Details Python Package Build Test / build (3.13) (push) Failing after 44s Details Test External API and Providers / test-external (venv) (push) Failing after 44s Details Vector IO Integration Tests / test-matrix (push) Failing after 56s Details Test llama stack list-deps / list-deps (push) Failing after 47s Details Unit Tests / unit-tests (3.12) (push) Failing after 1m42s Details Unit Tests / unit-tests (3.13) (push) Failing after 1m55s Details Test Llama Stack Build / build-ubi9-container-distribution (push) Successful in 2m0s Details Test Llama Stack Build / build-custom-container-distribution (push) Successful in 2m2s Details Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 2m42s Details Pre-commit / pre-commit (push) Successful in 5m17s Details # What does this PR do? the directory structure was src/llama-stack-api/llama_stack_api instead it should just be src/llama_stack_api to match the other packages. update the structure and pyproject/linting config --------- Signed-off-by: Charlie Doern <cdoern@redhat.com> Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>	2025-11-13 15:04:36 -08:00
Francisco Arceo	a82b79ce57	fix: Error out when creating vector store with unknown embedding model (#4154 ) # What does this PR do? Error out when creating vector store with unknown embedding model Closes https://github.com/llamastack/llama-stack/issues/4047 ## Test Plan Added tests Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>	2025-11-13 13:43:31 -08:00
Charlie Doern	840ad75fe9	feat: split API and provider specs into separate llama-stack-api pkg (#3895 ) # What does this PR do? Extract API definitions and provider specifications into a standalone llama-stack-api package that can be published to PyPI independently of the main llama-stack server. see: https://github.com/llamastack/llama-stack/pull/2978 and https://github.com/llamastack/llama-stack/pull/2978#issuecomment-3145115942 Motivation External providers currently import from llama-stack, which overrides the installed version and causes dependency conflicts. This separation allows external providers to: - Install only the type definitions they need without server dependencies - Avoid version conflicts with the installed llama-stack package - Be versioned and released independently This enables us to re-enable external provider module tests that were previously blocked by these import conflicts. Changes - Created llama-stack-api package with minimal dependencies (pydantic, jsonschema) - Moved APIs, providers datatypes, strong_typing, and schema_utils - Updated all imports from llama_stack.* to llama_stack_api.* - Configured local editable install for development workflow - Updated linting and type-checking configuration for both packages Next Steps - Publish llama-stack-api to PyPI - Update external provider dependencies - Re-enable external provider module tests Pre-cursor PRs to this one: - #4093 - #3954 - #4064 These PRs moved key pieces _out_ of the Api pkg, limiting the scope of change here. relates to #3237 ## Test Plan Package builds successfully and can be imported independently. All pre-commit hooks pass with expected exclusions maintained. --------- Signed-off-by: Charlie Doern <cdoern@redhat.com>	2025-11-13 11:51:17 -08:00
Francisco Arceo	eb3f9ac278	feat: allow returning embeddings and metadata from `/vector_stores/` methods; disallow changing Provider ID (#4046 ) # What does this PR do? - Updates `/vector_stores/{vector_store_id}/files/{file_id}/content` to allow returning `embeddings` and `metadata` using the `extra_query` - Updates the UI accordingly to display them. - Update UI to support CRUD operations in the Vector Stores section and adds a new modal exposing the functionality. - Updates Vector Store update to fail if a user tries to update Provider ID (which doesn't make sense to allow) ```python In [1]: client.vector_stores.files.content( vector_store_id=vector_store.id, file_id=file.id, extra_query={"include_embeddings": True, "include_metadata": True} ) Out [1]: FileContentResponse(attributes={}, content=[Content(text='This is a test document to check if embeddings are generated properly.\n', type='text', embedding=[0.33760684728622437, ...,], chunk_metadata={'chunk_id': '62a63ae0-c202-f060-1b86-0a688995b8d3', 'document_id': 'file-27291dbc679642ac94ffac6d2810c339', 'source': None, 'created_timestamp': 1762053437, 'updated_timestamp': 1762053437, 'chunk_window': '0-13', 'chunk_tokenizer': 'DEFAULT_TIKTOKEN_TOKENIZER', 'chunk_embedding_model': 'sentence-transformers/nomic -ai/nomic-embed-text-v1.5', 'chunk_embedding_dimension': 768, 'content_token_count': 13, 'metadata_token_count': 9}, metadata={'filename': 'test-embedding.txt', 'chunk_id': '62a63ae0-c202-f060-1b86-0a688995b8d3', 'document_id': 'file-27291dbc679642ac94ffac6d2810c339', 'token_count': 13, 'metadata_token_count': 9})], file_id='file-27291dbc679642ac94ffac6d2810c339', filename='test-embedding.txt') ``` Screenshots of UI are displayed below: ### List Vector Store with Added "Create New Vector Store" <img width="1912" height="491" alt="Screenshot 2025-11-06 at 10 47 25 PM" src="https://github.com/user-attachments/assets/a3a3ddd9-758d-4005-ac9c-5047f03916f3" /> ### Create New Vector Store <img width="1918" height="1048" alt="Screenshot 2025-11-06 at 10 47 49 PM" src="https://github.com/user-attachments/assets/b4dc0d31-696f-4e68-b109-27915090f158" /> ### Edit Vector Store <img width="1916" height="1355" alt="Screenshot 2025-11-06 at 10 48 32 PM" src="https://github.com/user-attachments/assets/ec879c63-4cf7-489f-bb1e-57ccc7931414" /> ### Vector Store Files Contents page (with Embeddings) <img width="1914" height="849" alt="Screenshot 2025-11-06 at 11 54 32 PM" src="https://github.com/user-attachments/assets/3095520d-0e90-41f7-83bd-652f6c3fbf27" /> ### Vector Store Files Contents Details page (with Embeddings) <img width="1916" height="1221" alt="Screenshot 2025-11-06 at 11 55 00 PM" src="https://github.com/user-attachments/assets/e71dbdc5-5b49-472b-a43a-5785f58d196c" /> <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> ## Test Plan Tests added for Middleware extension and Provider failures. --------- Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>	2025-11-12 09:59:48 -08:00
ehhuang	9916cb3b17	chore: support default model in moderations API (#3890 ) Some checks failed Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 2s Details Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped Details Vector IO Integration Tests / test-matrix (push) Failing after 5s Details SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s Details Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 5s Details Python Package Build Test / build (3.12) (push) Failing after 1s Details Python Package Build Test / build (3.13) (push) Failing after 2s Details Test Llama Stack Build / build-single-provider (push) Failing after 3s Details Test Llama Stack Build / generate-matrix (push) Successful in 5s Details Test Llama Stack Build / build-custom-container-distribution (push) Failing after 4s Details Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 3s Details SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 7s Details Test External API and Providers / test-external (venv) (push) Failing after 4s Details API Conformance Tests / check-schema-compatibility (push) Successful in 12s Details Unit Tests / unit-tests (3.13) (push) Failing after 4s Details Test Llama Stack Build / build (push) Failing after 3s Details Unit Tests / unit-tests (3.12) (push) Failing after 5s Details UI Tests / ui-tests (22) (push) Successful in 41s Details Pre-commit / pre-commit (push) Successful in 1m33s Details # What does this PR do? https://platform.openai.com/docs/api-reference/moderations supports optional model parameter. This PR adds support for using moderations API with model=None if a default shield id is provided via safety config. ## Test Plan added tests manual test: ``` > SAFETY_MODEL='together/meta-llama/Llama-Guard-4-12B' uv run llama stack run starter > curl http://localhost:8321/v1/moderations \ -H "Content-Type: application/json" \ -d '{ "input": [ "hello" ] }' ```	2025-10-23 16:03:53 -07:00
Ashwin Bharambe	122de785c4	chore(cleanup)!: kill vector_db references as far as possible (#3864 ) There should not be "vector db" anywhere.	2025-10-20 20:06:16 -07:00
Francisco Arceo	968c364a3e	chore: Auto-detect Provider ID when only 1 Vector Store Provider avai… (#3802 ) Some checks failed SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s Details SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s Details Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s Details Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped Details Python Package Build Test / build (3.13) (push) Failing after 1s Details Python Package Build Test / build (3.12) (push) Failing after 1s Details Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 4s Details Test External API and Providers / test-external (venv) (push) Failing after 4s Details Unit Tests / unit-tests (3.12) (push) Failing after 4s Details Unit Tests / unit-tests (3.13) (push) Failing after 4s Details Vector IO Integration Tests / test-matrix (push) Failing after 8s Details API Conformance Tests / check-schema-compatibility (push) Successful in 18s Details UI Tests / ui-tests (22) (push) Successful in 29s Details Pre-commit / pre-commit (push) Successful in 1m24s Details # What does this PR do? 2 main changes: 1. Remove `provider_id` requirement in call to vector stores and 2. Removes "register first embedding model" logic - Now forces embedding model id as required on Vector Store creation Simplifies the UX for OpenAI to: ```python vs = client.vector_stores.create( name="my_citations_db", extra_body={ "embedding_model": "ollama/nomic-embed-text:latest", } ) ``` <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> ## Test Plan <!-- Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed. --> --------- Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>	2025-10-13 10:25:36 -07:00

8 commits