llama-stack-mirror

mirror of https://github.com/meta-llama/llama-stack.git synced 2025-10-23 08:33:09 +00:00

Author	SHA1	Message	Date
Ashwin Bharambe	122de785c4	chore(cleanup)!: kill vector_db references as far as possible (#3864 ) There should not be "vector db" anywhere.	2025-10-20 20:06:16 -07:00
Ashwin Bharambe	2c43285e22	feat(stores)!: use backend storage references instead of configs (#3697 ) This PR changes configurations in a backward incompatible way. Run configs today repeat full SQLite/Postgres snippets everywhere a store is needed, which means duplicated credentials, extra connection pools, and lots of drift between files. This PR introduces named storage backends so the stack and providers can share a single catalog and reference those backends by name. ## Key Changes - Add `storage.backends` to `StackRunConfig`, register each KV/SQL backend once at startup, and validate that references point to the right family. - Move server stores under `storage.stores` with lightweight references (backend + namespace/table) instead of full configs. - Update every provider/config/doc to use the new reference style; docs/codegen now surface the simplified YAML. ## Migration Before: ```yaml metadata_store: type: sqlite db_path: ~/.llama/distributions/foo/registry.db inference_store: type: postgres host: ${env.POSTGRES_HOST} port: ${env.POSTGRES_PORT} db: ${env.POSTGRES_DB} user: ${env.POSTGRES_USER} password: ${env.POSTGRES_PASSWORD} conversations_store: type: postgres host: ${env.POSTGRES_HOST} port: ${env.POSTGRES_PORT} db: ${env.POSTGRES_DB} user: ${env.POSTGRES_USER} password: ${env.POSTGRES_PASSWORD} ``` After: ```yaml storage: backends: kv_default: type: kv_sqlite db_path: ~/.llama/distributions/foo/kvstore.db sql_default: type: sql_postgres host: ${env.POSTGRES_HOST} port: ${env.POSTGRES_PORT} db: ${env.POSTGRES_DB} user: ${env.POSTGRES_USER} password: ${env.POSTGRES_PASSWORD} stores: metadata: backend: kv_default namespace: registry inference: backend: sql_default table_name: inference_store max_write_queue_size: 10000 num_writers: 4 conversations: backend: sql_default table_name: openai_conversations ``` Provider configs follow the same pattern—for example, a Chroma vector adapter switches from: ```yaml providers: vector_io: - provider_id: chromadb provider_type: remote::chromadb config: url: ${env.CHROMADB_URL} kvstore: type: sqlite db_path: ~/.llama/distributions/foo/chroma.db ``` to: ```yaml providers: vector_io: - provider_id: chromadb provider_type: remote::chromadb config: url: ${env.CHROMADB_URL} persistence: backend: kv_default namespace: vector_io::chroma_remote ``` Once the backends are declared, everything else just points at them, so rotating credentials or swapping to Postgres happens in one place and the stack reuses a single connection pool.	2025-10-20 13:20:09 -07:00
IAN MILLER	007efa6eb5	refactor: replace default all-MiniLM-L6-v2 embedding model by nomic-embed-text-v1.5 in Llama Stack (#3183 ) # What does this PR do? <!-- Provide a short summary of what this PR does and why. Link to relevant issues if applicable. --> The purpose of this PR is to replace the Llama Stack's default embedding model by nomic-embed-text-v1.5. These are the key reasons why Llama Stack community decided to switch from all-MiniLM-L6-v2 to nomic-embed-text-v1.5: 1. The training data for [all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2#training-data) includes a lot of data sets with various licensing terms, so it is tricky to know when/whether it is appropriate to use this model for commercial applications. 2. The model is not particularly competitive on major benchmarks. For example, if you look at the [MTEB Leaderboard](https://huggingface.co/spaces/mteb/leaderboard) and click on Miscellaneous/BEIR to see English information retrieval accuracy, you see that the top of the leaderboard is dominated by enormous models but also that there are many, many models of relatively modest size whith much higher Retrieval scores. If you want to look closely at the data, I recommend clicking "Download Table" because it is easier to browse that way. More discussion info can be founded [here](https://github.com/llamastack/llama-stack/issues/2418) <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> Closes #2418 ## Test Plan <!-- Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed. --> 1. Run `./scripts/unit-tests.sh` 2. Integration tests via CI wokrflow --------- Signed-off-by: Sébastien Han <seb@redhat.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com> Co-authored-by: Sébastien Han <seb@redhat.com>	2025-10-14 10:44:20 -04:00
Matthew Farrellee	145b2bcf25	feat: make object registration idempotent (#3752 ) # What does this PR do? objects (vector dbs, models, scoring functions, etc) have an identifier and associated object values. we allow exact duplicate registrations. we reject registrations when the identifier exists and the associated object values differ. note: model are namespaced, i.e. {provider_id}/{identifier}, while other object types are not ## Test Plan ci w/ new tests	2025-10-09 17:04:28 -07:00
Sébastien Han	4b9ebbf6a2	chore: revert "fix: Raising an error message to the user when registering an existing provider." (#3750 ) Reverts llamastack/llama-stack#3624 Causing https://github.com/llamastack/llama-stack/issues/3749	2025-10-09 09:17:37 -04:00
Omar Abdelwahab	702fcd1abf	fix: Raising an error message to the user when registering an existing provider. (#3624 ) When the user wants to change the attributes (which could include model name, dimensions,...etc) of an already registered provider, they will get an error message asking that they first unregister the provider before registering a new one. # What does this PR do? This PR updated the register function to raise an error to the user when they attempt to register a provider that was already registered asking them to un-register the existing provider first. <!-- If resolving an issue, uncomment and update the line below --> #2313 ## Test Plan Tested the change with /tests/unit/registry/test_registry.py --------- Co-authored-by: Omar Abdelwahab <omara@fb.com>	2025-10-08 12:09:23 +02:00
Matthew Farrellee	7a25be633c	fix: Revert "fix: Added a bug fix when registering new models" (#3473 ) the commit to be reverted is an public api behavior change to something we should not support. instead of allowing silent updates (the caller cannot see the log messages), we should be sending an error to the caller that they must first unregister the model before reusing the same name w/ a different backend.	2025-09-26 16:19:21 -04:00
Omar Abdelwahab	e0e2b1bd0e	fix: Added a bug fix when registering new models (#3453 ) # What does this PR do? Modified the code in registry.py. The key changes are: 1. Removed the `return False` statement 2. Added a warning log message that includes the object type, identifier, and provider_id for better debugging. 3. The method now continues with the registration process instead of early returning. --------- Co-authored-by: Omar Abdelwahab <omara@fb.com>	2025-09-16 19:09:06 -07:00
Ashwin Bharambe	2665f00102	chore(rename): move llama_stack.distribution to llama_stack.core (#2975 ) We would like to rename the term `template` to `distribution`. To prepare for that, this is a precursor. cc @leseb	2025-07-30 23:30:53 -07:00
Matthew Farrellee	30b2e6a495	chore: default to pytest asyncio-mode=auto (#2730 ) # What does this PR do? previously, developers who ran `./scripts/unit-tests.sh` would get `asyncio-mode=auto`, which meant `@pytest.mark.asyncio` and `@pytest_asyncio.fixture` were redundent. developers who ran `pytest` directly would get pytest's default (strict mode), would run into errors leading them to add `@pytest.mark.asyncio` / `@pytest_asyncio.fixture` to their code. with this change - - `asyncio_mode=auto` is included in `pyproject.toml` making behavior consistent for all invocations of pytest - removes all redundant `@pytest_asyncio.fixture` and `@pytest.mark.asyncio` - for good measure, requires `pytest>=8.4` and `pytest-asyncio>=1.0` ## Test Plan - `./scripts/unit-tests.sh` - `uv run pytest tests/unit`	2025-07-11 13:00:24 -07:00
Derek Higgins	2e807b38cc	chore: Add fixtures to conftest.py (#2067 ) Add fixtures for SqliteKVStore, DiskDistributionRegistry and CachedDiskDistributionRegistry. And use them in tests that had all been duplicating similar setups. ## Test Plan unit tests continue to run Signed-off-by: Derek Higgins <derekh@redhat.com>	2025-05-06 13:57:48 +02:00
Ashwin Bharambe	f95bc29ca9	fix: handle registry errors gracefully (#1732 ) We need to be able to handle stale registry entries gracefully. More needs to be done when we are deleting important attributes from resources which could have been persisted. But at the very least, the server cannot die. ## Test Plan Added unit tests	2025-03-20 15:24:07 -07:00
Ashwin Bharambe	4ca58eb987	refactor: tests/unittests -> tests/unit; tests/api -> tests/integration	2025-03-04 09:57:00 -08:00

13 commits