mirror of
				https://github.com/meta-llama/llama-stack.git
				synced 2025-10-25 01:01:13 +00:00 
			
		
		
		
	
	
		
			265 commits
		
	
	
	| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|  | 3130ca0a78 | feat: implement keyword, vector and hybrid search inside vector stores for PGVector provider (#3064) # What does this PR do?
<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->
The purpose of this task is to implement
`openai/v1/vector_stores/{vector_store_id}/search` for PGVector
provider. It involves implementing vector similarity search, keyword
search and hybrid search for `PGVectorIndex`.
<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->
Closes #3006 
## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->
Run unit tests:
` ./scripts/unit-tests.sh `
Run integration tests for openai vector stores:
1. Export env vars:
```
export ENABLE_PGVECTOR=true
export PGVECTOR_HOST=localhost
export PGVECTOR_PORT=5432
export PGVECTOR_DB=llamastack
export PGVECTOR_USER=llamastack
export PGVECTOR_PASSWORD=llamastack
```
2. Create DB:
```
psql -h localhost -U postgres -c "CREATE ROLE llamastack LOGIN PASSWORD 'llamastack';"
psql -h localhost -U postgres -c "CREATE DATABASE llamastack OWNER llamastack;"
psql -h localhost -U llamastack -d llamastack -c "CREATE EXTENSION IF NOT EXISTS vector;"
```
3. Install sentence-transformers:
` uv pip install sentence-transformers  `
4. Run:
```
uv run --group test pytest -s -v --stack-config="inference=inline::sentence-transformers,vector_io=remote::pgvector" --embedding-model sentence-transformers/all-MiniLM-L6-v2 tests/integration/vector_io/test_openai_vector_stores.py
```
Inspect PGVector vector stores (optional):
```
psql llamastack                                                                                                         
psql (14.18 (Homebrew))
Type "help" for help.
llamastack=# \z
                                                    Access privileges
 Schema |                         Name                         | Type  | Access privileges | Column privileges | Policies 
--------+------------------------------------------------------+-------+-------------------+-------------------+----------
 public | llamastack_kvstore                                   | table |                   |                   | 
 public | metadata_store                                       | table |                   |                   | 
 public | vector_store_pgvector_main                           | table |                   |                   | 
 public | vector_store_vs_1dfbc061_1f4d_4497_9165_ecba2622ba3a | table |                   |                   | 
 public | vector_store_vs_2085a9fb_1822_4e42_a277_c6a685843fa7 | table |                   |                   | 
 public | vector_store_vs_2b3dae46_38be_462a_afd6_37ee5fe661b1 | table |                   |                   | 
 public | vector_store_vs_2f438de6_f606_4561_9d50_ef9160eb9060 | table |                   |                   | 
 public | vector_store_vs_3eeca564_2580_4c68_bfea_83dc57e31214 | table |                   |                   | 
 public | vector_store_vs_53942163_05f3_40e0_83c0_0997c64613da | table |                   |                   | 
 public | vector_store_vs_545bac75_8950_4ff1_b084_e221192d4709 | table |                   |                   | 
 public | vector_store_vs_688a37d8_35b2_4298_a035_bfedf5b21f86 | table |                   |                   | 
 public | vector_store_vs_70624d9a_f6ac_4c42_b8ab_0649473c6600 | table |                   |                   | 
 public | vector_store_vs_73fc1dd2_e942_4972_afb1_1e177b591ac2 | table |                   |                   | 
 public | vector_store_vs_9d464949_d51f_49db_9f87_e033b8b84ac9 | table |                   |                   | 
 public | vector_store_vs_a1e4d724_5162_4d6d_a6c0_bdafaf6b76ec | table |                   |                   | 
 public | vector_store_vs_a328fb1b_1a21_480f_9624_ffaa60fb6672 | table |                   |                   | 
 public | vector_store_vs_a8981bf0_2e66_4445_a267_a8fff442db53 | table |                   |                   | 
 public | vector_store_vs_ccd4b6a4_1efd_4984_ad03_e7ff8eadb296 | table |                   |                   | 
 public | vector_store_vs_cd6420a4_a1fc_4cec_948c_1413a26281c9 | table |                   |                   | 
 public | vector_store_vs_cd709284_e5cf_4a88_aba5_dc76a35364bd | table |                   |                   | 
 public | vector_store_vs_d7a4548e_fbc1_44d7_b2ec_b664417f2a46 | table |                   |                   | 
 public | vector_store_vs_e7f73231_414c_4523_886c_d1174eee836e | table |                   |                   | 
 public | vector_store_vs_ffd53588_819f_47e8_bb9d_954af6f7833d | table |                   |                   | 
(23 rows)
llamastack=# 
```
Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com> | ||
|  | ed418653ec | chore(dev): add inequality support to sqlstore where clause (#3272) 
		
			Some checks failed
		
		
	 Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 1s SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 2s SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 1s Vector IO Integration Tests / test-matrix (push) Failing after 1s Pre-commit / pre-commit (push) Failing after 1s Python Package Build Test / build (3.12) (push) Failing after 0s Python Package Build Test / build (3.13) (push) Failing after 1s Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped Test External API and Providers / test-external (venv) (push) Failing after 1s UI Tests / ui-tests (22) (push) Failing after 0s Unit Tests / unit-tests (3.12) (push) Failing after 1s Unit Tests / unit-tests (3.13) (push) Failing after 1s # What does this PR do? add the ability to use inequalities in the where clause of the sqlstore. this is infrastructure for files expiration. ## Test Plan unit tests | ||
|  | 3b9278f254 | feat: implement query_metrics (#3074) # What does this PR do? query_metrics currently has no implementation, meaning once a metric is emitted there is no way in llama stack to query it from the store. implement query_metrics for the meta_reference provider which follows a similar style to `query_traces`, using the trace_store to format an SQL query and execute it in this case the parameters for the query are `metric.METRIC_NAME, start_time, and end_time` and any other matchers if they are provided. this required client side changes since the client had no `query_metrics` or any associated resources, so any tests here will fail but I will provide manual execution logs for the new tests I am adding order the metrics by timestamp. Additionally add `unit` to the `MetricDataPoint` class since this adds much more context to the metric being queried. depends on https://github.com/llamastack/llama-stack-client-python/pull/260 ## Test Plan ``` import time import uuid def create_http_client(): from llama_stack_client import LlamaStackClient return LlamaStackClient(base_url="http://localhost:8321") client = create_http_client() response = client.telemetry.query_metrics(metric_name="total_tokens", start_time=0) print(response) ``` ``` ╰─ python3.12 ~/telemetry.py INFO:httpx:HTTP Request: POST http://localhost:8322/v1/telemetry/metrics/total_tokens "HTTP/1.1 200 OK" [TelemetryQueryMetricsResponse(data=None, metric='total_tokens', labels=[], values=[{'timestamp': 1753999514, 'value': 34.0, 'unit': 'tokens'}, {'timestamp': 1753999816, 'value': 34.0, 'unit': 'tokens'}, {'timestamp': 1753999881, 'value': 34.0, 'unit': 'tokens'}, {'timestamp': 1753999956, 'value': 34.0, 'unit': 'tokens'}, {'timestamp': 1754000200, 'value': 34.0, 'unit': 'tokens'}, {'timestamp': 1754000419, 'value': 36.0, 'unit': 'tokens'}, {'timestamp': 1754000714, 'value': 36.0, 'unit': 'tokens'}, {'timestamp': 1754000876, 'value': 36.0, 'unit': 'tokens'}, {'timestamp': 1754000908, 'value': 34.0, 'unit': 'tokens'}, {'timestamp': 1754001309, 'value': 584.0, 'unit': 'tokens'}, {'timestamp': 1754001311, 'value': 138.0, 'unit': 'tokens'}, {'timestamp': 1754001316, 'value': 349.0, 'unit': 'tokens'}, {'timestamp': 1754001318, 'value': 133.0, 'unit': 'tokens'}, {'timestamp': 1754001320, 'value': 133.0, 'unit': 'tokens'}, {'timestamp': 1754001341, 'value': 923.0, 'unit': 'tokens'}, {'timestamp': 1754001350, 'value': 354.0, 'unit': 'tokens'}, {'timestamp': 1754001462, 'value': 417.0, 'unit': 'tokens'}, {'timestamp': 1754001464, 'value': 158.0, 'unit': 'tokens'}, {'timestamp': 1754001475, 'value': 697.0, 'unit': 'tokens'}, {'timestamp': 1754001477, 'value': 133.0, 'unit': 'tokens'}, {'timestamp': 1754001479, 'value': 133.0, 'unit': 'tokens'}, {'timestamp': 1754001489, 'value': 298.0, 'unit': 'tokens'}, {'timestamp': 1754001541, 'value': 615.0, 'unit': 'tokens'}, {'timestamp': 1754001543, 'value': 119.0, 'unit': 'tokens'}, {'timestamp': 1754001548, 'value': 310.0, 'unit': 'tokens'}, {'timestamp': 1754001549, 'value': 133.0, 'unit': 'tokens'}, {'timestamp': 1754001551, 'value': 133.0, 'unit': 'tokens'}, {'timestamp': 1754001568, 'value': 714.0, 'unit': 'tokens'}, {'timestamp': 1754001800, 'value': 437.0, 'unit': 'tokens'}, {'timestamp': 1754001802, 'value': 200.0, 'unit': 'tokens'}, {'timestamp': 1754001806, 'value': 262.0, 'unit': 'tokens'}, {'timestamp': 1754001808, 'value': 133.0, 'unit': 'tokens'}, {'timestamp': 1754001810, 'value': 133.0, 'unit': 'tokens'}, {'timestamp': 1754001816, 'value': 82.0, 'unit': 'tokens'}, {'timestamp': 1754001923, 'value': 61.0, 'unit': 'tokens'}, {'timestamp': 1754001929, 'value': 391.0, 'unit': 'tokens'}, {'timestamp': 1754001939, 'value': 598.0, 'unit': 'tokens'}, {'timestamp': 1754001941, 'value': 133.0, 'unit': 'tokens'}, {'timestamp': 1754001942, 'value': 133.0, 'unit': 'tokens'}, {'timestamp': 1754001952, 'value': 252.0, 'unit': 'tokens'}, {'timestamp': 1754002053, 'value': 251.0, 'unit': 'tokens'}, {'timestamp': 1754002059, 'value': 375.0, 'unit': 'tokens'}, {'timestamp': 1754002062, 'value': 244.0, 'unit': 'tokens'}, {'timestamp': 1754002064, 'value': 111.0, 'unit': 'tokens'}, {'timestamp': 1754002065, 'value': 133.0, 'unit': 'tokens'}, {'timestamp': 1754002083, 'value': 719.0, 'unit': 'tokens'}, {'timestamp': 1754002302, 'value': 279.0, 'unit': 'tokens'}, {'timestamp': 1754002306, 'value': 218.0, 'unit': 'tokens'}, {'timestamp': 1754002308, 'value': 198.0, 'unit': 'tokens'}, {'timestamp': 1754002309, 'value': 69.0, 'unit': 'tokens'}, {'timestamp': 1754002311, 'value': 133.0, 'unit': 'tokens'}, {'timestamp': 1754002324, 'value': 481.0, 'unit': 'tokens'}, {'timestamp': 1754003161, 'value': 579.0, 'unit': 'tokens'}, {'timestamp': 1754003161, 'value': 69.0, 'unit': 'tokens'}, {'timestamp': 1754003169, 'value': 499.0, 'unit': 'tokens'}, {'timestamp': 1754003171, 'value': 133.0, 'unit': 'tokens'}, {'timestamp': 1754003173, 'value': 133.0, 'unit': 'tokens'}, {'timestamp': 1754003185, 'value': 422.0, 'unit': 'tokens'}, {'timestamp': 1754003448, 'value': 579.0, 'unit': 'tokens'}, {'timestamp': 1754003453, 'value': 422.0, 'unit': 'tokens'}, {'timestamp': 1754003589, 'value': 579.0, 'unit': 'tokens'}, {'timestamp': 1754003609, 'value': 279.0, 'unit': 'tokens'}, {'timestamp': 1754003614, 'value': 481.0, 'unit': 'tokens'}, {'timestamp': 1754003706, 'value': 303.0, 'unit': 'tokens'}, {'timestamp': 1754003706, 'value': 51.0, 'unit': 'tokens'}, {'timestamp': 1754003713, 'value': 426.0, 'unit': 'tokens'}, {'timestamp': 1754003714, 'value': 70.0, 'unit': 'tokens'}, {'timestamp': 1754003715, 'value': 133.0, 'unit': 'tokens'}, {'timestamp': 1754003724, 'value': 225.0, 'unit': 'tokens'}, {'timestamp': 1754004226, 'value': 516.0, 'unit': 'tokens'}, {'timestamp': 1754004228, 'value': 127.0, 'unit': 'tokens'}, {'timestamp': 1754004232, 'value': 281.0, 'unit': 'tokens'}, {'timestamp': 1754004234, 'value': 133.0, 'unit': 'tokens'}, {'timestamp': 1754004236, 'value': 133.0, 'unit': 'tokens'}, {'timestamp': 1754004244, 'value': 206.0, 'unit': 'tokens'}, {'timestamp': 1754004683, 'value': 338.0, 'unit': 'tokens'}, {'timestamp': 1754004690, 'value': 481.0, 'unit': 'tokens'}, {'timestamp': 1754004692, 'value': 124.0, 'unit': 'tokens'}, {'timestamp': 1754004692, 'value': 65.0, 'unit': 'tokens'}, {'timestamp': 1754004694, 'value': 133.0, 'unit': 'tokens'}, {'timestamp': 1754004703, 'value': 211.0, 'unit': 'tokens'}, {'timestamp': 1754004743, 'value': 338.0, 'unit': 'tokens'}, {'timestamp': 1754004749, 'value': 211.0, 'unit': 'tokens'}, {'timestamp': 1754005566, 'value': 481.0, 'unit': 'tokens'}, {'timestamp': 1754006101, 'value': 159.0, 'unit': 'tokens'}, {'timestamp': 1754006105, 'value': 272.0, 'unit': 'tokens'}, {'timestamp': 1754006109, 'value': 308.0, 'unit': 'tokens'}, {'timestamp': 1754006110, 'value': 61.0, 'unit': 'tokens'}, {'timestamp': 1754006112, 'value': 133.0, 'unit': 'tokens'}, {'timestamp': 1754006130, 'value': 705.0, 'unit': 'tokens'}, {'timestamp': 1754051825, 'value': 454.0, 'unit': 'tokens'}, {'timestamp': 1754051827, 'value': 152.0, 'unit': 'tokens'}, {'timestamp': 1754051834, 'value': 481.0, 'unit': 'tokens'}, {'timestamp': 1754051835, 'value': 55.0, 'unit': 'tokens'}, {'timestamp': 1754051837, 'value': 133.0, 'unit': 'tokens'}, {'timestamp': 1754051845, 'value': 102.0, 'unit': 'tokens'}, {'timestamp': 1754099929, 'value': 36.0, 'unit': 'tokens'}, {'timestamp': 1754510050, 'value': 598.0, 'unit': 'tokens'}, {'timestamp': 1754510052, 'value': 160.0, 'unit': 'tokens'}, {'timestamp': 1754510064, 'value': 725.0, 'unit': 'tokens'}, {'timestamp': 1754510065, 'value': 133.0, 'unit': 'tokens'}, {'timestamp': 1754510067, 'value': 133.0, 'unit': 'tokens'}, {'timestamp': 1754510083, 'value': 535.0, 'unit': 'tokens'}, {'timestamp': 1754596582, 'value': 36.0, 'unit': 'tokens'}])] ``` adding tests for each currently documented metric in llama stack using this new function. attached is also some manual testing integrations tests passing locally with replay mode and the linked client changes: <img width="1907" height="529" alt="Screenshot 2025-08-08 at 2 49 14 PM" src="https://github.com/user-attachments/assets/d482ab06-dcff-4f0c-a1f1-f870670ee9bc" /> --------- Signed-off-by: Charlie Doern <cdoern@redhat.com> | ||
|  | 3d119a86d4 | chore: indicate to mypy that InferenceProvider.batch_completion/batch_chat_completion is concrete (#3239) # What does this PR do? closes https://github.com/llamastack/llama-stack/issues/3236 mypy considered our default implementations (raise NotImplementedError) to be trivial. the result was we implemented the same stubs in providers. this change puts enough into the default impls so mypy considers them non-trivial. this allows us to remove the duplicate implementations. | ||
|  | c3b2b06974 | refactor(logging): rename llama_stack logger categories (#3065) # What does this PR do? <!-- Provide a short summary of what this PR does and why. Link to relevant issues if applicable. --> This PR renames categories of llama_stack loggers. This PR aligns logging categories as per the package name, as well as reviews from initial https://github.com/meta-llama/llama-stack/pull/2868. This is a follow up to #3061. <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> Replaces https://github.com/meta-llama/llama-stack/pull/2868 Part of https://github.com/meta-llama/llama-stack/issues/2865 cc @leseb @rhuss Signed-off-by: Mustafa Elbehery <melbeher@redhat.com> | ||
|  | 00a67da449 | fix: Use pool_pre_ping=Truein SQLAlchemy engine creation (#3208)# What does this PR do? We noticed that when llama-stack is running for a long time, we would run into database errors when trying to run messages through the agent (which we configured to persist against postgres), seemingly due to the database connections being stale or disconnected. This commit adds `pool_pre_ping=True` to the SQLAlchemy engine creation to help mitigate this issue by checking the connection before using it, and re-establishing it if necessary. More information in: https://docs.sqlalchemy.org/en/20/core/pooling.html#dealing-with-disconnects We're also open to other suggestions on how to handle this issue, this PR is just a suggestion. ## Test Plan We have not tested it yet (we're in the process of doing that) and we're hoping it's going to resolve our issue. | ||
|  | 3f8df167f3 | chore(pre-commit): add pre-commit hook to enforce llama_stack logger usage (#3061) # What does this PR do? This PR adds a step in pre-commit to enforce using `llama_stack` logger. Currently, various parts of the code base uses different loggers. As a custom `llama_stack` logger exist and used in the codebase, it is better to standardize its utilization. Signed-off-by: Mustafa Elbehery <melbeher@redhat.com> Co-authored-by: Matthew Farrellee <matt@cs.wisc.edu> | ||
|  | 27d6becfd0 | fix(misc): pin openai dependency to < 1.100.0 (#3192) This OpenAI client release | ||
|  | 739b18edf8 | feat: add support for postgres ssl mode and root cert (#3182) this PR adds support for configuring `sslmode` and `sslrootcert` when initiating the psycopg2 connection. closes #3181 | ||
|  | c15cc7ed77 | fix: use ChatCompletionMessageFunctionToolCall (#3142) The OpenAI compatibility layer was incorrectly importing ChatCompletionMessageToolCallParam instead of the ChatCompletionMessageFunctionToolCall class. This caused "Cannot instantiate typing.Union" errors when processing agent requests with tool calls. Closes: #3141 Signed-off-by: Derek Higgins <derekh@redhat.com> | ||
|  | 3d90117891 | chore(tests): fix responses and vector_io tests (#3119) Some fixes to MCP tests. And a bunch of fixes for Vector providers. I also enabled a bunch of Vector IO tests to be used with `LlamaStackLibraryClient` ## Test Plan Run Responses tests with llama stack library client: ``` pytest -s -v tests/integration/non_ci/responses/ --stack-config=server:starter \ --text-model openai/gpt-4o \ --embedding-model=sentence-transformers/all-MiniLM-L6-v2 \ -k "client_with_models" ``` Do the same with `-k openai_client` The rest should be taken care of by CI. | ||
|  | b70e2f1f09 | fix(dep): update to openai >= 1.99.6 and use new Function location (#3087) # What does this PR do? closes #3072 ## Test Plan ci | ||
|  | 0b5a794c27 | fix: telemetry logger spams when queue is full (#3070) # What does this PR do? ## Test Plan Ran a stress test on chat completion endpoint locally: For 10 concurrent users over 3 minutes: Before: <img width="1440" height="201" alt="image" src="https://github.com/user-attachments/assets/24e0d580-186e-4e24-931e-2b936c5859b6" /> After: <img width="1434" height="204" alt="image" src="https://github.com/user-attachments/assets/4b806d88-f822-41e9-b25a-018cc4bec866" /> (Will send scripts in a future PR.) | ||
|  | e3928e6a29 | feat: Implement hybrid search in Milvus (#2644) 
		
			Some checks failed
		
		
	 Integration Tests (Replay) / discover-tests (push) Successful in 5s Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 6s Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 10s Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped Python Package Build Test / build (3.13) (push) Failing after 6s Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 9s Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 10s SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 15s Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 16s Python Package Build Test / build (3.12) (push) Failing after 10s SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 21s Vector IO Integration Tests / test-matrix (3.13, remote::weaviate) (push) Failing after 7s Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 15s Unit Tests / unit-tests (3.13) (push) Failing after 10s Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 15s Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 12s Vector IO Integration Tests / test-matrix (3.13, remote::qdrant) (push) Failing after 12s Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 8s Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 8s Unit Tests / unit-tests (3.12) (push) Failing after 19s Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 12s Vector IO Integration Tests / test-matrix (3.12, remote::weaviate) (push) Failing after 11s Vector IO Integration Tests / test-matrix (3.12, remote::qdrant) (push) Failing after 11s Test External API and Providers / test-external (venv) (push) Failing after 21s Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 19s Pre-commit / pre-commit (push) Successful in 57s # What does this PR do?
This PR implements hybrid search for Milvus DB based on the inbuilt
milvus support.
   
    To test:
    ```
pytest tests/unit/providers/vector_io/remote/test_milvus.py -v -s
--tb=long --disable-warnings --asyncio-mode=auto
    ```
Signed-off-by: Varsha Prasad Narsing <varshaprasad96@gmail.com> | ||
|  | 0caef40e0d | fix: telemetry fixes (inference and core telemetry) (#2733) # What does this PR do? I found a few issues while adding new metrics for various APIs: currently metrics are only propagated in `chat_completion` and `completion` since most providers use the `openai_..` routes as the default in `llama-stack-client inference chat-completion`, metrics are currently not working as expected. in order to get them working the following had to be done: 1. get the completion as usual 2. use new `openai_` versions of the metric gathering functions which use `.usage` from the `OpenAI..` response types to gather the metrics which are already populated. 3. define a `stream_generator` which counts the tokens and computes the metrics (only for stream=True) 5. add metrics to response NOTE: I could not add metrics to `openai_completion` where stream=True because that ONLY returns an `OpenAICompletion` not an AsyncGenerator that we can manipulate. acquire the lock, and add event to the span as the other `_log_...` methods do some new output: `llama-stack-client inference chat-completion --message hi` <img width="2416" height="425" alt="Screenshot 2025-07-16 at 8 28 20 AM" src="https://github.com/user-attachments/assets/ccdf1643-a184-4ddd-9641-d426c4d51326" /> and in the client: <img width="763" height="319" alt="Screenshot 2025-07-16 at 8 28 32 AM" src="https://github.com/user-attachments/assets/6bceb811-5201-47e9-9e16-8130f0d60007" /> these were not previously being recorded nor were they being printed to the server due to the improper console sink handling --------- Signed-off-by: Charlie Doern <cdoern@redhat.com> | ||
|  | 7f834339ba | chore(misc): make tests and starter faster (#3042) 
		
			Some checks failed
		
		
	 Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 9s Python Package Build Test / build (3.12) (push) Failing after 4s Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 12s Test Llama Stack Build / generate-matrix (push) Successful in 11s Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 12s Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 14s SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 22s Test External API and Providers / test-external (venv) (push) Failing after 14s Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 12s Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 15s SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 22s Test Llama Stack Build / build-custom-container-distribution (push) Failing after 14s Unit Tests / unit-tests (3.13) (push) Failing after 14s Test Llama Stack Build / build-single-provider (push) Failing after 13s Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 18s Unit Tests / unit-tests (3.12) (push) Failing after 16s Vector IO Integration Tests / test-matrix (3.12, remote::qdrant) (push) Failing after 18s Vector IO Integration Tests / test-matrix (3.13, remote::weaviate) (push) Failing after 10s Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 11s Vector IO Integration Tests / test-matrix (3.12, remote::weaviate) (push) Failing after 16s Vector IO Integration Tests / test-matrix (3.13, remote::qdrant) (push) Failing after 18s Test Llama Stack Build / build (push) Failing after 12s Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 18s Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 20s Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 16s Python Package Build Test / build (3.13) (push) Failing after 53s Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 59s Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 1m1s Update ReadTheDocs / update-readthedocs (push) Failing after 1m6s Pre-commit / pre-commit (push) Successful in 1m53s A bunch of miscellaneous cleanup focusing on tests, but ended up speeding up starter distro substantially. - Pulled llama stack client init for tests into `pytest_sessionstart` so it does not clobber output - Profiling of that told me where we were doing lots of heavy imports for starter, so lazied them - starter now starts 20seconds+ faster on my Mac - A few other smallish refactors for `compat_client` | ||
|  | e5b542dd8e | feat: switch to async completion in LiteLLM OpenAI mixin (#3029) 
		
			Some checks failed
		
		
	 Integration Tests (Replay) / discover-tests (push) Successful in 3s Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 8s Vector IO Integration Tests / test-matrix (3.12, remote::qdrant) (push) Failing after 12s Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 11s Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 15s Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 17s Vector IO Integration Tests / test-matrix (3.13, remote::weaviate) (push) Failing after 13s Unit Tests / unit-tests (3.12) (push) Failing after 11s Python Package Build Test / build (3.13) (push) Failing after 12s Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 17s SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 19s Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 15s Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 15s Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 16s Python Package Build Test / build (3.12) (push) Failing after 17s Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 15s Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 21s SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 24s Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 20s Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 29s Vector IO Integration Tests / test-matrix (3.12, remote::weaviate) (push) Failing after 27s Test External API and Providers / test-external (venv) (push) Failing after 23s Vector IO Integration Tests / test-matrix (3.13, remote::qdrant) (push) Failing after 25s Unit Tests / unit-tests (3.13) (push) Failing after 25s Pre-commit / pre-commit (push) Successful in 1m10s | ||
|  | 3c2aee610d | refactor: Remove double filtering based on score threshold (#3019) # What does this PR do? Remove score_threshold based check from `OpenAIVectorStoreMixin` Closes: https://github.com/meta-llama/llama-stack/issues/3018 <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> ## Test Plan <!-- Describe the tests you ran to verify your changes with result summaries. *Provide clear instructions so the plan can be easily re-executed.* --> | ||
|  | 140ee7d337 | fix: sambanova inference provider (#2996) 
		
			Some checks failed
		
		
	 Integration Tests (Replay) / discover-tests (push) Successful in 3s Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped Vector IO Integration Tests / test-matrix (3.12, remote::qdrant) (push) Failing after 10s Integration Tests (Replay) / run-replay-mode-tests (push) Failing after 5s Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 14s Python Package Build Test / build (3.13) (push) Failing after 8s Unit Tests / unit-tests (3.12) (push) Failing after 8s SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 15s Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 15s Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 12s Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 12s Python Package Build Test / build (3.12) (push) Failing after 13s Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 19s Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 10s Vector IO Integration Tests / test-matrix (3.12, remote::weaviate) (push) Failing after 17s SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 20s Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 10s Test External API and Providers / test-external (venv) (push) Failing after 13s Vector IO Integration Tests / test-matrix (3.13, remote::weaviate) (push) Failing after 10s Unit Tests / unit-tests (3.13) (push) Failing after 13s Vector IO Integration Tests / test-matrix (3.13, remote::qdrant) (push) Failing after 15s Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 18s Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 18s Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 46s Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 49s Pre-commit / pre-commit (push) Successful in 1m29s # What does this PR do? closes #2995 update SambaNovaInferenceAdapter to efficiently use LiteLLMOpenAIMixin ## Test Plan ``` $ uv run pytest -s -v tests/integration/inference --stack-config inference=sambanova --text-model sambanova/Meta-Llama-3.1-8B-Instruct ... ======================== 10 passed, 84 skipped, 3 xfailed, 51 warnings in 8.14s ======================== ``` | ||
|  | 33cca26154 | chore: Enabling Integration tests for Weaviate (#2882) # What does this PR do? This PR (1) enables the files API for Weaviate and (2) enables integration tests for Weaviate, which adds a docker container to the github action. This PR also handles a couple of edge cases for in creating the collection and ensuring the tests all pass. ## Test Plan CI enabled --------- Signed-off-by: Francisco Javier Arceo <farceo@redhat.com> | ||
|  | 218c89fff1 | feat: Add clear error message when API key is missing (#2992) # What does this PR do? Improve user experience by providing specific guidance when no API key is available, showing both provider data header and config options with the correct field name for each provider. Also adds comprehensive test coverage for API key resolution scenarios. addresses #2990 for providers using litellm openai mixin ## Test Plan `./scripts/unit-tests.sh tests/unit/providers/inference/test_litellm_openai_mixin.py` | ||
|  | 2665f00102 | chore(rename): move llama_stack.distribution to llama_stack.core (#2975) We would like to rename the term `template` to `distribution`. To prepare for that, this is a precursor. cc @leseb | ||
|  | cd5c6a2fcd | chore: standardize vector store not found error (#2968) # What does this PR do? 1. Creates a new `VectorStoreNotFoundError` class 2. Implements the new class where appropriate Relates to #2379 Signed-off-by: Nathan Weinberg <nweinber@redhat.com> | ||
|  | 47c078fcef | feat: implement dynamic model detection support for inference providers using litellm (#2886) # What does this PR do? This enhancement allows inference providers using LiteLLMOpenAIMixin to validate model availability against LiteLLM's official provider model listings, improving reliability and user experience when working with different AI service providers. - Add litellm_provider_name parameter to LiteLLMOpenAIMixin constructor - Add check_model_availability method to LiteLLMOpenAIMixin using litellm.models_by_provider - Update Gemini, Groq, and SambaNova inference adapters to pass litellm_provider_name ## Test Plan standard CI. | ||
|  | 9583f468f8 | feat(starter)!: simplify starter distro; litellm model registry changes (#2916) | ||
|  | 52201612de | feat: implement chunk deletion for vector stores (#2701) Add support for deleting individual chunks from vector stores - Add abstract remove_chunk() method to EmbeddingIndex base class - Implement chunk deletion for Faiss provider, SQLite Vec, Milvus, PGVector - Placeholder implementations with NotImplementedError for Chroma/Qdrant/Weaviate - Integrate chunk deletion into OpenAI vector store file deletion flow - removed xfail from test_openai_vector_store_delete_file_removes_from_vector_store Closes: #2477 --------- Signed-off-by: Derek Higgins <derekh@redhat.com> Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com> | ||
|  | 7cc4819e90 | feat: add MCP Streamable HTTP support (#2554) # What does this PR do? <!-- Provide a short summary of what this PR does and why. Link to relevant issues if applicable. --> This PR adds support for the new Streamable HTTP transport for MCP, as well as falling back to the SSE protocol if the Streamable HTTP connection fails. <!-- If resolving an issue, uncomment and update the line below --> Closes #2542 ## Test Plan <!-- Describe the tests you ran to verify your changes with result summaries. *Provide clear instructions so the plan can be easily re-executed.* --> --------- Signed-off-by: Calum Murray <cmurray@redhat.com> | ||
|  | 1463b79218 | feat(registry): make the Stack query providers for model listing (#2862) This flips #2823 and #2805 by making the Stack periodically query the providers for models rather than the providers going behind the back and calling "register" on to the registry themselves. This also adds support for model listing for all other providers via `ModelRegistryHelper`. Once this is done, we do not need to manually list or register models via `run.yaml` and it will remove both noise and annoyance (setting `INFERENCE_MODEL` environment variables, for example) from the new user experience. In addition, it adds a configuration variable `allowed_models` which can be used to optionally restrict the set of models exposed from a provider. | ||
|  | 2aba2c1236 | chore: Moving vector store and vector store files helper methods to openai_vector_store_mixin (#2863) # What does this PR do? Moving vector store and vector store files helper methods to `openai_vector_store_mixin.py` <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> ## Test Plan The tests are already supported in the CI and tests the inline providers and current integration tests. Note that the `vector_index` fixture will be test `milvus_vec_adapter`, `faiss_vec_adapter`, and `sqlite_vec_adapter` in `tests/unit/providers/vector_io/test_vector_io_openai_vector_stores.py`. Additionally, the integration tests in `integration-vector-io-tests.yml` runs `tests/integration/vector_io` tests for the following providers: ```python vector-io-provider: ["inline::faiss", "inline::sqlite-vec", "inline::milvus", "remote::chromadb", "remote::pgvector"] ``` Signed-off-by: Francisco Javier Arceo <farceo@redhat.com> | ||
|  | e1ed152779 | chore: create OpenAIMixin for inference providers with an OpenAI-compat API that need to implement openai_* methods (#2835) 
		
			Some checks failed
		
		
	 Coverage Badge / unit-tests (push) Failing after 3s Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 6s Python Package Build Test / build (3.12) (push) Failing after 3s Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 6s Integration Tests / discover-tests (push) Successful in 7s Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 6s Python Package Build Test / build (3.13) (push) Failing after 2s Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 6s Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 5s Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 8s Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 9s Unit Tests / unit-tests (3.12) (push) Failing after 8s Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 11s Test External Providers / test-external-providers (venv) (push) Failing after 8s Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 12s Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 9s SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 17s Unit Tests / unit-tests (3.13) (push) Failing after 12s Update ReadTheDocs / update-readthedocs (push) Failing after 11s Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 16s SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 18s Integration Tests / test-matrix (push) Failing after 18s Pre-commit / pre-commit (push) Successful in 1m14s # What does this PR do? add an `OpenAIMixin` for use by inference providers who remote endpoints support an OpenAI compatible API. use is demonstrated by refactoring - OpenAIInferenceAdapter - NVIDIAInferenceAdapter (adds embedding support) - LlamaCompatInferenceAdapter ## Test Plan existing unit and integration tests | ||
|  | 3b83032555 | feat(registry): more flexible model lookup (#2859) This PR updates model registration and lookup behavior to be slightly more general / flexible. See https://github.com/meta-llama/llama-stack/issues/2843 for more details. Note that this change is backwards compatible given the design of the `lookup_model()` method. ## Test Plan Added unit tests | ||
|  | 9736f096f6 | chore(test): fix flaky telemetry tests (#2815) 
		
			Some checks failed
		
		
	 Installer CI / lint (push) Failing after 2s Installer CI / smoke-test (push) Has been skipped Integration Tests / discover-tests (push) Successful in 3s Coverage Badge / unit-tests (push) Failing after 6s Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 6s Python Package Build Test / build (3.12) (push) Failing after 3s Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 11s Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 5s Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 10s Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 9s Unit Tests / unit-tests (3.12) (push) Failing after 6s SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 15s Test Llama Stack Build / generate-matrix (push) Successful in 11s Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 16s Test Llama Stack Build / build-single-provider (push) Failing after 12s Update ReadTheDocs / update-readthedocs (push) Failing after 9s Integration Tests / test-matrix (push) Failing after 9s Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 15s Test External Providers / test-external-providers (venv) (push) Failing after 11s Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 8s SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 22s Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 16s Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 13s Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 16s Test Llama Stack Build / build-custom-container-distribution (push) Failing after 13s Test Llama Stack Build / build (push) Failing after 3s Python Package Build Test / build (3.13) (push) Failing after 48s Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 55s Unit Tests / unit-tests (3.13) (push) Failing after 52s Pre-commit / pre-commit (push) Successful in 1m42s # What does this PR do? <!-- Provide a short summary of what this PR does and why. Link to relevant issues if applicable. --> This PR fixes flaky telemetry tests <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> See https://github.com/meta-llama/llama-stack/pull/2814 ## Test Plan <!-- Describe the tests you ran to verify your changes with result summaries. *Provide clear instructions so the plan can be easily re-executed.* --> Signed-off-by: Mustafa Elbehery <melbeher@redhat.com> | ||
|  | 20c3197952 | chore: Making name optional in openai_create_vector_store (#2858) # What does this PR do? chore: Making name optional in openai_create_vector_store # Closes https://github.com/meta-llama/llama-stack/issues/2706 ## Test Plan CI and unit tests Signed-off-by: Francisco Javier Arceo <farceo@redhat.com> | ||
|  | b57db11bed | feat: create dynamic model registration for OpenAI and Llama compat remote inference providers (#2745) 
		
			Some checks failed
		
		
	 Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 5s Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 6s Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 4s Python Package Build Test / build (3.13) (push) Failing after 2s Test Llama Stack Build / generate-matrix (push) Successful in 6s Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 7s Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 9s Update ReadTheDocs / update-readthedocs (push) Failing after 3s Test Llama Stack Build / build-single-provider (push) Failing after 7s Integration Tests / discover-tests (push) Successful in 13s Test Llama Stack Build / build-custom-container-distribution (push) Failing after 13s Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 12s Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 17s Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 15s Integration Tests / test-matrix (push) Failing after 5s Unit Tests / unit-tests (3.12) (push) Failing after 12s Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 19s Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 19s SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 22s Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 17s Test External Providers / test-external-providers (venv) (push) Failing after 17s Test Llama Stack Build / build (push) Failing after 14s SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 35s Python Package Build Test / build (3.12) (push) Failing after 51s Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 57s Unit Tests / unit-tests (3.13) (push) Failing after 53s Pre-commit / pre-commit (push) Successful in 1m42s # What does this PR do? <!-- Provide a short summary of what this PR does and why. Link to relevant issues if applicable. --> The purpose of this task is to create a solution that can automatically detect when new models are added, deprecated, or removed by OpenAI and Llama API providers, and automatically update the list of supported models in LLamaStack. This feature is vitally important in order to avoid missing new models and editing the entries manually hence I created automation allowing users to dynamically register: - any models from OpenAI provider available at [https://api.openai.com/v1/models](https://api.openai.com/v1/models) that are not in [https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/remote/inference/openai/models.py](https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/remote/inference/openai/models.py) - any models from Llama API provider available at [https://api.llama.com/v1/models](https://api.llama.com/v1/models) that are not in [https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/remote/inference/llama_openai_compat/models.py](https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/remote/inference/llama_openai_compat/models.py) <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> Closes #2504 this PR is dependant on #2710 ## Test Plan <!-- Describe the tests you ran to verify your changes with result summaries. *Provide clear instructions so the plan can be easily re-executed.* --> 1. Create venv at root llamastack directory: `uv venv .venv --python 3.12 --seed` 2. Activate venv: `source .venv/bin/activate` 3. `uv pip install -e .` 4. Create OpenAI distro modifying run.yaml 5. Build distro: `llama stack build --template starter --image-type venv` 6. Then run LlamaStack, but before navigate to templates/starter folder: `llama stack run run.yaml --image-type venv OPENAI_API_KEY=<YOUR_KEY> ENABLE_OPENAI=openai` 7. Then try to register dummy llm that doesn't exist in OpenAI provider: ` llama-stack-client models register ianm/ianllm --provider-model-id=ianllm --provider-id=openai ` You should receive this output - combined list of static config + fetched available models from OpenAI: <img width="1380" height="474" alt="Screenshot 2025-07-14 at 12 48 50" src="https://github.com/user-attachments/assets/d26aad18-6b15-49ee-9c49-b01b2d33f883" /> 8. Then register real llm from OpenAI: llama-stack-client models register openai/gpt-4-turbo-preview --provider-model-id=gpt-4-turbo-preview --provider-id=openai <img width="1253" height="613" alt="Screenshot 2025-07-14 at 13 43 02" src="https://github.com/user-attachments/assets/60a5c9b1-3468-4eb9-9e92-cd7d21de3ca0" /> <img width="1288" height="655" alt="Screenshot 2025-07-14 at 13 43 11" src="https://github.com/user-attachments/assets/c1e48871-0e24-4bd9-a0b8-8c95552a51ee" /> We correctly fetched all available models from OpenAI As for Llama API, as a non-US person I don't have access to Llama API Key but I joined wait list. The implementation for Llama is the same as for OpenAI since Llama is openai compatible. So, the response from GET endpoint has the same structure as OpenAI https://llama.developer.meta.com/docs/api/models | ||
|  | 31b088978a | fix: Fix /vector-stores/createAPI when vector store with duplicatename(#2617)# What does this PR do? Resolves https://github.com/meta-llama/llama-stack/issues/2735 Currently, if you test against OpenAI's Vector Stores API the `client.vector_stores.search` call fails with an invalid vector_db during routing (see the script referenced in the clickable item under the Test Plan section). This PR ensures that `client.vector_stores.search()` is compatible with OpenAI's Vector Stores API. Two biggest changes: 1. The `name`, which was previously used as the `vector_db_id`, has been changed to be consistent with OpenAI's `vs_{uuid}` format. 2. The vector store ID has to be referenced by the ID, the name is not reliable as every `client.vector_stores.create` results in a new vector store. NOTE: I believe this is a breaking change for end users as they'll need to update their VectorDB identifiers. ## Test Plan Unit tests: ```bash ./scripts/unit-tests.sh tests/unit/providers/vector_io/ -v ``` Integration tests: ```bash ENABLE_MILVUS=milvus llama stack run /Users/farceo/dev/llama-stack/llama_stack/templates/starter/run.yaml --image-type venv LLAMA_STACK_CONFIG=http://localhost:8321 pytest -sv tests/integration/vector_io/test_openai_vector_stores.py --embedding-model=all-MiniLM-L6-v2 -vv ``` Unit tests and test script below 👇 <details> <summary>Click here for script used to test OpenAI and Llama Stack Vector Store implementation</summary> ```python import json import argparse from openai import OpenAI, pagination import logging from colorama import Fore, Style, init import traceback import os # Initialize colorama for color support in terminal init(autoreset=True) # Setup basic logging logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s') DEMO_VECTOR_STORE_NAME = "Support FAQ FJA" global DEMO_VECTOR_STORE_ID global DEMO_VECTOR_STORE_ID2 def colored_print(color, text): """Prints text to the console with the specified color.""" print(f"{color}{text}{Style.RESET_ALL}") def log_and_print(color, message, level=logging.INFO): """Logs a message and prints it to the console with the specified color.""" logging.log(level, message) colored_print(color, message) def run_tests(client, prefix="openai"): """ Runs all tests using the provided OpenAI client and saves the output to JSON files with the given prefix. """ # Create the directory if it doesn't exist os.makedirs('openai_testing', exist_ok=True) # Default values in case tests fail global DEMO_VECTOR_STORE_ID, DEMO_VECTOR_STORE_ID2 DEMO_VECTOR_STORE_ID = None DEMO_VECTOR_STORE_ID2 = None def test_idempotent_vector_store_creation(): """ Test that creating a vector store with the same name is idempotent. """ log_and_print(Fore.BLUE, "Starting vector store creation test...") try: vector_store = client.vector_stores.create( name=DEMO_VECTOR_STORE_NAME, ) # Attempt to create the same vector store again vector_store2 = client.vector_stores.create( name=DEMO_VECTOR_STORE_NAME, ) # Check instead of assert if vector_store2.id != vector_store.id: log_and_print(Fore.YELLOW, f"FAILED IDEMPOTENCY: the same VectorStore name for {prefix.upper()} does not return the same ID", level=logging.WARNING) else: log_and_print(Fore.GREEN, f"PASSED IDEMPOTENCY: f{vector_store2.id} == {vector_store.id} the same VectorStore name for {prefix.upper()} returns the same ID") vector_store_data = vector_store.to_dict() log_and_print(Fore.WHITE, f"vector_stores.create = {json.dumps(vector_store_data, indent=2)}") with open(f'openai_testing/{prefix}_vector_store_create.json', 'w') as f: json.dump(vector_store_data, f, indent=2) global DEMO_VECTOR_STORE_ID, DEMO_VECTOR_STORE_ID2 DEMO_VECTOR_STORE_ID = vector_store.id DEMO_VECTOR_STORE_ID2 = vector_store2.id return DEMO_VECTOR_STORE_ID, DEMO_VECTOR_STORE_ID2 except Exception as e: log_and_print(Fore.RED, f"Idempotent vector store creation test failed: {e}", level=logging.ERROR) logging.error(traceback.format_exc()) # Create a fallback vector store ID if needed if 'vector_store' in locals() and vector_store: DEMO_VECTOR_STORE_ID = vector_store.id return DEMO_VECTOR_STORE_ID, DEMO_VECTOR_STORE_ID2 def test_vector_store_list(): """ Test listing vector stores. """ log_and_print(Fore.BLUE, "Starting vector store list test...") try: vector_stores = client.vector_stores.list() # Check instead of assert if not isinstance(vector_stores, pagination.SyncCursorPage): log_and_print(Fore.YELLOW, f"FAILED: Expected a list of vector stores, got {type(vector_stores)}", level=logging.WARNING) else: log_and_print(Fore.GREEN, "Vector store list test passed!") vector_stores_data = vector_stores.to_dict() log_and_print(Fore.WHITE, f"vector_stores.list = {json.dumps(vector_stores_data, indent=2)}") with open(f'openai_testing/{prefix}_vector_store_list.json', 'w') as f: json.dump(vector_stores_data, f, indent=2) except Exception as e: log_and_print(Fore.RED, f"Vector store list test failed: {e}", level=logging.ERROR) logging.error(traceback.format_exc()) def test_retrieve_vector_store(): """ Test retrieving a specific vector store. """ log_and_print(Fore.BLUE, "Starting retrieve vector store test...") if not DEMO_VECTOR_STORE_ID: log_and_print(Fore.YELLOW, "Skipping retrieve vector store test - no vector store ID available", level=logging.WARNING) return try: vector_store = client.vector_stores.retrieve( vector_store_id=DEMO_VECTOR_STORE_ID, ) # Check instead of assert if vector_store.id != DEMO_VECTOR_STORE_ID: log_and_print(Fore.YELLOW, "FAILED: Retrieved vector store ID does not match", level=logging.WARNING) else: log_and_print(Fore.GREEN, "Retrieve vector store test passed!") vector_store_data = vector_store.to_dict() log_and_print(Fore.WHITE, f"vector_stores.retrieve = {json.dumps(vector_store_data, indent=2)}") with open(f'openai_testing/{prefix}_vector_store_retrieve.json', 'w') as f: json.dump(vector_store_data, f, indent=2) except Exception as e: log_and_print(Fore.RED, f"Retrieve vector store test failed: {e}", level=logging.ERROR) logging.error(traceback.format_exc()) def test_modify_vector_store(): """ Test modifying a vector store. """ log_and_print(Fore.BLUE, "Starting modify vector store test...") if not DEMO_VECTOR_STORE_ID: log_and_print(Fore.YELLOW, "Skipping modify vector store test - no vector store ID available", level=logging.WARNING) return try: updated_vector_store = client.vector_stores.update( vector_store_id=DEMO_VECTOR_STORE_ID, name="Updated Support FAQ FJA", ) # Check instead of assert if updated_vector_store.name != "Updated Support FAQ FJA": log_and_print(Fore.YELLOW, "FAILED: Vector store name was not updated correctly", level=logging.WARNING) else: log_and_print(Fore.GREEN, "Modify vector store test passed!") updated_vector_store_data = updated_vector_store.to_dict() log_and_print(Fore.WHITE, f"vector_stores.modify = {json.dumps(updated_vector_store_data, indent=2)}") with open(f'openai_testing/{prefix}_vector_store_modify.json', 'w') as f: json.dump(updated_vector_store_data, f, indent=2) except Exception as e: log_and_print(Fore.RED, f"Modify vector store test failed: {e}", level=logging.ERROR) logging.error(traceback.format_exc()) def test_delete_vector_store(): """ Test deleting a vector store. """ log_and_print(Fore.BLUE, "Starting delete vector store test...") if not DEMO_VECTOR_STORE_ID2: log_and_print(Fore.YELLOW, "Skipping delete vector store test - no second vector store ID available", level=logging.WARNING) return try: response = client.vector_stores.delete( vector_store_id=DEMO_VECTOR_STORE_ID2, ) log_and_print(Fore.GREEN, "Delete vector store test passed!") response_data = response.to_dict() log_and_print(Fore.WHITE, f"Vector store delete response = {json.dumps(response_data, indent=2)}") with open(f'openai_testing/{prefix}_vector_store_delete.json', 'w') as f: json.dump(response_data, f, indent=2) except Exception as e: log_and_print(Fore.RED, f"Delete vector store test failed: {e}", level=logging.ERROR) logging.error(traceback.format_exc()) def test_create_vector_store_file(): log_and_print(Fore.BLUE, "Starting create vector store file test...") if not DEMO_VECTOR_STORE_ID: log_and_print(Fore.YELLOW, "Skipping create vector store file test - no vector store ID available", level=logging.WARNING) return try: # create jsonl of files as an example with open("mydata.jsonl", "w") as f: f.write('{"text": "What is the return policy?", "metadata": {"category": "support"}}\n') f.write('{"text": "How do I reset my password?", "metadata": {"category": "support"}}\n') f.write('{"text": "Where can I find my order history?", "metadata": {"category": "support"}}\n') f.write('{"text": "What are the shipping options?", "metadata": {"category": "support"}}\n') f.write('{"text": "What is your favorite banana?", "metadata": {"category": "support"}}\n') # Create a simple text file if my_data_small.txt doesn't exist if not os.path.exists("my_data_small.txt"): with open("my_data_small.txt", "w") as f: f.write("This is a test file for vector store testing.\n") created_file = client.files.create( file=open("my_data_small.txt", "rb"), purpose="assistants", ) created_file_data = created_file.to_dict() log_and_print(Fore.WHITE, f"Created file {json.dumps(created_file_data, indent=2)}") with open(f'openai_testing/{prefix}_file_create.json', 'w') as f: json.dump(created_file_data, f, indent=2) retrieved_files = client.files.retrieve(created_file.id) retrieved_files_data = retrieved_files.to_dict() log_and_print(Fore.WHITE, f"Retrieved file {json.dumps(retrieved_files_data, indent=2)}") with open(f'openai_testing/{prefix}_file_retrieve.json', 'w') as f: json.dump(retrieved_files_data, f, indent=2) vector_store_file = client.vector_stores.files.create( vector_store_id=DEMO_VECTOR_STORE_ID, file_id=created_file.id, ) log_and_print(Fore.GREEN, "Create vector store file test passed!") except Exception as e: log_and_print(Fore.RED, f"Create vector store file test failed: {e}", level=logging.ERROR) logging.error(traceback.format_exc()) def test_search_vector_store(): """ Test searching a vector store. """ log_and_print(Fore.BLUE, "Starting search vector store test...") if not DEMO_VECTOR_STORE_ID: log_and_print(Fore.YELLOW, "Skipping search vector store test - no vector store ID available", level=logging.WARNING) return try: query = "What is the banana policy?" search_results = client.vector_stores.search( vector_store_id=DEMO_VECTOR_STORE_ID, query=query, max_num_results=10, ranking_options={ 'ranker': 'default-2024-11-15', 'score_threshold': 0.0, }, rewrite_query=False, ) # Check instead of assert if not isinstance(search_results, pagination.SyncPage): log_and_print(Fore.YELLOW, f"FAILED: Expected a list of search results, got {type(search_results)}", level=logging.WARNING) else: log_and_print(Fore.GREEN, "Search vector store test passed!") search_results_dict = search_results.to_dict() log_and_print(Fore.WHITE, f"Search results = {search_results_dict}") with open(f'openai_testing/{prefix}_vector_store_search.json', 'w') as f: json.dump(search_results_dict, f, indent=2) log_and_print(Fore.WHITE, f"vector_stores.search = {search_results.to_json()}") except Exception as e: log_and_print(Fore.RED, f"Search vector store test failed: {e}", level=logging.ERROR) logging.error(traceback.format_exc()) # Run all tests in sequence, even if some fail test_results = [] try: result = test_idempotent_vector_store_creation() if result and len(result) == 2: DEMO_VECTOR_STORE_ID, DEMO_VECTOR_STORE_ID2 = result test_results.append(True) except Exception as e: log_and_print(Fore.RED, f"Vector store creation test failed: {e}", level=logging.ERROR) logging.error(traceback.format_exc()) test_results.append(False) for test_func in [ test_vector_store_list, test_retrieve_vector_store, test_modify_vector_store, test_delete_vector_store, test_create_vector_store_file, test_search_vector_store ]: try: test_func() test_results.append(True) except Exception as e: log_and_print(Fore.RED, f"{test_func.__name__} failed: {e}", level=logging.ERROR) logging.error(traceback.format_exc()) test_results.append(False) if all(test_results): log_and_print(Fore.GREEN, f"All {prefix} tests completed successfully!") else: failed_count = test_results.count(False) log_and_print(Fore.YELLOW, f"{failed_count} {prefix} test(s) failed, but script completed.") if __name__ == "__main__": parser = argparse.ArgumentParser(description="Run OpenAI and/or LlamaStack tests.") parser.add_argument( "--provider", type=str, default="llama", choices=["openai", "llama", "both"], help="Specify which environment to test: openai, llama, or both. Default is both.", ) args = parser.parse_args() try: if args.provider in ("openai", "both"): openai_client = OpenAI() run_tests(openai_client, prefix="openai") if args.provider in ("llama", "both"): llama_client = OpenAI(base_url="http://localhost:8321/v1/openai/v1", api_key="none") run_tests(llama_client, prefix="llama") log_and_print(Fore.GREEN, "All tests completed!") except Exception as e: log_and_print(Fore.RED, f"Tests failed to complete: {e}", level=logging.ERROR) logging.error(traceback.format_exc()) ``` </details> --------- Signed-off-by: Francisco Javier Arceo <farceo@redhat.com> | ||
|  | 33f0d83ad3 | chore: Move vector store kvstoreimplementation intoopenai_vector_store_mixin.py(#2748) | ||
|  | f731f369a2 | feat: add infrastructure to allow inference model discovery (#2710) # What does this PR do?
inference providers each have a static list of supported / known models.
some also have access to a dynamic list of currently available models.
this change gives prodivers using the ModelRegistryHelper the ability to
combine their static and dynamic lists.
for instance, OpenAIInferenceAdapter can implement
```
   def query_available_models(self) -> list[str]:
      return [entry.model for entry in self.openai_client.models.list()]
```
to augment its static list w/ a current list from openai.
## Test Plan
scripts/unit-test.sh | ||
|  | d880c2df0e | fix: auth sql store: user is owner policy (#2674) 
		
			Some checks failed
		
		
	 Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 2s Installer CI / lint (push) Failing after 4s Installer CI / smoke-test (push) Has been skipped Integration Tests / discover-tests (push) Successful in 5s Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 4s Python Package Build Test / build (3.12) (push) Failing after 7s Python Package Build Test / build (3.13) (push) Failing after 8s Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 12s Test Llama Stack Build / generate-matrix (push) Successful in 10s Test External Providers / test-external-providers (venv) (push) Failing after 8s Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 14s Unit Tests / unit-tests (3.13) (push) Failing after 8s Test Llama Stack Build / build-custom-container-distribution (push) Failing after 10s Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 13s Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 11s Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 12s Update ReadTheDocs / update-readthedocs (push) Failing after 10s Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 15s Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 13s Test Llama Stack Build / build-single-provider (push) Failing after 13s Integration Tests / test-matrix (push) Failing after 11s Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 17s Unit Tests / unit-tests (3.12) (push) Failing after 13s Test Llama Stack Build / build (push) Failing after 6s Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 15s SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 20s Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 17s SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 26s Pre-commit / pre-commit (push) Successful in 1m8s # What does this PR do? The current authorized sql store implementation does not respect user.principal (only checks attributes). This PR addresses that. ## Test Plan Added test cases to integration tests. | ||
|  | bbe0199bb7 | chore: update pre-commit hook versions (#2708) While investigating the `uv.lock` changes made in https://github.com/meta-llama/llama-stack/pull/2695 I noticed several of the pre-commit hook versions were out of date This PR updates them and fixes some new `ruff` errors --------- Signed-off-by: Nathan Weinberg <nweinber@redhat.com> | ||
|  | 9b7eecebcf | ci: test safety with starter (#2628) 
		
			Some checks failed
		
		
	 Integration Tests / test-matrix (server, 3.13, inspect) (push) Failing after 7s Integration Tests / test-matrix (server, 3.13, providers) (push) Failing after 11s Integration Tests / test-matrix (server, 3.13, vector_io) (push) Failing after 10s Integration Tests / test-matrix (server, 3.13, scoring) (push) Failing after 15s Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 15s Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 13s Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 11s Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 10s Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 7s Integration Tests / test-matrix (server, 3.13, safety) (push) Failing after 25s Integration Tests / test-matrix (server, 3.13, post_training) (push) Failing after 27s Integration Tests / test-matrix (server, 3.13, tool_runtime) (push) Failing after 23s Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 15s Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 7s Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 9s Test Llama Stack Build / generate-matrix (push) Successful in 14s Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 16s Test Llama Stack Build / build-single-provider (push) Failing after 14s Integration Tests / test-matrix (server, 3.12, tool_runtime) (push) Failing after 1m7s Update ReadTheDocs / update-readthedocs (push) Failing after 12s Unit Tests / unit-tests (3.13) (push) Failing after 14s Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 29s Test External Providers / test-external-providers (venv) (push) Failing after 17s Test Llama Stack Build / build (push) Failing after 13s Unit Tests / unit-tests (3.12) (push) Failing after 15s Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 35s Python Package Build Test / build (3.12) (push) Failing after 31s Python Package Build Test / build (3.13) (push) Failing after 29s Test Llama Stack Build / build-custom-container-distribution (push) Failing after 34s Pre-commit / pre-commit (push) Successful in 1m24s # What does this PR do? We are now testing the safety capability with the starter image. This includes a few changes: * Enable the safety integration test * Relax the shield model requirements from llama-guard to make it work with llama-guard3:8b coming from Ollama * Expose a shield for each inference provider in the starter distro. The shield will only be registered if the provider is enabled. Closes: https://github.com/meta-llama/llama-stack/issues/2528 Signed-off-by: Sébastien Han <seb@redhat.com> | ||
|  | e9926564bd | fix: authorized sql store with postgres (#2641) 
		
			Some checks failed
		
		
	 Integration Tests / test-matrix (library, 3.12, providers) (push) Failing after 13s Integration Tests / test-matrix (server, 3.13, agents) (push) Failing after 10s Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 8s Integration Tests / test-matrix (server, 3.13, post_training) (push) Failing after 11s Integration Tests / test-matrix (server, 3.13, vector_io) (push) Failing after 7s Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 8s Integration Tests / test-matrix (library, 3.13, agents) (push) Failing after 13s Integration Tests / test-matrix (server, 3.12, vector_io) (push) Failing after 14s Integration Tests / test-matrix (server, 3.12, post_training) (push) Failing after 14s Integration Tests / test-matrix (server, 3.13, tool_runtime) (push) Failing after 8s Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 25s Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 23s Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 28s Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 27s Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 12s Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 10s Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 6s Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 5s Test Llama Stack Build / generate-matrix (push) Successful in 5s Python Package Build Test / build (3.12) (push) Failing after 1s Test External Providers / test-external-providers (venv) (push) Failing after 3s Python Package Build Test / build (3.13) (push) Failing after 3s Update ReadTheDocs / update-readthedocs (push) Failing after 3s Test Llama Stack Build / build (push) Failing after 4s Unit Tests / unit-tests (3.12) (push) Failing after 4s Unit Tests / unit-tests (3.13) (push) Failing after 7s Test Llama Stack Build / build-single-provider (push) Failing after 44s Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 41s Test Llama Stack Build / build-custom-container-distribution (push) Failing after 43s Pre-commit / pre-commit (push) Successful in 1m34s # What does this PR do? postgres has different json extract syntax from sqlite ## Test Plan added integration test | ||
|  | f77d4d91f5 | fix: handle encoding errors when adding files to vector store (#2574) 
		
			Some checks failed
		
		
	 Integration Tests / test-matrix (server, 3.13, datasets) (push) Failing after 12s Integration Tests / test-matrix (server, 3.13, inference) (push) Failing after 8s Integration Tests / test-matrix (server, 3.13, inspect) (push) Failing after 8s Integration Tests / test-matrix (server, 3.13, post_training) (push) Failing after 7s Integration Tests / test-matrix (server, 3.13, scoring) (push) Failing after 6s Integration Tests / test-matrix (server, 3.13, providers) (push) Failing after 9s Integration Tests / test-matrix (server, 3.13, vector_io) (push) Failing after 6s Integration Tests / test-matrix (server, 3.13, tool_runtime) (push) Failing after 8s Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 6s Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 6s Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 6s Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 6s Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 6s Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 7s Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 10s Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 7s Test Llama Stack Build / generate-matrix (push) Successful in 5s Python Package Build Test / build (3.13) (push) Failing after 1s Python Package Build Test / build (3.12) (push) Failing after 1s Update ReadTheDocs / update-readthedocs (push) Failing after 3s Test External Providers / test-external-providers (venv) (push) Failing after 6s Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 6s Test Llama Stack Build / build (push) Failing after 5s Unit Tests / unit-tests (3.12) (push) Failing after 7s Unit Tests / unit-tests (3.13) (push) Failing after 7s Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 45s Test Llama Stack Build / build-single-provider (push) Failing after 37s Test Llama Stack Build / build-custom-container-distribution (push) Failing after 33s Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 43s Pre-commit / pre-commit (push) Successful in 1m35s - Add try-catch block around data.decode() to handle UnicodeDecodeError - Implement UTF-8 fallback when detected encoding fails - Return empty string when both encodings fail - add unit tests Fixes #2572: UnicodeDecodeError when uploading files with problematic encodings Signed-off-by: Derek Higgins <derekh@redhat.com> | ||
|  | ea80ea63ac | chore: Updating chunk id generation to ensure uniqueness (#2618) # What does this PR do? This handles an edge case for `generate_chunk_id` if the concatenation of the `document_id` and `chunk_text` combination are not unique. Adding the window location ensures uniqueness. ## Test Plan Added unit test Signed-off-by: Francisco Javier Arceo <farceo@redhat.com> | ||
|  | 3c43a2f529 | fix: store configs (#2593) # What does this PR do? https://github.com/meta-llama/llama-stack/pull/2490 broke postgres_demo, as the config expected a str but the value was converted to int. This PR: 1. Updates the type of port in sqlstore to be int 2. template generation uses `dict` instead of `StackRunConfig` so as to avoid failing pydantic typechecks. 3. Adds `replace_env_vars` to StackRunConfig instantiation in `configure.py` (not sure why this wasn't needed before). ## Test Plan `llama stack build --template postgres_demo --image-type conda --run` | ||
|  | 4d0d2d685f | fix: Set parameter usedforsecurity=False when calling hashlib.md5 in order to fix rag_tool.insert on FIPS clusters (#2577) 
		
			Some checks failed
		
		
	 Integration Tests / test-matrix (library, 3.12, scoring) (push) Failing after 7s Integration Tests / test-matrix (library, 3.12, post_training) (push) Failing after 10s Integration Tests / test-matrix (library, 3.12, providers) (push) Failing after 8s Integration Tests / test-matrix (library, 3.12, tool_runtime) (push) Failing after 7s Integration Tests / test-matrix (library, 3.13, agents) (push) Failing after 6s Integration Tests / test-matrix (library, 3.13, inspect) (push) Failing after 5s Integration Tests / test-matrix (library, 3.13, tool_runtime) (push) Failing after 6s Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 5s Integration Tests / test-matrix (library, 3.12, vector_io) (push) Failing after 21s Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 9s Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 7s Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 18s Integration Tests / test-matrix (library, 3.13, inference) (push) Failing after 26s Integration Tests / test-matrix (library, 3.13, post_training) (push) Failing after 25s Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 7s Integration Tests / test-matrix (library, 3.13, scoring) (push) Failing after 24s Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 7s Integration Tests / test-matrix (library, 3.13, providers) (push) Failing after 26s Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 14s Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 23s Python Package Build Test / build (3.12) (push) Failing after 1s Integration Tests / test-matrix (library, 3.13, vector_io) (push) Failing after 24s Integration Tests / test-matrix (library, 3.13, datasets) (push) Failing after 31s Unit Tests / unit-tests (3.12) (push) Failing after 5s Test External Providers / test-external-providers (venv) (push) Failing after 5s Unit Tests / unit-tests (3.13) (push) Failing after 4s Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 21s Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 34s Python Package Build Test / build (3.13) (push) Failing after 33s Pre-commit / pre-commit (push) Successful in 1m52s # What does this PR do? Set parameter `usedforsecurity=False` when calling hashlib.md5 in order to fix rag_tool.insert on FIPS clusters <!-- If resolving an issue, uncomment and update the line below --> Closes #2571 --------- Signed-off-by: Jorge Garcia Oncins <jgarciao@redhat.com> | ||
|  | b333a3c03a | fix(ollama): Download remote image URLs for Ollama (#2551) 
		
			Some checks failed
		
		
	 Integration Tests / test-matrix (http, 3.13, post_training) (push) Failing after 16s Integration Tests / test-matrix (http, 3.13, agents) (push) Failing after 19s Integration Tests / test-matrix (http, 3.13, vector_io) (push) Failing after 15s Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 8s Integration Tests / test-matrix (library, 3.12, providers) (push) Failing after 9s Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 9s Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 9s Integration Tests / test-matrix (library, 3.12, tool_runtime) (push) Failing after 7s Integration Tests / test-matrix (library, 3.12, vector_io) (push) Failing after 10s Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 10s Integration Tests / test-matrix (library, 3.13, inference) (push) Failing after 8s Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 10s Integration Tests / test-matrix (library, 3.13, tool_runtime) (push) Failing after 7s Integration Tests / test-matrix (library, 3.13, agents) (push) Failing after 11s Integration Tests / test-matrix (library, 3.13, vector_io) (push) Failing after 10s Integration Tests / test-matrix (library, 3.13, providers) (push) Failing after 8s Integration Tests / test-matrix (library, 3.13, datasets) (push) Failing after 9s Integration Tests / test-matrix (library, 3.13, inspect) (push) Failing after 13s Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 10s Integration Tests / test-matrix (library, 3.13, scoring) (push) Failing after 11s Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 9s Integration Tests / test-matrix (library, 3.13, post_training) (push) Failing after 8s Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 46s Python Package Build Test / build (3.12) (push) Failing after 43s Test External Providers / test-external-providers (venv) (push) Failing after 40s Python Package Build Test / build (3.13) (push) Failing after 42s Unit Tests / unit-tests (3.13) (push) Failing after 22s Unit Tests / unit-tests (3.12) (push) Failing after 25s Update ReadTheDocs / update-readthedocs (push) Failing after 20s Pre-commit / pre-commit (push) Successful in 2m13s ## What does this PR do? Ollama does not support remote images. Only local file paths OR base64 inputs are supported. This PR ensures that the Stack downloads remote images and passes the base64 down to the inference engine. ## Test Plan Added a test cases for Responses and ran it for both `fireworks` and `ollama` providers. | ||
|  | 8d8e90d78e | fix: add missing argument and methods (#2550) # What does this PR do? Resolves: ``` mypy.....................................................................Failed - hook id: mypy - exit code: 1 llama_stack/providers/utils/responses/responses_store.py:119: error: Missing positional argument "policy" in call to "fetch_one" of "AuthorizedSqlStore" [call-arg] llama_stack/providers/utils/responses/responses_store.py:122: error: "AuthorizedSqlStore" has no attribute "delete" [attr-defined] Found 2 errors in 1 file (checked 403 source files) ``` Signed-off-by: Sébastien Han <seb@redhat.com> | ||
|  | be9bf68246 | feat: Add webmethod for deleting openai responses (#2160) 
		
			Some checks failed
		
		
	 Integration Tests / test-matrix (library, 3.13, datasets) (push) Failing after 16s Integration Tests / test-matrix (http, 3.13, datasets) (push) Failing after 11s Integration Tests / test-matrix (library, 3.13, inference) (push) Failing after 12s Integration Tests / test-matrix (http, 3.13, scoring) (push) Failing after 12s Integration Tests / test-matrix (library, 3.13, post_training) (push) Failing after 9s Integration Tests / test-matrix (library, 3.12, inspect) (push) Failing after 11s Integration Tests / test-matrix (library, 3.12, tool_runtime) (push) Failing after 11s Integration Tests / test-matrix (library, 3.13, tool_runtime) (push) Failing after 8s Integration Tests / test-matrix (library, 3.12, post_training) (push) Failing after 12s Integration Tests / test-matrix (library, 3.13, providers) (push) Failing after 12s Integration Tests / test-matrix (library, 3.13, inspect) (push) Failing after 12s Integration Tests / test-matrix (library, 3.13, scoring) (push) Failing after 11s Integration Tests / test-matrix (http, 3.12, providers) (push) Failing after 17s Integration Tests / test-matrix (http, 3.13, agents) (push) Failing after 11s Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 5s Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 7s Integration Tests / test-matrix (library, 3.13, vector_io) (push) Failing after 16s Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 18s Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 19s Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 21s Test External Providers / test-external-providers (venv) (push) Failing after 9s Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 19s Unit Tests / unit-tests (3.12) (push) Failing after 9s Update ReadTheDocs / update-readthedocs (push) Failing after 7s Unit Tests / unit-tests (3.13) (push) Failing after 10s Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 39s Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 37s Python Package Build Test / build (3.13) (push) Failing after 33s Python Package Build Test / build (3.12) (push) Failing after 36s Pre-commit / pre-commit (push) Failing after 1m19s # What does this PR do? This PR creates a webmethod for deleting open AI responses, adds and implementation for it and makes an integration test for the OpenAI delete response method. [//]: # (If resolving an issue, uncomment and update the line below) # (Closes #2077) ## Test Plan Ran the standard tests and the pre-commit hooks and the unit tests. # (## Documentation) For this pr I made the routes and implementation based on the current get and create methods. The unit tests were not able to handle this test due to the mock interface in use, which did not allow for effective CRUD to be tested. I instead created an integration test to match the existing ones in the test_openai_responses. | ||
|  | cc19b56c87 | chore: OpenAI compatibility for Milvus (#2470) # What does this PR do? Closes https://github.com/meta-llama/llama-stack/issues/2461 ## Test Plan Tested with the `ollama` distriubtion template and updated the vector_io provider to: ```yaml vector_io: - provider_id: milvus provider_type: inline::milvus config: db_path: ${env.SQLITE_STORE_DIR:=~/.llama/distributions/ollama}/milvus_store.db kvstore: type: sqlite db_name: milvus_registry.db ``` Ran the stack ```bash llama stack run ./llama_stack/templates/ollama/run.yaml --image-type venv --env OLLAMA_URL="http://0.0.0.0:11434" ``` Ran the tests: ``` pytest -sv --stack-config=http://localhost:8321 tests/integration/vector_io/test_openai_vector_stores.py --embedding-model all-MiniLM-L6-v2 ``` Output passed. Signed-off-by: Francisco Javier Arceo <farceo@redhat.com> | ||
|  | 03e61e3fcc | fix: ValueError in faiss vector database serialization (resolves #2519) (#2526) 
		
			Some checks failed
		
		
	 Integration Tests / test-matrix (library, 3.12, agents) (push) Failing after 13s Integration Tests / test-matrix (library, 3.12, scoring) (push) Failing after 6s Integration Tests / test-matrix (library, 3.13, inference) (push) Failing after 8s Integration Tests / test-matrix (library, 3.12, tool_runtime) (push) Failing after 7s Integration Tests / test-matrix (library, 3.12, inspect) (push) Failing after 11s Integration Tests / test-matrix (library, 3.13, inspect) (push) Failing after 7s Integration Tests / test-matrix (http, 3.13, tool_runtime) (push) Failing after 22s Integration Tests / test-matrix (library, 3.12, providers) (push) Failing after 8s Integration Tests / test-matrix (http, 3.12, datasets) (push) Failing after 22s Integration Tests / test-matrix (http, 3.13, inference) (push) Failing after 23s Integration Tests / test-matrix (library, 3.13, agents) (push) Failing after 8s Integration Tests / test-matrix (library, 3.13, providers) (push) Failing after 8s Integration Tests / test-matrix (library, 3.13, post_training) (push) Failing after 7s Integration Tests / test-matrix (library, 3.13, scoring) (push) Failing after 13s Integration Tests / test-matrix (library, 3.13, vector_io) (push) Failing after 12s Integration Tests / test-matrix (library, 3.13, tool_runtime) (push) Failing after 14s Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 12s Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 10s Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 6s Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 6s Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 5s Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 6s Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 5s Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 7s Python Package Build Test / build (3.12) (push) Failing after 15s Python Package Build Test / build (3.13) (push) Failing after 17s Test External Providers / test-external-providers (venv) (push) Failing after 20s Unit Tests / unit-tests (3.12) (push) Failing after 21s Unit Tests / unit-tests (3.13) (push) Failing after 11s Pre-commit / pre-commit (push) Successful in 1m12s The error message was misleading as it appeared to be an Ollama connectivity issue, but actually occurred during faiss vector database initialization. ## 🔍 Root Cause Analysis The issue was in the faiss vector database serialization logic in `llama_stack/providers/inline/vector_io/faiss/faiss.py`: 1. **Saving**: `faiss.serialize_index()` returns binary data (uint8 numpy array) 2. **Bug**: Code incorrectly used `np.savetxt()` which converts binary to text with scientific notation (e.g., `7.300000000000000000e+01`) 3. **Loading**: `np.loadtxt(buffer, dtype=np.uint8)` failed to parse scientific notation back to uint8 4. **Result**: Server crashed during initialization before reaching Ollama connectivity check ## ✅ Solution Replaced text-based serialization with proper binary serialization: ``` **After (fixed):** ```python # Saving - proper binary format np.save(buffer, np_index, allow_pickle=False) # Loading - proper binary format self.index = faiss.deserialize_index(np.load(buffer, allow_pickle=False)) ``` ## 🧪 Testing - ✅ Binary serialization/deserialization works correctly - ✅ Backward compatible with existing functionality - ✅ No security concerns (allow_pickle=False maintained) - ✅ Resolves the specific ValueError mentioned in the issue ## 📊 Impact This fix resolves: - ValueError during server startup with Ollama templates ## 🔗 Related Issues - Closes #2519 - Affects all users of Ollama template and faiss vector_io configurations ## 📝 Files Changed - `llama_stack/providers/inline/vector_io/faiss/faiss.py` - Fixed serialization methods in `initialize()` and `_save_index()` --------- Signed-off-by: Ben Browning <bbrownin@redhat.com> Co-authored-by: Ben Browning <bbrownin@redhat.com> |