llama-stack-mirror

mirror of https://github.com/meta-llama/llama-stack.git synced 2025-12-05 18:27:22 +00:00

Author	SHA1	Message	Date
Eric Huang	a772f0a42d	chore: introduce write queue for response_store # What does this PR do? ## Test Plan	2025-09-21 20:46:36 -07:00
ehhuang	f44eb935c4	chore: simplify authorized sqlstore (#3496 ) Some checks failed SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 1s Details Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s Details SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s Details Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped Details Python Package Build Test / build (3.12) (push) Failing after 1s Details Python Package Build Test / build (3.13) (push) Failing after 1s Details Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 2s Details Unit Tests / unit-tests (3.13) (push) Failing after 3s Details Update ReadTheDocs / update-readthedocs (push) Failing after 3s Details Test External API and Providers / test-external (venv) (push) Failing after 4s Details Vector IO Integration Tests / test-matrix (push) Failing after 4s Details UI Tests / ui-tests (22) (push) Successful in 35s Details API Conformance Tests / check-schema-compatibility (push) Successful in 6s Details Unit Tests / unit-tests (3.12) (push) Failing after 3s Details Pre-commit / pre-commit (push) Successful in 1m19s Details # What does this PR do? This PR is generated with AI and reviewed by me. Refactors the AuthorizedSqlStore class to store the access policy as an instance variable rather than passing it as a parameter to each method call. This simplifies the API. # Test Plan existing tests	2025-09-19 16:13:56 -07:00
Matthew Farrellee	521865c388	feat: include all models from provider's /v1/models (#3471 ) # What does this PR do? this replaces the static model listing for any provider using OpenAIMixin currently - - anthropic - azure openai - gemini - groq - llama-api - nvidia - openai - sambanova - tgi - vertexai - vllm - not changed: together has its own impl ## Test Plan - new unit tests - manual for llama-api, openai, groq, gemini ``` for provider in llama-openai-compat openai groq gemini; do uv run llama stack build --image-type venv --providers inference=remote::provider --run & uv run --with llama-stack-client llama-stack-client models list \| grep Total ``` results (17 sep 2025): - llama-api: 4 - openai: 86 - groq: 21 - gemini: 66 closes #3467	2025-09-18 05:17:11 -04:00
Matthew Farrellee	f4ab154ade	feat: add dynamic model registration support to TGI inference (#3417 ) Some checks failed Vector IO Integration Tests / test-matrix (push) Failing after 4s Details Update ReadTheDocs / update-readthedocs (push) Failing after 3s Details UI Tests / ui-tests (22) (push) Successful in 43s Details SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 3s Details Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s Details API Conformance Tests / check-schema-compatibility (push) Successful in 7s Details Unit Tests / unit-tests (3.13) (push) Failing after 4s Details Pre-commit / pre-commit (push) Successful in 1m21s Details Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped Details Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s Details Python Package Build Test / build (3.12) (push) Failing after 2s Details Python Package Build Test / build (3.13) (push) Failing after 2s Details SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 5s Details Unit Tests / unit-tests (3.12) (push) Failing after 3s Details Test External API and Providers / test-external (venv) (push) Failing after 5s Details # What does this PR do? adds dynamic model support to TGI add new overwrite_completion_id feature to OpenAIMixin to deal with TGI always returning id="" ## Test Plan tgi: `docker run --gpus all --shm-size 1g -p 8080:80 -v /data:/data ghcr.io/huggingface/text-generation-inference --model-id Qwen/Qwen3-0.6B` stack: `TGI_URL=http://localhost:8080 uv run llama stack build --image-type venv --distro ci-tests --run` test: `./scripts/integration-tests.sh --stack-config http://localhost:8321 --setup tgi --subdirs inference --pattern openai`	2025-09-15 15:52:40 -04:00
Doug Edgar	f67081d2d6	feat: migrate to FIPS-validated cryptographic algorithms (#3423 ) Some checks failed Python Package Build Test / build (3.12) (push) Failing after 1s Details Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped Details API Conformance Tests / check-schema-compatibility (push) Successful in 6s Details SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 1s Details SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s Details Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 4s Details Python Package Build Test / build (3.13) (push) Failing after 3s Details Test External API and Providers / test-external (venv) (push) Failing after 6s Details Unit Tests / unit-tests (3.12) (push) Failing after 4s Details Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 16s Details Unit Tests / unit-tests (3.13) (push) Failing after 14s Details Vector IO Integration Tests / test-matrix (push) Failing after 19s Details UI Tests / ui-tests (22) (push) Successful in 33s Details Pre-commit / pre-commit (push) Successful in 1m13s Details # What does this PR do? Migrates MD5 and SHA-1 hash algorithms to SHA-256. In particular, replaces: - MD5 in chunk ID generation. - MD5 in file verification. - SHA-1 in model identifier digests. And updates all related test expectations. Original discussion: https://github.com/llamastack/llama-stack/discussions/3413 <!-- If resolving an issue, uncomment and update the line below --> Closes #3424. ## Test Plan Unit tests from scripts/unit-tests.sh were updated to match the new hash output, and ran to verify the tests pass. Signed-off-by: Doug Edgar <dedgar@redhat.com>	2025-09-12 11:18:19 +02:00
Matthew Farrellee	8ef1189be7	chore: update the vLLM inference impl to use OpenAIMixin for openai-compat functions (#3404 ) Some checks failed SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 1s Details Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s Details SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 1s Details Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s Details API Conformance Tests / check-schema-compatibility (push) Successful in 7s Details Test Llama Stack Build / generate-matrix (push) Successful in 3s Details Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped Details Test Llama Stack Build / build-custom-container-distribution (push) Failing after 3s Details Python Package Build Test / build (3.12) (push) Failing after 2s Details Python Package Build Test / build (3.13) (push) Failing after 1s Details Vector IO Integration Tests / test-matrix (push) Failing after 4s Details Test Llama Stack Build / build-single-provider (push) Failing after 5s Details Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 4s Details Test External API and Providers / test-external (venv) (push) Failing after 4s Details Test Llama Stack Build / build (push) Failing after 3s Details Unit Tests / unit-tests (3.13) (push) Failing after 6s Details Update ReadTheDocs / update-readthedocs (push) Failing after 3s Details Unit Tests / unit-tests (3.12) (push) Failing after 4s Details UI Tests / ui-tests (22) (push) Successful in 31s Details Pre-commit / pre-commit (push) Successful in 1m18s Details # What does this PR do? update vLLM inference provider to use OpenAIMixin for openai-compat functions inference recordings from Qwen3-0.6B and vLLM 0.8.3 - ``` docker run --gpus all -v ~/.cache/huggingface:/root/.cache/huggingface -p 8000:8000 --ipc=host \ vllm/vllm-openai:latest \ --model Qwen/Qwen3-0.6B --enable-auto-tool-choice --tool-call-parser hermes ``` ## Test Plan ``` ./scripts/integration-tests.sh --stack-config server:ci-tests --setup vllm --subdirs inference ```	2025-09-11 09:04:38 -04:00
Ashwin Bharambe	0c7f49490c	fix(inference_store): on duplicate chat completion IDs, replace (#3408 ) Some checks failed Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 2s Details Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped Details Python Package Build Test / build (3.13) (push) Failing after 2s Details SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 9s Details Vector IO Integration Tests / test-matrix (push) Failing after 4s Details API Conformance Tests / check-schema-compatibility (push) Successful in 7s Details Unit Tests / unit-tests (3.12) (push) Failing after 4s Details Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s Details Python Package Build Test / build (3.12) (push) Failing after 3s Details SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 8s Details Unit Tests / unit-tests (3.13) (push) Failing after 5s Details Update ReadTheDocs / update-readthedocs (push) Failing after 23s Details Test External API and Providers / test-external (venv) (push) Failing after 30s Details UI Tests / ui-tests (22) (push) Successful in 35s Details Pre-commit / pre-commit (push) Successful in 1m45s Details # What does this PR do? Duplicate chat completion IDs can be generated during tests especially if they are replaying recorded responses across different tests. No need to warn or error under those circumstances. In the wild, this is not likely to happen at all (no evidence) so we aren't really hiding any problem.	2025-09-10 14:34:18 -07:00
ehhuang	e980436a2e	chore: introduce write queue for inference_store (#3383 ) # What does this PR do? Adds a write worker queue for writes to inference store. This avoids overwhelming request processing with slow inference writes. ## Test Plan Benchmark: ``` cd /docs/source/distributions/k8s-benchmark # start mock server python openai-mock-server.py --port 8000 # start stack server LLAMA_STACK_LOGGING="all=WARNING" uv run --with llama-stack python -m llama_stack.core.server.server docs/source/distributions/k8s-benchmark/stack_run_config.yaml # run benchmark script uv run python3 benchmark.py --duration 120 --concurrent 50 --base-url=http://localhost:8321/v1/openai/v1 --model=vllm-inference/meta-llama/Llama-3.2-3B-Instruct ``` ## RPS from 21 -> 57	2025-09-10 11:57:42 -07:00
ehhuang	f6bf36343d	chore: logging perf improvments (#3393 ) # What does this PR do? - Use BackgroundLogger when logging metric events. - Reuse event loop in BackgroundLogger ## Test Plan ``` cd /docs/source/distributions/k8s-benchmark # start mock server python openai-mock-server.py --port 8000 # start stack server LLAMA_STACK_LOGGING="all=WARNING" uv run --with llama-stack python -m llama_stack.core.server.server docs/source/distributions/k8s-benchmark/stack_run_config.yaml # run benchmark script uv run python3 benchmark.py --duration 120 --concurrent 50 --base-url=http://localhost:8321/v1/openai/v1 --model=vllm-inference/meta-llama/Llama-3.2-3B-Instruct ``` ### RPS from 57 -> 62	2025-09-10 11:52:23 -07:00
Sumanth Kamenani	0b00c68d59	fix: use lambda pattern for bedrock config env vars (#3307 ) # What does this PR do? Improved bedrock provider config to read from environment variables like AWS_ACCESS_KEY_ID. Updated all fields to use default_factory with lambda patterns like the nvidia provider does. Now the environment variables work as documented. Closes #3305 ## Test Plan Ran the new bedrock config tests: ```bash python -m pytest tests/unit/providers/inference/bedrock/test_config.py -v Verified existing provider tests still work: python -m pytest tests/unit/providers/test_configs.py -v	2025-09-05 10:45:11 +02:00
Sumanth Kamenani	55a8c5f439	fix: show descriptive MCP server connection errors instead of generic 500s (#3256 ) Some checks failed SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s Details SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s Details Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s Details Python Package Build Test / build (3.12) (push) Failing after 1s Details Python Package Build Test / build (3.13) (push) Failing after 1s Details Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped Details Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 3s Details Vector IO Integration Tests / test-matrix (push) Failing after 5s Details Unit Tests / unit-tests (3.12) (push) Failing after 3s Details Test External API and Providers / test-external (venv) (push) Failing after 4s Details Update ReadTheDocs / update-readthedocs (push) Failing after 3s Details Unit Tests / unit-tests (3.13) (push) Failing after 3s Details UI Tests / ui-tests (22) (push) Successful in 1m20s Details Pre-commit / pre-commit (push) Successful in 2m37s Details What does this PR do? Fixes error handling when MCP server connections fail. Instead of returning generic 500 errors, now provides descriptive error messages with proper HTTP status codes. Closes #3107 Test Plan Before fix: curl -X GET "http://localhost:8321/v1/tool-runtime/list-tools?tool_group_id=bad-mcp-server" Returns: {"detail": "Internal server error: An unexpected error occurred."} (500) After fix: curl -X GET "http://localhost:8321/v1/tool-runtime/list-tools?tool_group_id=bad-mcp-server" Returns: {"error": {"detail": "Failed to connect to MCP server at http://localhost:9999/sse: Connection refused"}} (502) Tests: - Added unit test for ConnectionError → 502 translation - Manually tested with unreachable MCP servers (connection refused)	2025-09-04 13:25:02 -07:00
Derek Higgins	5bbca56cfc	fix: Make SentenceTransformer embedding operations non-blocking (#3335 ) - Wrap model loading with asyncio.to_thread() to prevent blocking during model download/initialization - Wrap encoding operations with asyncio.to_thread() to run in background thread - Convert _load_sentence_transformer_model() to async method This ensures the async event loop remains responsive during embedding operations. Closes: #3332 Signed-off-by: Derek Higgins <derekh@redhat.com> Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com>	2025-09-04 13:58:41 -04:00
Matthew Farrellee	478b4ff1e6	chore(migrate apis): move VectorDBWithIndex from embeddings to openai_embeddings (#3294 ) # What does this PR do? migrates VectorDBWithIndex to use openai_embeddings part of #2365 ## Test Plan existing unit tests	2025-08-31 14:48:35 -07:00
IAN MILLER	3130ca0a78	feat: implement keyword, vector and hybrid search inside vector stores for PGVector provider (#3064 ) # What does this PR do? <!-- Provide a short summary of what this PR does and why. Link to relevant issues if applicable. --> The purpose of this task is to implement `openai/v1/vector_stores/{vector_store_id}/search` for PGVector provider. It involves implementing vector similarity search, keyword search and hybrid search for `PGVectorIndex`. <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> Closes #3006 ## Test Plan <!-- Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed. --> Run unit tests: ` ./scripts/unit-tests.sh ` Run integration tests for openai vector stores: 1. Export env vars: ``` export ENABLE_PGVECTOR=true export PGVECTOR_HOST=localhost export PGVECTOR_PORT=5432 export PGVECTOR_DB=llamastack export PGVECTOR_USER=llamastack export PGVECTOR_PASSWORD=llamastack ``` 2. Create DB: ``` psql -h localhost -U postgres -c "CREATE ROLE llamastack LOGIN PASSWORD 'llamastack';" psql -h localhost -U postgres -c "CREATE DATABASE llamastack OWNER llamastack;" psql -h localhost -U llamastack -d llamastack -c "CREATE EXTENSION IF NOT EXISTS vector;" ``` 3. Install sentence-transformers: ` uv pip install sentence-transformers ` 4. Run: ``` uv run --group test pytest -s -v --stack-config="inference=inline::sentence-transformers,vector_io=remote::pgvector" --embedding-model sentence-transformers/all-MiniLM-L6-v2 tests/integration/vector_io/test_openai_vector_stores.py ``` Inspect PGVector vector stores (optional): ``` psql llamastack psql (14.18 (Homebrew)) Type "help" for help. llamastack=# \z Access privileges Schema \| Name \| Type \| Access privileges \| Column privileges \| Policies --------+------------------------------------------------------+-------+-------------------+-------------------+---------- public \| llamastack_kvstore \| table \| \| \| public \| metadata_store \| table \| \| \| public \| vector_store_pgvector_main \| table \| \| \| public \| vector_store_vs_1dfbc061_1f4d_4497_9165_ecba2622ba3a \| table \| \| \| public \| vector_store_vs_2085a9fb_1822_4e42_a277_c6a685843fa7 \| table \| \| \| public \| vector_store_vs_2b3dae46_38be_462a_afd6_37ee5fe661b1 \| table \| \| \| public \| vector_store_vs_2f438de6_f606_4561_9d50_ef9160eb9060 \| table \| \| \| public \| vector_store_vs_3eeca564_2580_4c68_bfea_83dc57e31214 \| table \| \| \| public \| vector_store_vs_53942163_05f3_40e0_83c0_0997c64613da \| table \| \| \| public \| vector_store_vs_545bac75_8950_4ff1_b084_e221192d4709 \| table \| \| \| public \| vector_store_vs_688a37d8_35b2_4298_a035_bfedf5b21f86 \| table \| \| \| public \| vector_store_vs_70624d9a_f6ac_4c42_b8ab_0649473c6600 \| table \| \| \| public \| vector_store_vs_73fc1dd2_e942_4972_afb1_1e177b591ac2 \| table \| \| \| public \| vector_store_vs_9d464949_d51f_49db_9f87_e033b8b84ac9 \| table \| \| \| public \| vector_store_vs_a1e4d724_5162_4d6d_a6c0_bdafaf6b76ec \| table \| \| \| public \| vector_store_vs_a328fb1b_1a21_480f_9624_ffaa60fb6672 \| table \| \| \| public \| vector_store_vs_a8981bf0_2e66_4445_a267_a8fff442db53 \| table \| \| \| public \| vector_store_vs_ccd4b6a4_1efd_4984_ad03_e7ff8eadb296 \| table \| \| \| public \| vector_store_vs_cd6420a4_a1fc_4cec_948c_1413a26281c9 \| table \| \| \| public \| vector_store_vs_cd709284_e5cf_4a88_aba5_dc76a35364bd \| table \| \| \| public \| vector_store_vs_d7a4548e_fbc1_44d7_b2ec_b664417f2a46 \| table \| \| \| public \| vector_store_vs_e7f73231_414c_4523_886c_d1174eee836e \| table \| \| \| public \| vector_store_vs_ffd53588_819f_47e8_bb9d_954af6f7833d \| table \| \| \| (23 rows) llamastack=# ``` Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com>	2025-08-29 16:30:12 +02:00
Matthew Farrellee	ed418653ec	chore(dev): add inequality support to sqlstore where clause (#3272 ) Some checks failed Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s Details Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 1s Details SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 2s Details SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 1s Details Vector IO Integration Tests / test-matrix (push) Failing after 1s Details Pre-commit / pre-commit (push) Failing after 1s Details Python Package Build Test / build (3.12) (push) Failing after 0s Details Python Package Build Test / build (3.13) (push) Failing after 1s Details Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped Details Test External API and Providers / test-external (venv) (push) Failing after 1s Details UI Tests / ui-tests (22) (push) Failing after 0s Details Unit Tests / unit-tests (3.12) (push) Failing after 1s Details Unit Tests / unit-tests (3.13) (push) Failing after 1s Details # What does this PR do? add the ability to use inequalities in the where clause of the sqlstore. this is infrastructure for files expiration. ## Test Plan unit tests	2025-08-28 14:49:36 -07:00
Charlie Doern	3b9278f254	feat: implement query_metrics (#3074 ) # What does this PR do? query_metrics currently has no implementation, meaning once a metric is emitted there is no way in llama stack to query it from the store. implement query_metrics for the meta_reference provider which follows a similar style to `query_traces`, using the trace_store to format an SQL query and execute it in this case the parameters for the query are `metric.METRIC_NAME, start_time, and end_time` and any other matchers if they are provided. this required client side changes since the client had no `query_metrics` or any associated resources, so any tests here will fail but I will provide manual execution logs for the new tests I am adding order the metrics by timestamp. Additionally add `unit` to the `MetricDataPoint` class since this adds much more context to the metric being queried. depends on https://github.com/llamastack/llama-stack-client-python/pull/260 ## Test Plan ``` import time import uuid def create_http_client(): from llama_stack_client import LlamaStackClient return LlamaStackClient(base_url="http://localhost:8321") client = create_http_client() response = client.telemetry.query_metrics(metric_name="total_tokens", start_time=0) print(response) ``` ``` ╰─ python3.12 ~/telemetry.py INFO:httpx:HTTP Request: POST http://localhost:8322/v1/telemetry/metrics/total_tokens "HTTP/1.1 200 OK" [TelemetryQueryMetricsResponse(data=None, metric='total_tokens', labels=[], values=[{'timestamp': 1753999514, 'value': 34.0, 'unit': 'tokens'}, {'timestamp': 1753999816, 'value': 34.0, 'unit': 'tokens'}, {'timestamp': 1753999881, 'value': 34.0, 'unit': 'tokens'}, {'timestamp': 1753999956, 'value': 34.0, 'unit': 'tokens'}, {'timestamp': 1754000200, 'value': 34.0, 'unit': 'tokens'}, {'timestamp': 1754000419, 'value': 36.0, 'unit': 'tokens'}, {'timestamp': 1754000714, 'value': 36.0, 'unit': 'tokens'}, {'timestamp': 1754000876, 'value': 36.0, 'unit': 'tokens'}, {'timestamp': 1754000908, 'value': 34.0, 'unit': 'tokens'}, {'timestamp': 1754001309, 'value': 584.0, 'unit': 'tokens'}, {'timestamp': 1754001311, 'value': 138.0, 'unit': 'tokens'}, {'timestamp': 1754001316, 'value': 349.0, 'unit': 'tokens'}, {'timestamp': 1754001318, 'value': 133.0, 'unit': 'tokens'}, {'timestamp': 1754001320, 'value': 133.0, 'unit': 'tokens'}, {'timestamp': 1754001341, 'value': 923.0, 'unit': 'tokens'}, {'timestamp': 1754001350, 'value': 354.0, 'unit': 'tokens'}, {'timestamp': 1754001462, 'value': 417.0, 'unit': 'tokens'}, {'timestamp': 1754001464, 'value': 158.0, 'unit': 'tokens'}, {'timestamp': 1754001475, 'value': 697.0, 'unit': 'tokens'}, {'timestamp': 1754001477, 'value': 133.0, 'unit': 'tokens'}, {'timestamp': 1754001479, 'value': 133.0, 'unit': 'tokens'}, {'timestamp': 1754001489, 'value': 298.0, 'unit': 'tokens'}, {'timestamp': 1754001541, 'value': 615.0, 'unit': 'tokens'}, {'timestamp': 1754001543, 'value': 119.0, 'unit': 'tokens'}, {'timestamp': 1754001548, 'value': 310.0, 'unit': 'tokens'}, {'timestamp': 1754001549, 'value': 133.0, 'unit': 'tokens'}, {'timestamp': 1754001551, 'value': 133.0, 'unit': 'tokens'}, {'timestamp': 1754001568, 'value': 714.0, 'unit': 'tokens'}, {'timestamp': 1754001800, 'value': 437.0, 'unit': 'tokens'}, {'timestamp': 1754001802, 'value': 200.0, 'unit': 'tokens'}, {'timestamp': 1754001806, 'value': 262.0, 'unit': 'tokens'}, {'timestamp': 1754001808, 'value': 133.0, 'unit': 'tokens'}, {'timestamp': 1754001810, 'value': 133.0, 'unit': 'tokens'}, {'timestamp': 1754001816, 'value': 82.0, 'unit': 'tokens'}, {'timestamp': 1754001923, 'value': 61.0, 'unit': 'tokens'}, {'timestamp': 1754001929, 'value': 391.0, 'unit': 'tokens'}, {'timestamp': 1754001939, 'value': 598.0, 'unit': 'tokens'}, {'timestamp': 1754001941, 'value': 133.0, 'unit': 'tokens'}, {'timestamp': 1754001942, 'value': 133.0, 'unit': 'tokens'}, {'timestamp': 1754001952, 'value': 252.0, 'unit': 'tokens'}, {'timestamp': 1754002053, 'value': 251.0, 'unit': 'tokens'}, {'timestamp': 1754002059, 'value': 375.0, 'unit': 'tokens'}, {'timestamp': 1754002062, 'value': 244.0, 'unit': 'tokens'}, {'timestamp': 1754002064, 'value': 111.0, 'unit': 'tokens'}, {'timestamp': 1754002065, 'value': 133.0, 'unit': 'tokens'}, {'timestamp': 1754002083, 'value': 719.0, 'unit': 'tokens'}, {'timestamp': 1754002302, 'value': 279.0, 'unit': 'tokens'}, {'timestamp': 1754002306, 'value': 218.0, 'unit': 'tokens'}, {'timestamp': 1754002308, 'value': 198.0, 'unit': 'tokens'}, {'timestamp': 1754002309, 'value': 69.0, 'unit': 'tokens'}, {'timestamp': 1754002311, 'value': 133.0, 'unit': 'tokens'}, {'timestamp': 1754002324, 'value': 481.0, 'unit': 'tokens'}, {'timestamp': 1754003161, 'value': 579.0, 'unit': 'tokens'}, {'timestamp': 1754003161, 'value': 69.0, 'unit': 'tokens'}, {'timestamp': 1754003169, 'value': 499.0, 'unit': 'tokens'}, {'timestamp': 1754003171, 'value': 133.0, 'unit': 'tokens'}, {'timestamp': 1754003173, 'value': 133.0, 'unit': 'tokens'}, {'timestamp': 1754003185, 'value': 422.0, 'unit': 'tokens'}, {'timestamp': 1754003448, 'value': 579.0, 'unit': 'tokens'}, {'timestamp': 1754003453, 'value': 422.0, 'unit': 'tokens'}, {'timestamp': 1754003589, 'value': 579.0, 'unit': 'tokens'}, {'timestamp': 1754003609, 'value': 279.0, 'unit': 'tokens'}, {'timestamp': 1754003614, 'value': 481.0, 'unit': 'tokens'}, {'timestamp': 1754003706, 'value': 303.0, 'unit': 'tokens'}, {'timestamp': 1754003706, 'value': 51.0, 'unit': 'tokens'}, {'timestamp': 1754003713, 'value': 426.0, 'unit': 'tokens'}, {'timestamp': 1754003714, 'value': 70.0, 'unit': 'tokens'}, {'timestamp': 1754003715, 'value': 133.0, 'unit': 'tokens'}, {'timestamp': 1754003724, 'value': 225.0, 'unit': 'tokens'}, {'timestamp': 1754004226, 'value': 516.0, 'unit': 'tokens'}, {'timestamp': 1754004228, 'value': 127.0, 'unit': 'tokens'}, {'timestamp': 1754004232, 'value': 281.0, 'unit': 'tokens'}, {'timestamp': 1754004234, 'value': 133.0, 'unit': 'tokens'}, {'timestamp': 1754004236, 'value': 133.0, 'unit': 'tokens'}, {'timestamp': 1754004244, 'value': 206.0, 'unit': 'tokens'}, {'timestamp': 1754004683, 'value': 338.0, 'unit': 'tokens'}, {'timestamp': 1754004690, 'value': 481.0, 'unit': 'tokens'}, {'timestamp': 1754004692, 'value': 124.0, 'unit': 'tokens'}, {'timestamp': 1754004692, 'value': 65.0, 'unit': 'tokens'}, {'timestamp': 1754004694, 'value': 133.0, 'unit': 'tokens'}, {'timestamp': 1754004703, 'value': 211.0, 'unit': 'tokens'}, {'timestamp': 1754004743, 'value': 338.0, 'unit': 'tokens'}, {'timestamp': 1754004749, 'value': 211.0, 'unit': 'tokens'}, {'timestamp': 1754005566, 'value': 481.0, 'unit': 'tokens'}, {'timestamp': 1754006101, 'value': 159.0, 'unit': 'tokens'}, {'timestamp': 1754006105, 'value': 272.0, 'unit': 'tokens'}, {'timestamp': 1754006109, 'value': 308.0, 'unit': 'tokens'}, {'timestamp': 1754006110, 'value': 61.0, 'unit': 'tokens'}, {'timestamp': 1754006112, 'value': 133.0, 'unit': 'tokens'}, {'timestamp': 1754006130, 'value': 705.0, 'unit': 'tokens'}, {'timestamp': 1754051825, 'value': 454.0, 'unit': 'tokens'}, {'timestamp': 1754051827, 'value': 152.0, 'unit': 'tokens'}, {'timestamp': 1754051834, 'value': 481.0, 'unit': 'tokens'}, {'timestamp': 1754051835, 'value': 55.0, 'unit': 'tokens'}, {'timestamp': 1754051837, 'value': 133.0, 'unit': 'tokens'}, {'timestamp': 1754051845, 'value': 102.0, 'unit': 'tokens'}, {'timestamp': 1754099929, 'value': 36.0, 'unit': 'tokens'}, {'timestamp': 1754510050, 'value': 598.0, 'unit': 'tokens'}, {'timestamp': 1754510052, 'value': 160.0, 'unit': 'tokens'}, {'timestamp': 1754510064, 'value': 725.0, 'unit': 'tokens'}, {'timestamp': 1754510065, 'value': 133.0, 'unit': 'tokens'}, {'timestamp': 1754510067, 'value': 133.0, 'unit': 'tokens'}, {'timestamp': 1754510083, 'value': 535.0, 'unit': 'tokens'}, {'timestamp': 1754596582, 'value': 36.0, 'unit': 'tokens'}])] ``` adding tests for each currently documented metric in llama stack using this new function. attached is also some manual testing integrations tests passing locally with replay mode and the linked client changes: <img width="1907" height="529" alt="Screenshot 2025-08-08 at 2 49 14 PM" src="https://github.com/user-attachments/assets/d482ab06-dcff-4f0c-a1f1-f870670ee9bc" /> --------- Signed-off-by: Charlie Doern <cdoern@redhat.com>	2025-08-22 14:19:24 -07:00
Matthew Farrellee	3d119a86d4	chore: indicate to mypy that InferenceProvider.batch_completion/batch_chat_completion is concrete (#3239 ) # What does this PR do? closes https://github.com/llamastack/llama-stack/issues/3236 mypy considered our default implementations (raise NotImplementedError) to be trivial. the result was we implemented the same stubs in providers. this change puts enough into the default impls so mypy considers them non-trivial. this allows us to remove the duplicate implementations.	2025-08-22 14:17:30 -07:00
Mustafa Elbehery	c3b2b06974	refactor(logging): rename llama_stack logger categories (#3065 ) # What does this PR do? <!-- Provide a short summary of what this PR does and why. Link to relevant issues if applicable. --> This PR renames categories of llama_stack loggers. This PR aligns logging categories as per the package name, as well as reviews from initial https://github.com/meta-llama/llama-stack/pull/2868. This is a follow up to #3061. <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> Replaces https://github.com/meta-llama/llama-stack/pull/2868 Part of https://github.com/meta-llama/llama-stack/issues/2865 cc @leseb @rhuss Signed-off-by: Mustafa Elbehery <melbeher@redhat.com>	2025-08-21 17:31:04 -07:00
Omer Tuchfeld	00a67da449	fix: Use `pool_pre_ping=True` in SQLAlchemy engine creation (#3208 ) # What does this PR do? We noticed that when llama-stack is running for a long time, we would run into database errors when trying to run messages through the agent (which we configured to persist against postgres), seemingly due to the database connections being stale or disconnected. This commit adds `pool_pre_ping=True` to the SQLAlchemy engine creation to help mitigate this issue by checking the connection before using it, and re-establishing it if necessary. More information in: https://docs.sqlalchemy.org/en/20/core/pooling.html#dealing-with-disconnects We're also open to other suggestions on how to handle this issue, this PR is just a suggestion. ## Test Plan We have not tested it yet (we're in the process of doing that) and we're hoping it's going to resolve our issue.	2025-08-20 13:52:05 -07:00
Mustafa Elbehery	3f8df167f3	chore(pre-commit): add pre-commit hook to enforce llama_stack logger usage (#3061 ) # What does this PR do? This PR adds a step in pre-commit to enforce using `llama_stack` logger. Currently, various parts of the code base uses different loggers. As a custom `llama_stack` logger exist and used in the codebase, it is better to standardize its utilization. Signed-off-by: Mustafa Elbehery <melbeher@redhat.com> Co-authored-by: Matthew Farrellee <matt@cs.wisc.edu>	2025-08-20 07:15:35 -04:00
Ashwin Bharambe	27d6becfd0	fix(misc): pin openai dependency to < 1.100.0 (#3192 ) This OpenAI client release `0843a11164` ends up breaking litellm `169a17400f/litellm/types/llms/openai.py (L40)` Update the dependency pin. Also make the imports a bit more defensive anyhow if something else during `llama stack build` ends up moving openai to a previous version. ## Test Plan Run pre-release script integration tests.	2025-08-18 12:20:50 -07:00
Maor Friedman	739b18edf8	feat: add support for postgres ssl mode and root cert (#3182 ) this PR adds support for configuring `sslmode` and `sslrootcert` when initiating the psycopg2 connection. closes #3181	2025-08-18 10:24:24 -07:00
Derek Higgins	c15cc7ed77	fix: use ChatCompletionMessageFunctionToolCall (#3142 ) The OpenAI compatibility layer was incorrectly importing ChatCompletionMessageToolCallParam instead of the ChatCompletionMessageFunctionToolCall class. This caused "Cannot instantiate typing.Union" errors when processing agent requests with tool calls. Closes: #3141 Signed-off-by: Derek Higgins <derekh@redhat.com>	2025-08-14 10:27:00 -07:00
Ashwin Bharambe	3d90117891	chore(tests): fix responses and vector_io tests (#3119 ) Some fixes to MCP tests. And a bunch of fixes for Vector providers. I also enabled a bunch of Vector IO tests to be used with `LlamaStackLibraryClient` ## Test Plan Run Responses tests with llama stack library client: ``` pytest -s -v tests/integration/non_ci/responses/ --stack-config=server:starter \ --text-model openai/gpt-4o \ --embedding-model=sentence-transformers/all-MiniLM-L6-v2 \ -k "client_with_models" ``` Do the same with `-k openai_client` The rest should be taken care of by CI.	2025-08-12 16:15:53 -07:00
Matthew Farrellee	b70e2f1f09	fix(dep): update to openai >= 1.99.6 and use new Function location (#3087 ) # What does this PR do? closes #3072 ## Test Plan ci	2025-08-12 08:40:32 -07:00
ehhuang	0b5a794c27	fix: telemetry logger spams when queue is full (#3070 ) # What does this PR do? ## Test Plan Ran a stress test on chat completion endpoint locally: For 10 concurrent users over 3 minutes: Before: <img width="1440" height="201" alt="image" src="https://github.com/user-attachments/assets/24e0d580-186e-4e24-931e-2b936c5859b6" /> After: <img width="1434" height="204" alt="image" src="https://github.com/user-attachments/assets/4b806d88-f822-41e9-b25a-018cc4bec866" /> (Will send scripts in a future PR.)	2025-08-08 13:47:36 -07:00
Varsha	e3928e6a29	feat: Implement hybrid search in Milvus (#2644 ) Some checks failed Integration Tests (Replay) / discover-tests (push) Successful in 5s Details Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 6s Details Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 10s Details Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped Details Python Package Build Test / build (3.13) (push) Failing after 6s Details Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 9s Details Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 10s Details SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 15s Details Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 16s Details Python Package Build Test / build (3.12) (push) Failing after 10s Details SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 21s Details Vector IO Integration Tests / test-matrix (3.13, remote::weaviate) (push) Failing after 7s Details Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 15s Details Unit Tests / unit-tests (3.13) (push) Failing after 10s Details Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 15s Details Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 12s Details Vector IO Integration Tests / test-matrix (3.13, remote::qdrant) (push) Failing after 12s Details Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 8s Details Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 8s Details Unit Tests / unit-tests (3.12) (push) Failing after 19s Details Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 12s Details Vector IO Integration Tests / test-matrix (3.12, remote::weaviate) (push) Failing after 11s Details Vector IO Integration Tests / test-matrix (3.12, remote::qdrant) (push) Failing after 11s Details Test External API and Providers / test-external (venv) (push) Failing after 21s Details Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 19s Details Pre-commit / pre-commit (push) Successful in 57s Details # What does this PR do? This PR implements hybrid search for Milvus DB based on the inbuilt milvus support. To test: ``` pytest tests/unit/providers/vector_io/remote/test_milvus.py -v -s --tb=long --disable-warnings --asyncio-mode=auto ``` Signed-off-by: Varsha Prasad Narsing <varshaprasad96@gmail.com>	2025-08-07 09:42:03 +02:00
Charlie Doern	0caef40e0d	fix: telemetry fixes (inference and core telemetry) (#2733 ) # What does this PR do? I found a few issues while adding new metrics for various APIs: currently metrics are only propagated in `chat_completion` and `completion` since most providers use the `openai_..` routes as the default in `llama-stack-client inference chat-completion`, metrics are currently not working as expected. in order to get them working the following had to be done: 1. get the completion as usual 2. use new `openai_` versions of the metric gathering functions which use `.usage` from the `OpenAI..` response types to gather the metrics which are already populated. 3. define a `stream_generator` which counts the tokens and computes the metrics (only for stream=True) 5. add metrics to response NOTE: I could not add metrics to `openai_completion` where stream=True because that ONLY returns an `OpenAICompletion` not an AsyncGenerator that we can manipulate. acquire the lock, and add event to the span as the other `_log_...` methods do some new output: `llama-stack-client inference chat-completion --message hi` <img width="2416" height="425" alt="Screenshot 2025-07-16 at 8 28 20 AM" src="https://github.com/user-attachments/assets/ccdf1643-a184-4ddd-9641-d426c4d51326" /> and in the client: <img width="763" height="319" alt="Screenshot 2025-07-16 at 8 28 32 AM" src="https://github.com/user-attachments/assets/6bceb811-5201-47e9-9e16-8130f0d60007" /> these were not previously being recorded nor were they being printed to the server due to the improper console sink handling --------- Signed-off-by: Charlie Doern <cdoern@redhat.com>	2025-08-06 13:37:40 -07:00
Ashwin Bharambe	7f834339ba	chore(misc): make tests and starter faster (#3042 ) Some checks failed Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 9s Details Python Package Build Test / build (3.12) (push) Failing after 4s Details Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 12s Details Test Llama Stack Build / generate-matrix (push) Successful in 11s Details Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 12s Details Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 14s Details SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 22s Details Test External API and Providers / test-external (venv) (push) Failing after 14s Details Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 12s Details Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 15s Details SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 22s Details Test Llama Stack Build / build-custom-container-distribution (push) Failing after 14s Details Unit Tests / unit-tests (3.13) (push) Failing after 14s Details Test Llama Stack Build / build-single-provider (push) Failing after 13s Details Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 18s Details Unit Tests / unit-tests (3.12) (push) Failing after 16s Details Vector IO Integration Tests / test-matrix (3.12, remote::qdrant) (push) Failing after 18s Details Vector IO Integration Tests / test-matrix (3.13, remote::weaviate) (push) Failing after 10s Details Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 11s Details Vector IO Integration Tests / test-matrix (3.12, remote::weaviate) (push) Failing after 16s Details Vector IO Integration Tests / test-matrix (3.13, remote::qdrant) (push) Failing after 18s Details Test Llama Stack Build / build (push) Failing after 12s Details Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 18s Details Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 20s Details Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 16s Details Python Package Build Test / build (3.13) (push) Failing after 53s Details Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 59s Details Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 1m1s Details Update ReadTheDocs / update-readthedocs (push) Failing after 1m6s Details Pre-commit / pre-commit (push) Successful in 1m53s Details A bunch of miscellaneous cleanup focusing on tests, but ended up speeding up starter distro substantially. - Pulled llama stack client init for tests into `pytest_sessionstart` so it does not clobber output - Profiling of that told me where we were doing lots of heavy imports for starter, so lazied them - starter now starts 20seconds+ faster on my Mac - A few other smallish refactors for `compat_client`	2025-08-05 14:55:05 -07:00
Eran Cohen	e5b542dd8e	feat: switch to async completion in LiteLLM OpenAI mixin (#3029 ) Some checks failed Integration Tests (Replay) / discover-tests (push) Successful in 3s Details Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped Details Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 8s Details Vector IO Integration Tests / test-matrix (3.12, remote::qdrant) (push) Failing after 12s Details Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 11s Details Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 15s Details Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 17s Details Vector IO Integration Tests / test-matrix (3.13, remote::weaviate) (push) Failing after 13s Details Unit Tests / unit-tests (3.12) (push) Failing after 11s Details Python Package Build Test / build (3.13) (push) Failing after 12s Details Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 17s Details SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 19s Details Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 15s Details Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 15s Details Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 16s Details Python Package Build Test / build (3.12) (push) Failing after 17s Details Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 15s Details Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 21s Details SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 24s Details Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 20s Details Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 29s Details Vector IO Integration Tests / test-matrix (3.12, remote::weaviate) (push) Failing after 27s Details Test External API and Providers / test-external (venv) (push) Failing after 23s Details Vector IO Integration Tests / test-matrix (3.13, remote::qdrant) (push) Failing after 25s Details Unit Tests / unit-tests (3.13) (push) Failing after 25s Details Pre-commit / pre-commit (push) Successful in 1m10s Details	2025-08-03 12:08:56 -07:00
Varsha	3c2aee610d	refactor: Remove double filtering based on score threshold (#3019 ) # What does this PR do? Remove score_threshold based check from `OpenAIVectorStoreMixin` Closes: https://github.com/meta-llama/llama-stack/issues/3018 <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> ## Test Plan <!-- Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed. -->	2025-08-02 15:57:03 -07:00
Matthew Farrellee	140ee7d337	fix: sambanova inference provider (#2996 ) Some checks failed Integration Tests (Replay) / discover-tests (push) Successful in 3s Details Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped Details Vector IO Integration Tests / test-matrix (3.12, remote::qdrant) (push) Failing after 10s Details Integration Tests (Replay) / run-replay-mode-tests (push) Failing after 5s Details Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 14s Details Python Package Build Test / build (3.13) (push) Failing after 8s Details Unit Tests / unit-tests (3.12) (push) Failing after 8s Details SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 15s Details Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 15s Details Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 12s Details Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 12s Details Python Package Build Test / build (3.12) (push) Failing after 13s Details Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 19s Details Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 10s Details Vector IO Integration Tests / test-matrix (3.12, remote::weaviate) (push) Failing after 17s Details SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 20s Details Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 10s Details Test External API and Providers / test-external (venv) (push) Failing after 13s Details Vector IO Integration Tests / test-matrix (3.13, remote::weaviate) (push) Failing after 10s Details Unit Tests / unit-tests (3.13) (push) Failing after 13s Details Vector IO Integration Tests / test-matrix (3.13, remote::qdrant) (push) Failing after 15s Details Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 18s Details Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 18s Details Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 46s Details Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 49s Details Pre-commit / pre-commit (push) Successful in 1m29s Details # What does this PR do? closes #2995 update SambaNovaInferenceAdapter to efficiently use LiteLLMOpenAIMixin ## Test Plan ``` $ uv run pytest -s -v tests/integration/inference --stack-config inference=sambanova --text-model sambanova/Meta-Llama-3.1-8B-Instruct ... ======================== 10 passed, 84 skipped, 3 xfailed, 51 warnings in 8.14s ======================== ```	2025-08-01 09:09:14 -07:00
Francisco Arceo	33cca26154	chore: Enabling Integration tests for Weaviate (#2882 ) # What does this PR do? This PR (1) enables the files API for Weaviate and (2) enables integration tests for Weaviate, which adds a docker container to the github action. This PR also handles a couple of edge cases for in creating the collection and ensuring the tests all pass. ## Test Plan CI enabled --------- Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>	2025-07-31 20:29:50 -04:00
Matthew Farrellee	218c89fff1	feat: Add clear error message when API key is missing (#2992 ) # What does this PR do? Improve user experience by providing specific guidance when no API key is available, showing both provider data header and config options with the correct field name for each provider. Also adds comprehensive test coverage for API key resolution scenarios. addresses #2990 for providers using litellm openai mixin ## Test Plan `./scripts/unit-tests.sh tests/unit/providers/inference/test_litellm_openai_mixin.py`	2025-07-31 16:33:16 -04:00
Ashwin Bharambe	2665f00102	chore(rename): move llama_stack.distribution to llama_stack.core (#2975 ) We would like to rename the term `template` to `distribution`. To prepare for that, this is a precursor. cc @leseb	2025-07-30 23:30:53 -07:00
Nathan Weinberg	cd5c6a2fcd	chore: standardize vector store not found error (#2968 ) # What does this PR do? 1. Creates a new `VectorStoreNotFoundError` class 2. Implements the new class where appropriate Relates to #2379 Signed-off-by: Nathan Weinberg <nweinber@redhat.com>	2025-07-30 15:19:16 -07:00
Matthew Farrellee	47c078fcef	feat: implement dynamic model detection support for inference providers using litellm (#2886 ) # What does this PR do? This enhancement allows inference providers using LiteLLMOpenAIMixin to validate model availability against LiteLLM's official provider model listings, improving reliability and user experience when working with different AI service providers. - Add litellm_provider_name parameter to LiteLLMOpenAIMixin constructor - Add check_model_availability method to LiteLLMOpenAIMixin using litellm.models_by_provider - Update Gemini, Groq, and SambaNova inference adapters to pass litellm_provider_name ## Test Plan standard CI.	2025-07-28 10:13:54 -07:00
Ashwin Bharambe	9583f468f8	feat(starter)!: simplify starter distro; litellm model registry changes (#2916 )	2025-07-25 15:02:04 -07:00
Derek Higgins	52201612de	feat: implement chunk deletion for vector stores (#2701 ) Add support for deleting individual chunks from vector stores - Add abstract remove_chunk() method to EmbeddingIndex base class - Implement chunk deletion for Faiss provider, SQLite Vec, Milvus, PGVector - Placeholder implementations with NotImplementedError for Chroma/Qdrant/Weaviate - Integrate chunk deletion into OpenAI vector store file deletion flow - removed xfail from test_openai_vector_store_delete_file_removes_from_vector_store Closes: #2477 --------- Signed-off-by: Derek Higgins <derekh@redhat.com> Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com>	2025-07-25 10:30:30 -04:00
Calum Murray	7cc4819e90	feat: add MCP Streamable HTTP support (#2554 ) # What does this PR do? <!-- Provide a short summary of what this PR does and why. Link to relevant issues if applicable. --> This PR adds support for the new Streamable HTTP transport for MCP, as well as falling back to the SSE protocol if the Streamable HTTP connection fails. <!-- If resolving an issue, uncomment and update the line below --> Closes #2542 ## Test Plan <!-- Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed. --> --------- Signed-off-by: Calum Murray <cmurray@redhat.com>	2025-07-24 15:04:27 -07:00
Ashwin Bharambe	1463b79218	feat(registry): make the Stack query providers for model listing (#2862 ) This flips #2823 and #2805 by making the Stack periodically query the providers for models rather than the providers going behind the back and calling "register" on to the registry themselves. This also adds support for model listing for all other providers via `ModelRegistryHelper`. Once this is done, we do not need to manually list or register models via `run.yaml` and it will remove both noise and annoyance (setting `INFERENCE_MODEL` environment variables, for example) from the new user experience. In addition, it adds a configuration variable `allowed_models` which can be used to optionally restrict the set of models exposed from a provider.	2025-07-24 10:39:53 -07:00
Francisco Arceo	2aba2c1236	chore: Moving vector store and vector store files helper methods to openai_vector_store_mixin (#2863 ) # What does this PR do? Moving vector store and vector store files helper methods to `openai_vector_store_mixin.py` <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> ## Test Plan The tests are already supported in the CI and tests the inline providers and current integration tests. Note that the `vector_index` fixture will be test `milvus_vec_adapter`, `faiss_vec_adapter`, and `sqlite_vec_adapter` in `tests/unit/providers/vector_io/test_vector_io_openai_vector_stores.py`. Additionally, the integration tests in `integration-vector-io-tests.yml` runs `tests/integration/vector_io` tests for the following providers: ```python vector-io-provider: ["inline::faiss", "inline::sqlite-vec", "inline::milvus", "remote::chromadb", "remote::pgvector"] ``` Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>	2025-07-23 13:35:48 -04:00
Matthew Farrellee	e1ed152779	chore: create OpenAIMixin for inference providers with an OpenAI-compat API that need to implement openai_* methods (#2835 ) Some checks failed Coverage Badge / unit-tests (push) Failing after 3s Details Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 6s Details Python Package Build Test / build (3.12) (push) Failing after 3s Details Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 6s Details Integration Tests / discover-tests (push) Successful in 7s Details Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 6s Details Python Package Build Test / build (3.13) (push) Failing after 2s Details Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 6s Details Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 5s Details Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 8s Details Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 9s Details Unit Tests / unit-tests (3.12) (push) Failing after 8s Details Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 11s Details Test External Providers / test-external-providers (venv) (push) Failing after 8s Details Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 12s Details Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 9s Details SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 17s Details Unit Tests / unit-tests (3.13) (push) Failing after 12s Details Update ReadTheDocs / update-readthedocs (push) Failing after 11s Details Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 16s Details SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 18s Details Integration Tests / test-matrix (push) Failing after 18s Details Pre-commit / pre-commit (push) Successful in 1m14s Details # What does this PR do? add an `OpenAIMixin` for use by inference providers who remote endpoints support an OpenAI compatible API. use is demonstrated by refactoring - OpenAIInferenceAdapter - NVIDIAInferenceAdapter (adds embedding support) - LlamaCompatInferenceAdapter ## Test Plan existing unit and integration tests	2025-07-23 06:49:40 -04:00
Ashwin Bharambe	3b83032555	feat(registry): more flexible model lookup (#2859 ) This PR updates model registration and lookup behavior to be slightly more general / flexible. See https://github.com/meta-llama/llama-stack/issues/2843 for more details. Note that this change is backwards compatible given the design of the `lookup_model()` method. ## Test Plan Added unit tests	2025-07-22 15:22:48 -07:00
Mustafa Elbehery	9736f096f6	chore(test): fix flaky telemetry tests (#2815 ) Some checks failed Installer CI / lint (push) Failing after 2s Details Installer CI / smoke-test (push) Has been skipped Details Integration Tests / discover-tests (push) Successful in 3s Details Coverage Badge / unit-tests (push) Failing after 6s Details Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 6s Details Python Package Build Test / build (3.12) (push) Failing after 3s Details Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 11s Details Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 5s Details Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 10s Details Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 9s Details Unit Tests / unit-tests (3.12) (push) Failing after 6s Details SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 15s Details Test Llama Stack Build / generate-matrix (push) Successful in 11s Details Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 16s Details Test Llama Stack Build / build-single-provider (push) Failing after 12s Details Update ReadTheDocs / update-readthedocs (push) Failing after 9s Details Integration Tests / test-matrix (push) Failing after 9s Details Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 15s Details Test External Providers / test-external-providers (venv) (push) Failing after 11s Details Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 8s Details SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 22s Details Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 16s Details Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 13s Details Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 16s Details Test Llama Stack Build / build-custom-container-distribution (push) Failing after 13s Details Test Llama Stack Build / build (push) Failing after 3s Details Python Package Build Test / build (3.13) (push) Failing after 48s Details Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 55s Details Unit Tests / unit-tests (3.13) (push) Failing after 52s Details Pre-commit / pre-commit (push) Successful in 1m42s Details # What does this PR do? <!-- Provide a short summary of what this PR does and why. Link to relevant issues if applicable. --> This PR fixes flaky telemetry tests <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> See https://github.com/meta-llama/llama-stack/pull/2814 ## Test Plan <!-- Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed. --> Signed-off-by: Mustafa Elbehery <melbeher@redhat.com>	2025-07-22 12:30:14 -07:00
Francisco Arceo	20c3197952	chore: Making name optional in openai_create_vector_store (#2858 ) # What does this PR do? chore: Making name optional in openai_create_vector_store # Closes https://github.com/meta-llama/llama-stack/issues/2706 ## Test Plan CI and unit tests Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>	2025-07-22 13:31:31 -04:00
IAN MILLER	b57db11bed	feat: create dynamic model registration for OpenAI and Llama compat remote inference providers (#2745 ) Some checks failed Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 5s Details Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 6s Details Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 4s Details Python Package Build Test / build (3.13) (push) Failing after 2s Details Test Llama Stack Build / generate-matrix (push) Successful in 6s Details Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 7s Details Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 9s Details Update ReadTheDocs / update-readthedocs (push) Failing after 3s Details Test Llama Stack Build / build-single-provider (push) Failing after 7s Details Integration Tests / discover-tests (push) Successful in 13s Details Test Llama Stack Build / build-custom-container-distribution (push) Failing after 13s Details Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 12s Details Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 17s Details Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 15s Details Integration Tests / test-matrix (push) Failing after 5s Details Unit Tests / unit-tests (3.12) (push) Failing after 12s Details Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 19s Details Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 19s Details SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 22s Details Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 17s Details Test External Providers / test-external-providers (venv) (push) Failing after 17s Details Test Llama Stack Build / build (push) Failing after 14s Details SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 35s Details Python Package Build Test / build (3.12) (push) Failing after 51s Details Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 57s Details Unit Tests / unit-tests (3.13) (push) Failing after 53s Details Pre-commit / pre-commit (push) Successful in 1m42s Details # What does this PR do? <!-- Provide a short summary of what this PR does and why. Link to relevant issues if applicable. --> The purpose of this task is to create a solution that can automatically detect when new models are added, deprecated, or removed by OpenAI and Llama API providers, and automatically update the list of supported models in LLamaStack. This feature is vitally important in order to avoid missing new models and editing the entries manually hence I created automation allowing users to dynamically register: - any models from OpenAI provider available at [https://api.openai.com/v1/models](https://api.openai.com/v1/models) that are not in [https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/remote/inference/openai/models.py](https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/remote/inference/openai/models.py) - any models from Llama API provider available at [https://api.llama.com/v1/models](https://api.llama.com/v1/models) that are not in [https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/remote/inference/llama_openai_compat/models.py](https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/remote/inference/llama_openai_compat/models.py) <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> Closes #2504 this PR is dependant on #2710 ## Test Plan <!-- Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed. --> 1. Create venv at root llamastack directory: `uv venv .venv --python 3.12 --seed` 2. Activate venv: `source .venv/bin/activate` 3. `uv pip install -e .` 4. Create OpenAI distro modifying run.yaml 5. Build distro: `llama stack build --template starter --image-type venv` 6. Then run LlamaStack, but before navigate to templates/starter folder: `llama stack run run.yaml --image-type venv OPENAI_API_KEY=<YOUR_KEY> ENABLE_OPENAI=openai` 7. Then try to register dummy llm that doesn't exist in OpenAI provider: ` llama-stack-client models register ianm/ianllm --provider-model-id=ianllm --provider-id=openai ` You should receive this output - combined list of static config + fetched available models from OpenAI: <img width="1380" height="474" alt="Screenshot 2025-07-14 at 12 48 50" src="https://github.com/user-attachments/assets/d26aad18-6b15-49ee-9c49-b01b2d33f883" /> 8. Then register real llm from OpenAI: llama-stack-client models register openai/gpt-4-turbo-preview --provider-model-id=gpt-4-turbo-preview --provider-id=openai <img width="1253" height="613" alt="Screenshot 2025-07-14 at 13 43 02" src="https://github.com/user-attachments/assets/60a5c9b1-3468-4eb9-9e92-cd7d21de3ca0" /> <img width="1288" height="655" alt="Screenshot 2025-07-14 at 13 43 11" src="https://github.com/user-attachments/assets/c1e48871-0e24-4bd9-a0b8-8c95552a51ee" /> We correctly fetched all available models from OpenAI As for Llama API, as a non-US person I don't have access to Llama API Key but I joined wait list. The implementation for Llama is the same as for OpenAI since Llama is openai compatible. So, the response from GET endpoint has the same structure as OpenAI https://llama.developer.meta.com/docs/api/models	2025-07-16 12:49:38 -04:00
Francisco Arceo	31b088978a	fix: Fix `/vector-stores/create` API when vector store with duplicate `name` (#2617 ) # What does this PR do? Resolves https://github.com/meta-llama/llama-stack/issues/2735 Currently, if you test against OpenAI's Vector Stores API the `client.vector_stores.search` call fails with an invalid vector_db during routing (see the script referenced in the clickable item under the Test Plan section). This PR ensures that `client.vector_stores.search()` is compatible with OpenAI's Vector Stores API. Two biggest changes: 1. The `name`, which was previously used as the `vector_db_id`, has been changed to be consistent with OpenAI's `vs_{uuid}` format. 2. The vector store ID has to be referenced by the ID, the name is not reliable as every `client.vector_stores.create` results in a new vector store. NOTE: I believe this is a breaking change for end users as they'll need to update their VectorDB identifiers. ## Test Plan Unit tests: ```bash ./scripts/unit-tests.sh tests/unit/providers/vector_io/ -v ``` Integration tests: ```bash ENABLE_MILVUS=milvus llama stack run /Users/farceo/dev/llama-stack/llama_stack/templates/starter/run.yaml --image-type venv LLAMA_STACK_CONFIG=http://localhost:8321 pytest -sv tests/integration/vector_io/test_openai_vector_stores.py --embedding-model=all-MiniLM-L6-v2 -vv ``` Unit tests and test script below 👇 <details> <summary>Click here for script used to test OpenAI and Llama Stack Vector Store implementation</summary> ```python import json import argparse from openai import OpenAI, pagination import logging from colorama import Fore, Style, init import traceback import os # Initialize colorama for color support in terminal init(autoreset=True) # Setup basic logging logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s') DEMO_VECTOR_STORE_NAME = "Support FAQ FJA" global DEMO_VECTOR_STORE_ID global DEMO_VECTOR_STORE_ID2 def colored_print(color, text): """Prints text to the console with the specified color.""" print(f"{color}{text}{Style.RESET_ALL}") def log_and_print(color, message, level=logging.INFO): """Logs a message and prints it to the console with the specified color.""" logging.log(level, message) colored_print(color, message) def run_tests(client, prefix="openai"): """ Runs all tests using the provided OpenAI client and saves the output to JSON files with the given prefix. """ # Create the directory if it doesn't exist os.makedirs('openai_testing', exist_ok=True) # Default values in case tests fail global DEMO_VECTOR_STORE_ID, DEMO_VECTOR_STORE_ID2 DEMO_VECTOR_STORE_ID = None DEMO_VECTOR_STORE_ID2 = None def test_idempotent_vector_store_creation(): """ Test that creating a vector store with the same name is idempotent. """ log_and_print(Fore.BLUE, "Starting vector store creation test...") try: vector_store = client.vector_stores.create( name=DEMO_VECTOR_STORE_NAME, ) # Attempt to create the same vector store again vector_store2 = client.vector_stores.create( name=DEMO_VECTOR_STORE_NAME, ) # Check instead of assert if vector_store2.id != vector_store.id: log_and_print(Fore.YELLOW, f"FAILED IDEMPOTENCY: the same VectorStore name for {prefix.upper()} does not return the same ID", level=logging.WARNING) else: log_and_print(Fore.GREEN, f"PASSED IDEMPOTENCY: f{vector_store2.id} == {vector_store.id} the same VectorStore name for {prefix.upper()} returns the same ID") vector_store_data = vector_store.to_dict() log_and_print(Fore.WHITE, f"vector_stores.create = {json.dumps(vector_store_data, indent=2)}") with open(f'openai_testing/{prefix}_vector_store_create.json', 'w') as f: json.dump(vector_store_data, f, indent=2) global DEMO_VECTOR_STORE_ID, DEMO_VECTOR_STORE_ID2 DEMO_VECTOR_STORE_ID = vector_store.id DEMO_VECTOR_STORE_ID2 = vector_store2.id return DEMO_VECTOR_STORE_ID, DEMO_VECTOR_STORE_ID2 except Exception as e: log_and_print(Fore.RED, f"Idempotent vector store creation test failed: {e}", level=logging.ERROR) logging.error(traceback.format_exc()) # Create a fallback vector store ID if needed if 'vector_store' in locals() and vector_store: DEMO_VECTOR_STORE_ID = vector_store.id return DEMO_VECTOR_STORE_ID, DEMO_VECTOR_STORE_ID2 def test_vector_store_list(): """ Test listing vector stores. """ log_and_print(Fore.BLUE, "Starting vector store list test...") try: vector_stores = client.vector_stores.list() # Check instead of assert if not isinstance(vector_stores, pagination.SyncCursorPage): log_and_print(Fore.YELLOW, f"FAILED: Expected a list of vector stores, got {type(vector_stores)}", level=logging.WARNING) else: log_and_print(Fore.GREEN, "Vector store list test passed!") vector_stores_data = vector_stores.to_dict() log_and_print(Fore.WHITE, f"vector_stores.list = {json.dumps(vector_stores_data, indent=2)}") with open(f'openai_testing/{prefix}_vector_store_list.json', 'w') as f: json.dump(vector_stores_data, f, indent=2) except Exception as e: log_and_print(Fore.RED, f"Vector store list test failed: {e}", level=logging.ERROR) logging.error(traceback.format_exc()) def test_retrieve_vector_store(): """ Test retrieving a specific vector store. """ log_and_print(Fore.BLUE, "Starting retrieve vector store test...") if not DEMO_VECTOR_STORE_ID: log_and_print(Fore.YELLOW, "Skipping retrieve vector store test - no vector store ID available", level=logging.WARNING) return try: vector_store = client.vector_stores.retrieve( vector_store_id=DEMO_VECTOR_STORE_ID, ) # Check instead of assert if vector_store.id != DEMO_VECTOR_STORE_ID: log_and_print(Fore.YELLOW, "FAILED: Retrieved vector store ID does not match", level=logging.WARNING) else: log_and_print(Fore.GREEN, "Retrieve vector store test passed!") vector_store_data = vector_store.to_dict() log_and_print(Fore.WHITE, f"vector_stores.retrieve = {json.dumps(vector_store_data, indent=2)}") with open(f'openai_testing/{prefix}_vector_store_retrieve.json', 'w') as f: json.dump(vector_store_data, f, indent=2) except Exception as e: log_and_print(Fore.RED, f"Retrieve vector store test failed: {e}", level=logging.ERROR) logging.error(traceback.format_exc()) def test_modify_vector_store(): """ Test modifying a vector store. """ log_and_print(Fore.BLUE, "Starting modify vector store test...") if not DEMO_VECTOR_STORE_ID: log_and_print(Fore.YELLOW, "Skipping modify vector store test - no vector store ID available", level=logging.WARNING) return try: updated_vector_store = client.vector_stores.update( vector_store_id=DEMO_VECTOR_STORE_ID, name="Updated Support FAQ FJA", ) # Check instead of assert if updated_vector_store.name != "Updated Support FAQ FJA": log_and_print(Fore.YELLOW, "FAILED: Vector store name was not updated correctly", level=logging.WARNING) else: log_and_print(Fore.GREEN, "Modify vector store test passed!") updated_vector_store_data = updated_vector_store.to_dict() log_and_print(Fore.WHITE, f"vector_stores.modify = {json.dumps(updated_vector_store_data, indent=2)}") with open(f'openai_testing/{prefix}_vector_store_modify.json', 'w') as f: json.dump(updated_vector_store_data, f, indent=2) except Exception as e: log_and_print(Fore.RED, f"Modify vector store test failed: {e}", level=logging.ERROR) logging.error(traceback.format_exc()) def test_delete_vector_store(): """ Test deleting a vector store. """ log_and_print(Fore.BLUE, "Starting delete vector store test...") if not DEMO_VECTOR_STORE_ID2: log_and_print(Fore.YELLOW, "Skipping delete vector store test - no second vector store ID available", level=logging.WARNING) return try: response = client.vector_stores.delete( vector_store_id=DEMO_VECTOR_STORE_ID2, ) log_and_print(Fore.GREEN, "Delete vector store test passed!") response_data = response.to_dict() log_and_print(Fore.WHITE, f"Vector store delete response = {json.dumps(response_data, indent=2)}") with open(f'openai_testing/{prefix}_vector_store_delete.json', 'w') as f: json.dump(response_data, f, indent=2) except Exception as e: log_and_print(Fore.RED, f"Delete vector store test failed: {e}", level=logging.ERROR) logging.error(traceback.format_exc()) def test_create_vector_store_file(): log_and_print(Fore.BLUE, "Starting create vector store file test...") if not DEMO_VECTOR_STORE_ID: log_and_print(Fore.YELLOW, "Skipping create vector store file test - no vector store ID available", level=logging.WARNING) return try: # create jsonl of files as an example with open("mydata.jsonl", "w") as f: f.write('{"text": "What is the return policy?", "metadata": {"category": "support"}}\n') f.write('{"text": "How do I reset my password?", "metadata": {"category": "support"}}\n') f.write('{"text": "Where can I find my order history?", "metadata": {"category": "support"}}\n') f.write('{"text": "What are the shipping options?", "metadata": {"category": "support"}}\n') f.write('{"text": "What is your favorite banana?", "metadata": {"category": "support"}}\n') # Create a simple text file if my_data_small.txt doesn't exist if not os.path.exists("my_data_small.txt"): with open("my_data_small.txt", "w") as f: f.write("This is a test file for vector store testing.\n") created_file = client.files.create( file=open("my_data_small.txt", "rb"), purpose="assistants", ) created_file_data = created_file.to_dict() log_and_print(Fore.WHITE, f"Created file {json.dumps(created_file_data, indent=2)}") with open(f'openai_testing/{prefix}_file_create.json', 'w') as f: json.dump(created_file_data, f, indent=2) retrieved_files = client.files.retrieve(created_file.id) retrieved_files_data = retrieved_files.to_dict() log_and_print(Fore.WHITE, f"Retrieved file {json.dumps(retrieved_files_data, indent=2)}") with open(f'openai_testing/{prefix}_file_retrieve.json', 'w') as f: json.dump(retrieved_files_data, f, indent=2) vector_store_file = client.vector_stores.files.create( vector_store_id=DEMO_VECTOR_STORE_ID, file_id=created_file.id, ) log_and_print(Fore.GREEN, "Create vector store file test passed!") except Exception as e: log_and_print(Fore.RED, f"Create vector store file test failed: {e}", level=logging.ERROR) logging.error(traceback.format_exc()) def test_search_vector_store(): """ Test searching a vector store. """ log_and_print(Fore.BLUE, "Starting search vector store test...") if not DEMO_VECTOR_STORE_ID: log_and_print(Fore.YELLOW, "Skipping search vector store test - no vector store ID available", level=logging.WARNING) return try: query = "What is the banana policy?" search_results = client.vector_stores.search( vector_store_id=DEMO_VECTOR_STORE_ID, query=query, max_num_results=10, ranking_options={ 'ranker': 'default-2024-11-15', 'score_threshold': 0.0, }, rewrite_query=False, ) # Check instead of assert if not isinstance(search_results, pagination.SyncPage): log_and_print(Fore.YELLOW, f"FAILED: Expected a list of search results, got {type(search_results)}", level=logging.WARNING) else: log_and_print(Fore.GREEN, "Search vector store test passed!") search_results_dict = search_results.to_dict() log_and_print(Fore.WHITE, f"Search results = {search_results_dict}") with open(f'openai_testing/{prefix}_vector_store_search.json', 'w') as f: json.dump(search_results_dict, f, indent=2) log_and_print(Fore.WHITE, f"vector_stores.search = {search_results.to_json()}") except Exception as e: log_and_print(Fore.RED, f"Search vector store test failed: {e}", level=logging.ERROR) logging.error(traceback.format_exc()) # Run all tests in sequence, even if some fail test_results = [] try: result = test_idempotent_vector_store_creation() if result and len(result) == 2: DEMO_VECTOR_STORE_ID, DEMO_VECTOR_STORE_ID2 = result test_results.append(True) except Exception as e: log_and_print(Fore.RED, f"Vector store creation test failed: {e}", level=logging.ERROR) logging.error(traceback.format_exc()) test_results.append(False) for test_func in [ test_vector_store_list, test_retrieve_vector_store, test_modify_vector_store, test_delete_vector_store, test_create_vector_store_file, test_search_vector_store ]: try: test_func() test_results.append(True) except Exception as e: log_and_print(Fore.RED, f"{test_func.__name__} failed: {e}", level=logging.ERROR) logging.error(traceback.format_exc()) test_results.append(False) if all(test_results): log_and_print(Fore.GREEN, f"All {prefix} tests completed successfully!") else: failed_count = test_results.count(False) log_and_print(Fore.YELLOW, f"{failed_count} {prefix} test(s) failed, but script completed.") if __name__ == "__main__": parser = argparse.ArgumentParser(description="Run OpenAI and/or LlamaStack tests.") parser.add_argument( "--provider", type=str, default="llama", choices=["openai", "llama", "both"], help="Specify which environment to test: openai, llama, or both. Default is both.", ) args = parser.parse_args() try: if args.provider in ("openai", "both"): openai_client = OpenAI() run_tests(openai_client, prefix="openai") if args.provider in ("llama", "both"): llama_client = OpenAI(base_url="http://localhost:8321/v1/openai/v1", api_key="none") run_tests(llama_client, prefix="llama") log_and_print(Fore.GREEN, "All tests completed!") except Exception as e: log_and_print(Fore.RED, f"Tests failed to complete: {e}", level=logging.ERROR) logging.error(traceback.format_exc()) ``` </details> --------- Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>	2025-07-15 11:24:41 -04:00
Francisco Arceo	33f0d83ad3	chore: Move vector store `kvstore` implementation into `openai_vector_store_mixin.py` (#2748 )	2025-07-14 18:10:35 -04:00
Matthew Farrellee	f731f369a2	feat: add infrastructure to allow inference model discovery (#2710 ) # What does this PR do? inference providers each have a static list of supported / known models. some also have access to a dynamic list of currently available models. this change gives prodivers using the ModelRegistryHelper the ability to combine their static and dynamic lists. for instance, OpenAIInferenceAdapter can implement ``` def query_available_models(self) -> list[str]: return [entry.model for entry in self.openai_client.models.list()] ``` to augment its static list w/ a current list from openai. ## Test Plan scripts/unit-test.sh	2025-07-14 11:38:53 -07:00

1 2 3 4 5 ...

278 commits