mirror of
https://github.com/meta-llama/llama-stack.git
synced 2025-08-16 14:38:00 +00:00
2486 commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
|
e6d5bf5588 |
add notes about batches development status to docs
this also captures other notes from agents, eval and inference apis |
||
|
11249b029b |
feat: add batches API with OpenAI compatibility
Add complete batches API implementation with protocol, providers, and tests: Core Infrastructure: - Add batches API protocol using OpenAI Batch types directly - Add Api.batches enum value and protocol mapping in resolver - Add OpenAI "batch" file purpose support - Include proper error handling (ConflictError, ResourceNotFoundError) Reference Provider: - Add ReferenceBatchesImpl with full CRUD operations (create, retrieve, cancel, list) - Implement background batch processing with configurable concurrency - Add SQLite KVStore backend for persistence - Support /v1/chat/completions endpoint with request validation Comprehensive Test Suite: - Add unit tests for provider implementation with validation - Add integration tests for end-to-end batch processing workflows - Add error handling tests for validation, malformed inputs, and edge cases Configuration: - Add max_concurrent_batches and max_concurrent_requests_per_batch options - Add provider documentation with sample configurations Test with - ``` $ uv run llama stack build --image-type venv --providers inference=YOU_PICK,files=inline::localfs,batches=inline::reference --run & $ LLAMA_STACK_CONFIG=http://localhost:8321 uv run pytest tests/unit/providers/batches tests/integration/batches --text-model YOU_PICK ``` |
||
|
0e8bb94bf3
|
feat(ci): make recording workflow simpler, more parameterizable (#3169)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.13) (push) Failing after 4s
Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 7s
Python Package Build Test / build (3.12) (push) Failing after 12s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 14s
Update ReadTheDocs / update-readthedocs (push) Failing after 12s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 17s
Test External API and Providers / test-external (venv) (push) Failing after 15s
Vector IO Integration Tests / test-matrix (push) Failing after 28s
Unit Tests / unit-tests (3.12) (push) Failing after 27s
Unit Tests / unit-tests (3.13) (push) Failing after 51s
Pre-commit / pre-commit (push) Successful in 2m6s
# What does this PR do? Recording tests has become a nightmare. This is the first part of making that process simpler by making it _less_ automatic. I tried to be too clever earlier. It simplifies the record-integration-tests workflow to use workflow dispatch inputs instead of PR labels. No more opaque stuff. Just go to the GitHub UI and run the workflow with inputs. I will soon add a helper script for this also. Other things to aid re-running just the small set of things you need to re-record: - Replaces the `test-types` JSON array parameter with a more intuitive `test-subdirs` comma-separated list. The whole JSON array crap was for matrix. - Adds a new `test-pattern` parameter to allow filtering tests using pytest's `-k` option ## Test Plan Note that this PR is in a fork not the source repository. - Replay tests on this PR are green - Manually [ran]( |
||
|
a6e2c18909
|
Revert "refactor(agents): migrate to OpenAI chat completions API" (#3167)
Reverts llamastack/llama-stack#3097 It has broken agents tests. |
||
|
2c06b24c77
|
test: benchmark scripts (#3160)
# What does this PR do? 1. Add our own benchmark script instead of locust (doesn't support measuring streaming latency well) 2. Simplify k8s deployment 3. Add a simple profile script for locally running server ## Test Plan ❮ ./run-benchmark.sh --target stack --duration 180 --concurrent 10 ============================================================ BENCHMARK RESULTS ============================================================ Total time: 180.00s Concurrent users: 10 Total requests: 1636 Successful requests: 1636 Failed requests: 0 Success rate: 100.0% Requests per second: 9.09 Response Time Statistics: Mean: 1.095s Median: 1.721s Min: 0.136s Max: 3.218s Std Dev: 0.762s Percentiles: P50: 1.721s P90: 1.751s P95: 1.756s P99: 1.796s Time to First Token (TTFT) Statistics: Mean: 0.037s Median: 0.037s Min: 0.023s Max: 0.211s Std Dev: 0.011s TTFT Percentiles: P50: 0.037s P90: 0.040s P95: 0.044s P99: 0.055s Streaming Statistics: Mean chunks per response: 64.0 Total chunks received: 104775 |
||
|
2114214fe3
|
chore(python-deps): bump huggingface-hub from 0.34.3 to 0.34.4 (#3084)
Bumps [huggingface-hub](https://github.com/huggingface/huggingface_hub) from 0.34.3 to 0.34.4. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/huggingface/huggingface_hub/releases">huggingface-hub's releases</a>.</em></p> <blockquote> <h2>[v0.34.4] Support Image to Video inference + QoL in jobs API, auth and utilities</h2> <p>Biggest update is the support of Image-To-Video task with inference provider Fal AI</p> <ul> <li>[Inference] Support image to video task <a href="https://redirect.github.com/huggingface/huggingface_hub/issues/3289">#3289</a> by <a href="https://github.com/hanouticelina"><code>@hanouticelina</code></a></li> </ul> <pre lang="py"><code>>>> from huggingface_hub import InferenceClient >>> client = InferenceClient() >>> video = client.image_to_video("cat.jpg", model="Wan-AI/Wan2.2-I2V-A14B", prompt="turn the cat into a tiger") >>> with open("tiger.mp4", "wb") as f: ... f.write(video) </code></pre> <p>And some quality of life improvements:</p> <ul> <li>Add type to job owner <a href="https://redirect.github.com/huggingface/huggingface_hub/issues/3291">#3291</a> by <a href="https://github.com/drbh"><code>@drbh</code></a></li> <li>Include HF_HUB_DISABLE_XET in the environment dump <a href="https://redirect.github.com/huggingface/huggingface_hub/issues/3290">#3290</a> by <a href="https://github.com/hanouticelina"><code>@hanouticelina</code></a></li> <li>Whoami: custom message only on unauthorized <a href="https://redirect.github.com/huggingface/huggingface_hub/issues/3288">#3288</a> by <a href="https://github.com/Wauplin"><code>@Wauplin</code></a></li> <li>Add validation warnings for repository limits in upload_large_folder <a href="https://redirect.github.com/huggingface/huggingface_hub/issues/3280">#3280</a> by <a href="https://github.com/davanstrien"><code>@davanstrien</code></a></li> <li>Add timeout info to Jobs guide docs <a href="https://redirect.github.com/huggingface/huggingface_hub/issues/3281">#3281</a> by <a href="https://github.com/davanstrien"><code>@davanstrien</code></a></li> <li>[Jobs] Use current or stored token in a Job secrets <a href="https://redirect.github.com/huggingface/huggingface_hub/issues/3272">#3272</a> by <a href="https://github.com/lhoestq"><code>@lhoestq</code></a></li> <li>Fix bash history expansion in hf jobs example <a href="https://redirect.github.com/huggingface/huggingface_hub/issues/3277">#3277</a> by <a href="https://github.com/nyuuzyou"><code>@nyuuzyou</code></a></li> </ul> <p><strong>Full Changelog</strong>: <a href="https://github.com/huggingface/huggingface_hub/compare/v0.34.3...v0.34.4">https://github.com/huggingface/huggingface_hub/compare/v0.34.3...v0.34.4</a></p> </blockquote> </details> <details> <summary>Commits</summary> <ul> <li><a href=" |
||
|
a275282685
|
chore(python-deps): bump pymilvus from 2.5.14 to 2.6.0 (#3086)
Bumps [pymilvus](https://github.com/milvus-io/pymilvus) from 2.5.14 to 2.6.0. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/milvus-io/pymilvus/releases">pymilvus's releases</a>.</em></p> <blockquote> <h2>PyMilvus v2.6.0 Release Notes</h2> <h2>New Features</h2> <ol> <li>Add APIs in MilvusClient</li> </ol> <ul> <li>enhance: add describe and alter database in MilvusClient by <a href="https://github.com/smellthemoon"><code>@smellthemoon</code></a> in <a href="https://redirect.github.com/milvus-io/pymilvus/pull/2433">milvus-io/pymilvus#2433</a></li> <li>enhance: support milvus-client iterator by <a href="https://github.com/MrPresent-Han"><code>@MrPresent-Han</code></a> in <a href="https://redirect.github.com/milvus-io/pymilvus/pull/2461">milvus-io/pymilvus#2461</a></li> <li>enhance: Enable resource group api in milvus client by <a href="https://github.com/weiliu1031"><code>@weiliu1031</code></a> in <a href="https://redirect.github.com/milvus-io/pymilvus/pull/2513">milvus-io/pymilvus#2513</a></li> <li>enhance: add release_collection, drop_index, create_partition, drop_partition, load_partition and release_partition by <a href="https://github.com/brcarry"><code>@brcarry</code></a> in <a href="https://redirect.github.com/milvus-io/pymilvus/pull/2525">milvus-io/pymilvus#2525</a></li> <li>enhance: enable describe_replica api in milvus client by <a href="https://github.com/weiliu1031"><code>@weiliu1031</code></a> in <a href="https://redirect.github.com/milvus-io/pymilvus/pull/2541">milvus-io/pymilvus#2541</a></li> <li>enhance: support recalls for milvus_client by <a href="https://github.com/chasingegg"><code>@chasingegg</code></a> in <a href="https://redirect.github.com/milvus-io/pymilvus/pull/2552">milvus-io/pymilvus#2552</a></li> <li>enhance: add use_database by <a href="https://github.com/czs007"><code>@czs007</code></a> in <a href="https://redirect.github.com/milvus-io/pymilvus/pull/2491">milvus-io/pymilvus#2491</a></li> </ul> <ol start="2"> <li>Add AsyncMilvusClient</li> </ol> <ul> <li>[FEAT] Asyncio support by <a href="https://github.com/brcarry"><code>@brcarry</code></a> in <a href="https://redirect.github.com/milvus-io/pymilvus/pull/2411">milvus-io/pymilvus#2411</a></li> <li>Add async DDL funcs & DDL examples by <a href="https://github.com/Shawnzheng011019"><code>@Shawnzheng011019</code></a> in <a href="https://redirect.github.com/milvus-io/pymilvus/pull/2852">milvus-io/pymilvus#2852</a></li> </ul> <ol start="3"> <li>Other features</li> </ol> <ul> <li>enhance: support Int8Vector by <a href="https://github.com/cydrain"><code>@cydrain</code></a> in <a href="https://redirect.github.com/milvus-io/pymilvus/pull/2611">milvus-io/pymilvus#2611</a></li> <li>feat: support recalls field in SearchResult by <a href="https://github.com/chasingegg"><code>@chasingegg</code></a> in <a href="https://redirect.github.com/milvus-io/pymilvus/pull/2390">milvus-io/pymilvus#2390</a></li> <li>enhance: Support Python3.13 and upgrade grpcio range by <a href="https://github.com/XuanYang-cn"><code>@XuanYang-cn</code></a> in <a href="https://redirect.github.com/milvus-io/pymilvus/pull/2684">milvus-io/pymilvus#2684</a></li> <li>enhance: support run analyzer return detail token by <a href="https://github.com/aoiasd"><code>@aoiasd</code></a> in <a href="https://redirect.github.com/milvus-io/pymilvus/pull/2679">milvus-io/pymilvus#2679</a></li> <li>enhance: Add force_drop parameter to drop_role method for role deletion by <a href="https://github.com/SimFG"><code>@SimFG</code></a> in <a href="https://redirect.github.com/milvus-io/pymilvus/pull/2705">milvus-io/pymilvus#2705</a></li> <li>enhance: add property func for AnalyzeToken by <a href="https://github.com/aoiasd"><code>@aoiasd</code></a> in <a href="https://redirect.github.com/milvus-io/pymilvus/pull/2704">milvus-io/pymilvus#2704</a></li> <li>enhance: grant/revoke v2 optional db and collection params by <a href="https://github.com/shaoting-huang"><code>@shaoting-huang</code></a> in <a href="https://redirect.github.com/milvus-io/pymilvus/pull/2386">milvus-io/pymilvus#2386</a></li> <li>extend unlimted offset for query iterator(<a href="https://redirect.github.com/milvus-io/pymilvus/issues/2418">#2418</a>) by <a href="https://github.com/MrPresent-Han"><code>@MrPresent-Han</code></a> in <a href="https://redirect.github.com/milvus-io/pymilvus/pull/2419">milvus-io/pymilvus#2419</a></li> <li>enhance: alterindex & altercollection supports altering properties by <a href="https://github.com/JsDove"><code>@JsDove</code></a> in <a href="https://redirect.github.com/milvus-io/pymilvus/pull/2406">milvus-io/pymilvus#2406</a></li> <li>enhance: alterdatabase support delete property by <a href="https://github.com/JsDove"><code>@JsDove</code></a> in <a href="https://redirect.github.com/milvus-io/pymilvus/pull/2435">milvus-io/pymilvus#2435</a></li> <li>enhance: support hints param by <a href="https://github.com/chasingegg"><code>@chasingegg</code></a> in <a href="https://redirect.github.com/milvus-io/pymilvus/pull/2408">milvus-io/pymilvus#2408</a></li> <li>enhance: create database support properties by <a href="https://github.com/JsDove"><code>@JsDove</code></a> in <a href="https://redirect.github.com/milvus-io/pymilvus/pull/2448">milvus-io/pymilvus#2448</a></li> <li>enhance: Add <code>db_name</code> parameter at <code>bulk_import</code> by <a href="https://github.com/counter2015"><code>@counter2015</code></a> in <a href="https://redirect.github.com/milvus-io/pymilvus/pull/2446">milvus-io/pymilvus#2446</a></li> <li>enhance: add search iterator v2 by <a href="https://github.com/PwzXxm"><code>@PwzXxm</code></a> in <a href="https://redirect.github.com/milvus-io/pymilvus/pull/2395">milvus-io/pymilvus#2395</a></li> <li>enhance: simplify the structure of search_params by <a href="https://github.com/smellthemoon"><code>@smellthemoon</code></a> in <a href="https://redirect.github.com/milvus-io/pymilvus/pull/2507">milvus-io/pymilvus#2507</a></li> <li>enhance: Remove long deprecated Milvus class by <a href="https://github.com/XuanYang-cn"><code>@XuanYang-cn</code></a> in <a href="https://redirect.github.com/milvus-io/pymilvus/pull/2544">milvus-io/pymilvus#2544</a></li> <li>enhance: Use new model pkg by <a href="https://github.com/junjiejiangjjj"><code>@junjiejiangjjj</code></a> in <a href="https://redirect.github.com/milvus-io/pymilvus/pull/2595">milvus-io/pymilvus#2595</a></li> <li>enhance: Add schema update time verification to insert and upsert to use cache by <a href="https://github.com/JsDove"><code>@JsDove</code></a> in <a href="https://redirect.github.com/milvus-io/pymilvus/pull/2551">milvus-io/pymilvus#2551</a></li> <li>enhance: describecollection output add created_timestamp by <a href="https://github.com/JsDove"><code>@JsDove</code></a> in <a href="https://redirect.github.com/milvus-io/pymilvus/pull/2618">milvus-io/pymilvus#2618</a></li> <li>feat: add external filter func for search iterator v2 by <a href="https://github.com/PwzXxm"><code>@PwzXxm</code></a> in <a href="https://redirect.github.com/milvus-io/pymilvus/pull/2639">milvus-io/pymilvus#2639</a></li> <li>enhance: support run analyzer by <a href="https://github.com/aoiasd"><code>@aoiasd</code></a> in <a href="https://redirect.github.com/milvus-io/pymilvus/pull/2622">milvus-io/pymilvus#2622</a></li> <li>weighted reranker to allow skip score normalization by <a href="https://github.com/zhengbuqian"><code>@zhengbuqian</code></a> in <a href="https://redirect.github.com/milvus-io/pymilvus/pull/2708">milvus-io/pymilvus#2708</a></li> <li>enhance: Support AddCollectionField API by <a href="https://github.com/congqixia"><code>@congqixia</code></a> in <a href="https://redirect.github.com/milvus-io/pymilvus/pull/2722">milvus-io/pymilvus#2722</a></li> <li>Add 1-Way and 2-Way TLS Support to Bulk Import Functions by <a href="https://github.com/abd-770"><code>@abd-770</code></a> in <a href="https://redirect.github.com/milvus-io/pymilvus/pull/2672">milvus-io/pymilvus#2672</a></li> <li>enhance: Use SearchResult in MilvusClient by <a href="https://github.com/XuanYang-cn"><code>@XuanYang-cn</code></a> in <a href="https://redirect.github.com/milvus-io/pymilvus/pull/2735">milvus-io/pymilvus#2735</a></li> <li>Support rerank by <a href="https://github.com/junjiejiangjjj"><code>@junjiejiangjjj</code></a> in <a href="https://redirect.github.com/milvus-io/pymilvus/pull/2729">milvus-io/pymilvus#2729</a></li> <li>feat: suppoprt multi analyzer params by <a href="https://github.com/aoiasd"><code>@aoiasd</code></a> in <a href="https://redirect.github.com/milvus-io/pymilvus/pull/2747">milvus-io/pymilvus#2747</a></li> <li>Add funciton checker by <a href="https://github.com/junjiejiangjjj"><code>@junjiejiangjjj</code></a> in <a href="https://redirect.github.com/milvus-io/pymilvus/pull/2760">milvus-io/pymilvus#2760</a></li> <li>enhance: Support run analyzer by collection and field by <a href="https://github.com/aoiasd"><code>@aoiasd</code></a> in <a href="https://redirect.github.com/milvus-io/pymilvus/pull/2822">milvus-io/pymilvus#2822</a></li> <li>feat: support load collection/partition with priority(<a href="https://redirect.github.com/milvus-io/pymilvus/issues/2835">#2835</a>) by <a href="https://github.com/MrPresent-Han"><code>@MrPresent-Han</code></a> in <a href="https://redirect.github.com/milvus-io/pymilvus/pull/2836">milvus-io/pymilvus#2836</a></li> <li>enhance: optimize perf for large topk(<a href="https://redirect.github.com/milvus-io/pymilvus/issues/2848">#2848</a>) by <a href="https://github.com/MrPresent-Han"><code>@MrPresent-Han</code></a> in <a href="https://redirect.github.com/milvus-io/pymilvus/pull/2849">milvus-io/pymilvus#2849</a></li> <li>enhance: Add usage guide to manage MilvusClient by <a href="https://github.com/XuanYang-cn"><code>@XuanYang-cn</code></a> in <a href="https://redirect.github.com/milvus-io/pymilvus/pull/2907">milvus-io/pymilvus#2907</a></li> </ul> <!-- raw HTML omitted --> </blockquote> <p>... (truncated)</p> </details> <details> <summary>Commits</summary> <ul> <li><a href=" |
||
|
e743d3fdf6
|
refactor(agents): migrate to OpenAI chat completions API (#3097)
Replace chat_completion calls with openai_chat_completion to eliminate dependency on legacy inference APIs. # What does this PR do? <!-- Provide a short summary of what this PR does and why. Link to relevant issues if applicable. --> <!-- If resolving an issue, uncomment and update the line below --> Closes #3067 ## Test Plan <!-- Describe the tests you ran to verify your changes with result summaries. *Provide clear instructions so the plan can be easily re-executed.* --> |
||
|
f66ae3b3b1
|
docs(tests): Add a bunch of documentation for our testing systems (#3139)
# What does this PR do? Creates a structured testing documentation section with multiple detailed pages: - Testing overview explaining the record-replay architecture - Integration testing guide with practical usage examples - Record-replay system technical documentation - Guide for writing effective tests - Troubleshooting guide for common testing issues Hopefully this makes things a bit easier. |
||
|
81ecaf6221
|
fix(ci): make the Vector IO CI follow the same pattern as others (#3164)
Some checks failed
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Integration Tests (Replay) / discover-tests (push) Successful in 3s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 6s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 8s
Python Package Build Test / build (3.12) (push) Failing after 6s
Test External API and Providers / test-external (venv) (push) Failing after 6s
Update ReadTheDocs / update-readthedocs (push) Failing after 6s
Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 5s
Unit Tests / unit-tests (3.13) (push) Failing after 8s
Vector IO Integration Tests / test-matrix (push) Failing after 11s
Unit Tests / unit-tests (3.12) (push) Failing after 10s
Python Package Build Test / build (3.13) (push) Failing after 13s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 19s
Pre-commit / pre-commit (push) Successful in 1m19s
# What does this PR do? Updates the integration-vector-io-tests workflow to run daily tests on Python 3.13 while limiting regular PR tests to Python 3.12 only. The PR also improves the concurrency configuration to prevent workflow conflicts between main branch runs and PR runs. ## Test Plan [](https://app.graphite.dev/settings/meme-library?org=llamastack) |
||
|
01b2afd4b5
|
fix(tests): record missing tests for test_responses_store (#3163)
# What does this PR do? Updates test recordings. ## Test Plan Started ollama serving the 3.2:3b model. Then ran the server: ``` LLAMA_STACK_TEST_INFERENCE_MODE=record \ LLAMA_STACK_TEST_RECORDING_DIR=tests/integration/recordings/ \ SQLITE_STORE_DIR=$(mktemp -d) \ OLLAMA_URL=http://localhost:11434 \ llama stack build --template starter --image-type venv --run ``` Then ran the tests which needed recording: ``` pytest -sv tests/integration/agents/test_openai_responses.py \ --stack-config=server:starter \ --text-model ollama/llama3.2:3b-instruct-fp16 -k test_responses_store ``` Then, restarted the server with `LLAMA_STACK_TEST_INFERENCE_MODE=replay`, re-ran the tests and verified they passed. |
||
|
8ed69978f9
|
refactor(tests): make the responses tests nicer (#3161)
# What does this PR do? A _bunch_ on cleanup for the Responses tests. - Got rid of YAML test cases, moved them to just use simple pydantic models - Splitting the large monolithic test file into multiple focused test files: - `test_basic_responses.py` for basic and image response tests - `test_tool_responses.py` for tool-related tests - `test_file_search.py` for file search specific tests - Adding a `StreamingValidator` helper class to standardize streaming response validation ## Test Plan Run the tests: ``` pytest -s -v tests/integration/non_ci/responses/ \ --stack-config=starter \ --text-model openai/gpt-4o \ --embedding-model=sentence-transformers/all-MiniLM-L6-v2 \ -k "client_with_models" ``` |
||
|
ba664474de
|
feat(responses): add mcp list tool streaming event (#3159)
# What does this PR do? Adds proper streaming events for MCP tool listing (`mcp_list_tools.in_progress` and `mcp_list_tools.completed`). Also refactors things a bit more. ## Test Plan Verified existing integration tests pass with the refactored code. The test `test_response_streaming_multi_turn_tool_execution` has been updated to check for the new MCP list tools streaming events |
||
|
9324e902f1
|
refactor(responses): move stuff into some utils and add unit tests (#3158)
# What does this PR do? Refactors the OpenAI response conversion utilities by moving helper functions from `openai_responses.py` to `utils.py`. Adds unit tests. |
||
|
47d5af703c
|
chore(responses): Refactor Responses Impl to be civilized (#3138)
# What does this PR do? Refactors the OpenAI responses implementation by extracting streaming and tool execution logic into separate modules. This improves code organization by: 1. Creating a new `StreamingResponseOrchestrator` class in `streaming.py` to handle the streaming response generation logic 2. Moving tool execution functionality to a dedicated `ToolExecutor` class in `tool_executor.py` ## Test Plan Existing tests |
||
|
e69acbafbf
|
feat(UI): Adding linter and prettier for UI (#3156) | ||
|
61582f327c
|
fix(ci): update triggers for the workflows (#3152)
Some checks failed
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Integration Tests (Replay) / discover-tests (push) Successful in 8s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 10s
Python Package Build Test / build (3.12) (push) Failing after 6s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 11s
Vector IO Integration Tests / test-matrix (3.12, remote::qdrant) (push) Failing after 12s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 14s
Unit Tests / unit-tests (3.12) (push) Failing after 8s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 17s
Python Package Build Test / build (3.13) (push) Failing after 14s
Vector IO Integration Tests / test-matrix (3.12, remote::weaviate) (push) Failing after 20s
Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 12s
Unit Tests / unit-tests (3.13) (push) Failing after 12s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 23s
Update ReadTheDocs / update-readthedocs (push) Failing after 13s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 21s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 20s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 21s
Test External API and Providers / test-external (venv) (push) Failing after 17s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 20s
Vector IO Integration Tests / test-matrix (3.13, remote::qdrant) (push) Failing after 14s
Vector IO Integration Tests / test-matrix (3.13, remote::weaviate) (push) Failing after 26s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 25s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 15s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 19s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 19s
Pre-commit / pre-commit (push) Successful in 1m39s
|
||
|
c15cc7ed77
|
fix: use ChatCompletionMessageFunctionToolCall (#3142)
The OpenAI compatibility layer was incorrectly importing ChatCompletionMessageToolCallParam instead of the ChatCompletionMessageFunctionToolCall class. This caused "Cannot instantiate typing.Union" errors when processing agent requests with tool calls. Closes: #3141 Signed-off-by: Derek Higgins <derekh@redhat.com> |
||
|
ee7631b6cf
|
Revert "feat: add batches API with OpenAI compatibility" (#3149)
Reverts llamastack/llama-stack#3088 The PR broke integration tests. |
||
|
de692162af
|
feat: add batches API with OpenAI compatibility (#3088)
Some checks failed
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Integration Tests (Replay) / discover-tests (push) Successful in 12s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 15s
Python Package Build Test / build (3.12) (push) Failing after 16s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 25s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 23s
Python Package Build Test / build (3.13) (push) Failing after 17s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 29s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 21s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 25s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 28s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 29s
Unit Tests / unit-tests (3.12) (push) Failing after 20s
Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 12s
Test External API and Providers / test-external (venv) (push) Failing after 22s
Unit Tests / unit-tests (3.13) (push) Failing after 18s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 23s
Vector IO Integration Tests / test-matrix (3.12, remote::qdrant) (push) Failing after 24s
Vector IO Integration Tests / test-matrix (3.12, remote::weaviate) (push) Failing after 27s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 24s
Vector IO Integration Tests / test-matrix (3.13, remote::weaviate) (push) Failing after 23s
Vector IO Integration Tests / test-matrix (3.13, remote::qdrant) (push) Failing after 24s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 25s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 27s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 24s
Update ReadTheDocs / update-readthedocs (push) Failing after 38s
Pre-commit / pre-commit (push) Successful in 1m53s
Add complete batches API implementation with protocol, providers, and tests: Core Infrastructure: - Add batches API protocol using OpenAI Batch types directly - Add Api.batches enum value and protocol mapping in resolver - Add OpenAI "batch" file purpose support - Include proper error handling (ConflictError, ResourceNotFoundError) Reference Provider: - Add ReferenceBatchesImpl with full CRUD operations (create, retrieve, cancel, list) - Implement background batch processing with configurable concurrency - Add SQLite KVStore backend for persistence - Support /v1/chat/completions endpoint with request validation Comprehensive Test Suite: - Add unit tests for provider implementation with validation - Add integration tests for end-to-end batch processing workflows - Add error handling tests for validation, malformed inputs, and edge cases Configuration: - Add max_concurrent_batches and max_concurrent_requests_per_batch options - Add provider documentation with sample configurations Test with - ``` $ uv run llama stack build --image-type venv --providers inference=YOU_PICK,files=inline::localfs,batches=inline::reference --run & $ LLAMA_STACK_CONFIG=http://localhost:8321 uv run pytest tests/unit/providers/batches tests/integration/batches --text-model YOU_PICK ``` addresses #3066 |
||
|
46ff302d87
|
chore: Remove Trendshift badge from README (#3137)
Some checks failed
Integration Tests (Replay) / discover-tests (push) Successful in 5s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 10s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 8s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 13s
Python Package Build Test / build (3.12) (push) Failing after 11s
Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 9s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 15s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 17s
Vector IO Integration Tests / test-matrix (3.13, remote::qdrant) (push) Failing after 13s
Python Package Build Test / build (3.13) (push) Failing after 13s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 17s
Update ReadTheDocs / update-readthedocs (push) Failing after 11s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 21s
Vector IO Integration Tests / test-matrix (3.12, remote::qdrant) (push) Failing after 18s
Unit Tests / unit-tests (3.13) (push) Failing after 13s
Test External API and Providers / test-external (venv) (push) Failing after 14s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 18s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 20s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 16s
Vector IO Integration Tests / test-matrix (3.13, remote::weaviate) (push) Failing after 15s
Vector IO Integration Tests / test-matrix (3.12, remote::weaviate) (push) Failing after 17s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 20s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 49s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 51s
Unit Tests / unit-tests (3.12) (push) Failing after 51s
Pre-commit / pre-commit (push) Successful in 1m36s
## Summary - This links to a scammy looking website with ads. ## Test plan |
||
|
e1e161553c
|
feat(responses): add MCP argument streaming and content part events (#3136)
# What does this PR do? Adds content part streaming events to the OpenAI-compatible Responses API to support more granular streaming of response content. This introduces: 1. New schema types for content parts: `OpenAIResponseContentPart` with variants for text output and refusals 2. New streaming event types: - `OpenAIResponseObjectStreamResponseContentPartAdded` for when content parts begin - `OpenAIResponseObjectStreamResponseContentPartDone` for when content parts complete 3. Implementation in the reference provider to emit these events during streaming responses. Also emits MCP arguments just like function call ones. ## Test Plan Updated existing streaming tests to verify content part events are properly emitted |
||
|
8638537d14
|
feat(responses): stream progress of tool calls (#3135)
# What does this PR do? Enhances tool execution streaming by adding support for real-time progress events during tool calls. This implementation adds streaming events for MCP and web search tools, including in-progress, searching, completed, and failed states. The refactored `_execute_tool_call` method now returns an async iterator that yields streaming events throughout the tool execution lifecycle. ## Test Plan Updated the integration test `test_response_streaming_multi_turn_tool_execution` to verify the presence and structure of new streaming events, including: - Checking for MCP in-progress and completed events - Verifying that progress events contain required fields (item_id, output_index, sequence_number) - Ensuring completed events have the necessary sequence_number field |
||
|
5b312a80b9
|
feat(responses): improve streaming for function calls (#3124)
Some checks failed
Test Llama Stack Build / build-single-provider (push) Failing after 5s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Vector IO Integration Tests / test-matrix (3.13, remote::qdrant) (push) Failing after 10s
Test Llama Stack Build / generate-matrix (push) Successful in 9s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 13s
Python Package Build Test / build (3.13) (push) Failing after 5s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 11s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 8s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 7s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 21s
Python Package Build Test / build (3.12) (push) Failing after 9s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 23s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 15s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 29s
Unit Tests / unit-tests (3.12) (push) Failing after 8s
Test External API and Providers / test-external (venv) (push) Failing after 13s
Update ReadTheDocs / update-readthedocs (push) Failing after 8s
Unit Tests / unit-tests (3.13) (push) Failing after 10s
Vector IO Integration Tests / test-matrix (3.13, remote::weaviate) (push) Failing after 23s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 16s
Vector IO Integration Tests / test-matrix (3.12, remote::weaviate) (push) Failing after 18s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 25s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 23s
Vector IO Integration Tests / test-matrix (3.12, remote::qdrant) (push) Failing after 24s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 25s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 26s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 22s
Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 17s
Pre-commit / pre-commit (push) Successful in 1m10s
Test Llama Stack Build / build (push) Failing after 12s
Emit streaming events for function calls ## Test Plan Improved the test case |
||
|
d6ae54723d
|
chore: setup for performance benchmarking (#3096)
# What does this PR do? 1. Added a simple mock openai-compat server that serves chat/completion 2. Add a benchmark server in EKS that includes mock inference server 3. Add locust (https://locust.io/) file for load testing ## Test Plan bash apply.sh kubectl port-forward service/locust-web-ui 8089:8089 Go to localhost:8089 to start a load test <img width="1392" height="334" alt="image" src="https://github.com/user-attachments/assets/d6aa3deb-583a-42ed-889b-751262b8e91c" /> <img width="1362" height="881" alt="image" src="https://github.com/user-attachments/assets/6a28b9b4-05e6-44e2-b504-07e60c12d35e" /> |
||
|
2f51273215
|
fix: huge speed boost (#3132)
# What does this PR do? make llama stack fast again ## Test Plan |
||
|
25e0553eed
|
chore: Change moderations api response to Provider returned categories (#3098)
# What does this PR do? To be compliant with model policies for LLAMA, just return the categories as is from provider, we will lose the OAI compat in moderations api response. <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> ## Test Plan `SAFETY_MODEL=llama-guard3:8b LLAMA_STACK_CONFIG=starter uv run pytest -v tests/integration/safety/test_safety.py --text-model=llama3.2:3b-instruct-fp16 --embedding-model=all-MiniLM-L6-v2 --safety-shield=ollama` |
||
|
a9081d87b9 | feat(ci): update Recording workflow trigger and concurrency group | ||
|
0950168f26
|
refactor: replace hardcoded status codes by httpx.codes (#3131)
# What does this PR do? <!-- Provide a short summary of what this PR does and why. Link to relevant issues if applicable. --> The purpose of this PR is to eliminate hardcoded status codes in server's responses and replace it by `httpx.codes` functionality for better consistency across the whole project and improvement in code readability. <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> ## Test Plan <!-- Describe the tests you ran to verify your changes with result summaries. *Provide clear instructions so the plan can be easily re-executed.* --> Run `./scripts/unit-tests.sh` |
||
|
0cbd93c5cc
|
docs: Update blocks formatting in docs/source files (#3120)
**Description:** The standard markdown [!NOTE] format is not supported on Sphinx generated documentation, replacing those instances. Also updating other Notes, Tips and Warning blocks throughout the source docs WIP: Working to update the provider code gen |
||
|
c9b78602d3
|
refactor: modify DELETE API endpoints by returning HTTP 204 No Content + empty body instead of 200 OK + response body with null (#3112)
# What does this PR do? <!-- Provide a short summary of what this PR does and why. Link to relevant issues if applicable. --> The purpose of this PR is to make the behavior DELETE API endpoints be consistent with standard RESTful conventions and eliminate confusion for API consumers. Old Behavior ``` HTTP Status: 200 OK Response Body: null ``` Eg. `curl -X DELETE http://localhost:8321/v1/shields/test-shield` `null% ` `INFO 2025-08-12 16:11:57,932 console_span_processor:65 telemetry: 15:11:57.929 [INFO] ::1:59805 - "DELETE /v1/shields/test-shield HTTP/1.1" 200 ` Updated Behavior ``` HTTP Status: 204 No Content Response Body: empty (no body) ``` Eg. `curl -X DELETE http://localhost:8321/v1/shields/test-shield` `INFO 2025-08-12 16:18:16,645 console_span_processor:62 telemetry: 15:18:16.637 [INFO] ::1:60283 - "DELETE /v1/shields/test-shield HTTP/1.1" 204 ` <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> Closes #3090 ## Test Plan <!-- Describe the tests you ran to verify your changes with result summaries. *Provide clear instructions so the plan can be easily re-executed.* --> Run `./scripts/unit-tests.sh` |
||
|
92aca434a7
|
fix: Fix list_sessions() (#3114)
# What does this PR do? 1. Updates `AgentPersistence.list_sessions()` to properly filter out `Turn` keys from `Session` keys. 2. Adds a suite of unit tests to confirm the `list_sessions()` behavior and tests the failed sample in https://github.com/meta-llama/llama-stack/issues/3048 ## Fixes https://github.com/meta-llama/llama-stack/issues/3048 ## Test Plan Unit tests added. --------- Signed-off-by: Francisco Javier Arceo <farceo@redhat.com> |
||
|
5bd6cb52fb
|
fix: github action canceling valid tasks for checking semantic pr title (#3127)
# What does this PR do? <!-- Provide a short summary of what this PR does and why. Link to relevant issues if applicable. --> This PR changes the group name from github.ref to github.even.pull_request_number. The reason for this is that github.ref does not act as a unique identifier in the pull_request_target event and only is unique in pull_request. The github action was getting canceled was because the group name was not unique in the concurrency section. <!-- If resolving an issue, uncomment and update the line below --> Closes #3102 ## Test Plan <!-- Describe the tests you ran to verify your changes with result summaries. *Provide clear instructions so the plan can be easily re-executed.* --> To test this I have created a fake github action and ran it trough act to see what the github.ref variable produced and what alternatives can be used. This confirmed that the github.ref was not unique and that github.event.pull_request_number is unique to the PR. |
||
|
fffdab4f5c
|
fix: Dell distribution missing kvstore (#3113)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 7s
Integration Tests (Replay) / discover-tests (push) Successful in 9s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 11s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 18s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 20s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 18s
Vector IO Integration Tests / test-matrix (3.12, remote::weaviate) (push) Failing after 17s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 16s
Test Llama Stack Build / generate-matrix (push) Successful in 6s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 14s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 12s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 27s
Test Llama Stack Build / build-single-provider (push) Failing after 6s
Vector IO Integration Tests / test-matrix (3.12, remote::qdrant) (push) Failing after 26s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 24s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 29s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 15s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 9s
Python Package Build Test / build (3.13) (push) Failing after 6s
Vector IO Integration Tests / test-matrix (3.13, remote::weaviate) (push) Failing after 14s
Python Package Build Test / build (3.12) (push) Failing after 9s
Vector IO Integration Tests / test-matrix (3.13, remote::qdrant) (push) Failing after 16s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 10s
Test External API and Providers / test-external (venv) (push) Failing after 11s
Unit Tests / unit-tests (3.12) (push) Failing after 13s
Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 11s
Test Llama Stack Build / build (push) Failing after 8s
Unit Tests / unit-tests (3.13) (push) Failing after 37s
Pre-commit / pre-commit (push) Successful in 1m44s
# What does this PR do? - Added kvstore config to ChromaDB provider config for Dell distribution similar to [starter config](https://github.com/meta-llama/llama-stack/blob/main/llama_stack/distributions/starter/run.yaml#L110-L112) - Fixed [error](https://github.com/huggingface/huggingface_hub/blob/main/src/huggingface_hub/inference/_generated/_async_client.py#L3424-L3425) getting endpoint information by adding `hf-inference` as the provider to the `AsyncInferenceClient` (TGI client). ## Test Plan ``` export INFERENCE_PORT=8181 export DEH_URL=http://0.0.0.0:$INFERENCE_PORT export INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct export CHROMADB_HOST=localhost export CHROMADB_PORT=8000 export CHROMA_URL=http://$CHROMADB_HOST:$CHROMADB_PORT export CUDA_VISIBLE_DEVICES=0 export LLAMA_STACK_PORT=8321 export HF_TOKEN=[redacted] # TGI Server docker run --rm -it \ --pull always \ --network host \ -v $HOME/.cache/huggingface:/data \ -e HF_TOKEN=$HF_TOKEN \ -e PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True \ -p $INFERENCE_PORT:$INFERENCE_PORT \ --gpus all \ ghcr.io/huggingface/text-generation-inference:latest \ --dtype float16 \ --usage-stats off \ --sharded false \ --cuda-memory-fraction 0.8 \ --model-id meta-llama/Llama-3.2-3B-Instruct \ --port $INFERENCE_PORT \ --hostname 0.0.0.0 # Chrome DB docker run --rm -it \ --name chromadb \ --net=host -p 8000:8000 \ -v ~/chroma:/chroma/chroma \ -e IS_PERSISTENT=TRUE \ -e ANONYMIZED_TELEMETRY=FALSE \ chromadb/chroma:latest # Llama Stack llama stack run dell \ --port $LLAMA_STACK_PORT \ --env INFERENCE_MODEL=$INFERENCE_MODEL \ --env DEH_URL=$DEH_URL \ --env CHROMA_URL=$CHROMA_URL ``` --------- Co-authored-by: Connor Hack <connorhack@fb.com> Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com> |
||
|
6358d0a478
|
docs: reorganize contributor guide (#3110)
Some checks failed
Test Llama Stack Build / generate-matrix (push) Successful in 7s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Vector IO Integration Tests / test-matrix (3.12, remote::qdrant) (push) Failing after 20s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 22s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 10s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 25s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 24s
Python Package Build Test / build (3.13) (push) Failing after 5s
Test Llama Stack Build / build-single-provider (push) Failing after 11s
Python Package Build Test / build (3.12) (push) Failing after 7s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 22s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 19s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 25s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 23s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 8s
Vector IO Integration Tests / test-matrix (3.12, remote::weaviate) (push) Failing after 22s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 24s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 20s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 28s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 19s
Update ReadTheDocs / update-readthedocs (push) Failing after 7s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 26s
Vector IO Integration Tests / test-matrix (3.13, remote::weaviate) (push) Failing after 18s
Unit Tests / unit-tests (3.12) (push) Failing after 13s
Vector IO Integration Tests / test-matrix (3.13, remote::qdrant) (push) Failing after 18s
Unit Tests / unit-tests (3.13) (push) Failing after 15s
Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 12s
Test External API and Providers / test-external (venv) (push) Failing after 17s
Test Llama Stack Build / build (push) Failing after 11s
Pre-commit / pre-commit (push) Successful in 1m48s
**Description:** Restructures contribution guide and move some sections into categories <img width="1399" height="527" alt="Screenshot 2025-08-12 at 9 28 44 AM" src="https://github.com/user-attachments/assets/404e23b4-0001-4174-b662-593e0173ef7d" /> |
||
|
3d90117891
|
chore(tests): fix responses and vector_io tests (#3119)
Some fixes to MCP tests. And a bunch of fixes for Vector providers. I also enabled a bunch of Vector IO tests to be used with `LlamaStackLibraryClient` ## Test Plan Run Responses tests with llama stack library client: ``` pytest -s -v tests/integration/non_ci/responses/ --stack-config=server:starter \ --text-model openai/gpt-4o \ --embedding-model=sentence-transformers/all-MiniLM-L6-v2 \ -k "client_with_models" ``` Do the same with `-k openai_client` The rest should be taken care of by CI. |
||
|
1721aafc1f
|
feat(responses): type file results properly (#3117)
Some checks failed
Python Package Build Test / build (3.13) (push) Failing after 3s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Vector IO Integration Tests / test-matrix (3.12, remote::qdrant) (push) Failing after 10s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 10s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 13s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 13s
Test Llama Stack Build / generate-matrix (push) Successful in 8s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 19s
Python Package Build Test / build (3.12) (push) Failing after 6s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 12s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 12s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 16s
Test Llama Stack Build / build-single-provider (push) Failing after 10s
Unit Tests / unit-tests (3.12) (push) Failing after 12s
Test External API and Providers / test-external (venv) (push) Failing after 15s
Unit Tests / unit-tests (3.13) (push) Failing after 12s
Update ReadTheDocs / update-readthedocs (push) Failing after 10s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 30s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 16s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 14s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 28s
Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 11s
Vector IO Integration Tests / test-matrix (3.13, remote::qdrant) (push) Failing after 15s
Vector IO Integration Tests / test-matrix (3.13, remote::weaviate) (push) Failing after 16s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 16s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 26s
Test Llama Stack Build / build (push) Failing after 8s
Vector IO Integration Tests / test-matrix (3.12, remote::weaviate) (push) Failing after 19s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 17s
Pre-commit / pre-commit (push) Successful in 1m16s
Another thing our tests implicitly depended on. |
||
|
4fec49dfdb
|
feat(responses): add include parameter (#3115)
Well our Responses tests use it so we better include it in the API, no? I discovered it because I want to make sure `llama-stack-client` can be used always instead of `openai-python` as the client (we do want to be _truly_ compatible.) |
||
|
6812aa1e1e
|
chore: bump min python version in docs and tests (#3103)
# What does this PR do? the minimum python version for the project was bumped to 3.12 a couple months ago, but there remains some artifacts in the repo suggesting we support >=3.10 Signed-off-by: Nathan Weinberg <nweinber@redhat.com> |
||
|
88c4fdc5d7
|
chore(python-deps): bump chromadb from 1.0.15 to 1.0.16 (#3083)
Bumps [chromadb](https://github.com/chroma-core/chroma) from 1.0.15 to 1.0.16. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/chroma-core/chroma/releases">chromadb's releases</a>.</em></p> <blockquote> <h2>1.0.16</h2> <p>Version: <code>1.0.16</code> Git ref: <code>refs/tags/1.0.16</code> Build Date: <code>2025-08-08T00:26</code> PIP Package: <code>chroma-1.0.16.tar.gz</code> Github Container Registry Image: <code>:1.0.16</code> DockerHub Image: <code>:1.0.16</code></p> <h2>What's Changed</h2> <ul> <li>[ENH]: add cache mount & tolerations to garbage collector template in Helm chart by <a href="https://github.com/codetheweb"><code>@codetheweb</code></a> in <a href="https://redirect.github.com/chroma-core/chroma/pull/5016">chroma-core/chroma#5016</a></li> <li>[DOC] Fix docs typo by <a href="https://github.com/itaismith"><code>@itaismith</code></a> in <a href="https://redirect.github.com/chroma-core/chroma/pull/5018">chroma-core/chroma#5018</a></li> <li>[CLN] Change GenericQuotaError from 429 to 422 by <a href="https://github.com/drewkim"><code>@drewkim</code></a> in <a href="https://redirect.github.com/chroma-core/chroma/pull/5022">chroma-core/chroma#5022</a></li> <li>[CHORE] Fix type error in batch_utils by <a href="https://github.com/jairad26"><code>@jairad26</code></a> in <a href="https://redirect.github.com/chroma-core/chroma/pull/5024">chroma-core/chroma#5024</a></li> <li>[ENH] Add block-level metrics by <a href="https://github.com/tanujnay112"><code>@tanujnay112</code></a> in <a href="https://redirect.github.com/chroma-core/chroma/pull/4801">chroma-core/chroma#4801</a></li> <li>[ENH]: return error on /add if embeddings are not provided by <a href="https://github.com/codetheweb"><code>@codetheweb</code></a> in <a href="https://redirect.github.com/chroma-core/chroma/pull/5033">chroma-core/chroma#5033</a></li> <li>[DOC] Docs Polish 07/2025 by <a href="https://github.com/itaismith"><code>@itaismith</code></a> in <a href="https://redirect.github.com/chroma-core/chroma/pull/5032">chroma-core/chroma#5032</a></li> <li>[DOC] Flatten public txt files by <a href="https://github.com/itaismith"><code>@itaismith</code></a> in <a href="https://redirect.github.com/chroma-core/chroma/pull/5040">chroma-core/chroma#5040</a></li> <li>[ENH]: require embeddings & require min embedding dimension on /add by <a href="https://github.com/codetheweb"><code>@codetheweb</code></a> in <a href="https://redirect.github.com/chroma-core/chroma/pull/5037">chroma-core/chroma#5037</a></li> <li>[ENH] - Adds in dark mode support for hero image by <a href="https://github.com/tjkrusinskichroma"><code>@tjkrusinskichroma</code></a> in <a href="https://redirect.github.com/chroma-core/chroma/pull/5042">chroma-core/chroma#5042</a></li> <li>[BLD] Use 8core runners for all our windows jobs by <a href="https://github.com/eculver"><code>@eculver</code></a> in <a href="https://redirect.github.com/chroma-core/chroma/pull/5027">chroma-core/chroma#5027</a></li> <li>[TST] More benchmark queries for regex by <a href="https://github.com/Sicheng-Pan"><code>@Sicheng-Pan</code></a> in <a href="https://redirect.github.com/chroma-core/chroma/pull/4910">chroma-core/chroma#4910</a></li> <li>[BUG]: refactor otel/tracing initialization in the frontend to be independent of hosted entry point by <a href="https://github.com/c-gamble"><code>@c-gamble</code></a> in <a href="https://redirect.github.com/chroma-core/chroma/pull/5028">chroma-core/chroma#5028</a></li> <li>[BUG] js client: handle 422 billing errors as QuotaExceeded instead of ChromaConnectionError by <a href="https://github.com/philipithomas"><code>@philipithomas</code></a> in <a href="https://redirect.github.com/chroma-core/chroma/pull/5049">chroma-core/chroma#5049</a></li> <li>[BUG] RLS should use 32MB GRPC payload size limit by <a href="https://github.com/Sicheng-Pan"><code>@Sicheng-Pan</code></a> in <a href="https://redirect.github.com/chroma-core/chroma/pull/5044">chroma-core/chroma#5044</a></li> <li>[BUG] Sync protoc arch and version in dockerfile by <a href="https://github.com/Sicheng-Pan"><code>@Sicheng-Pan</code></a> in <a href="https://redirect.github.com/chroma-core/chroma/pull/5045">chroma-core/chroma#5045</a></li> <li>[BLD] Fix windows runner label by <a href="https://github.com/eculver"><code>@eculver</code></a> in <a href="https://redirect.github.com/chroma-core/chroma/pull/5052">chroma-core/chroma#5052</a></li> <li>[PERF]: Prefetch segments in get and query by <a href="https://github.com/sanketkedia"><code>@sanketkedia</code></a> in <a href="https://redirect.github.com/chroma-core/chroma/pull/5053">chroma-core/chroma#5053</a></li> <li>[PERF]: Parallelize fetching blocks for brute force regex by <a href="https://github.com/sanketkedia"><code>@sanketkedia</code></a> in <a href="https://redirect.github.com/chroma-core/chroma/pull/5051">chroma-core/chroma#5051</a></li> <li>[RELEASE] JS 3.0.7 by <a href="https://github.com/itaismith"><code>@itaismith</code></a> in <a href="https://redirect.github.com/chroma-core/chroma/pull/5059">chroma-core/chroma#5059</a></li> <li>[ENH] Add a delete_many call to the storage API. by <a href="https://github.com/rescrv"><code>@rescrv</code></a> in <a href="https://redirect.github.com/chroma-core/chroma/pull/5020">chroma-core/chroma#5020</a></li> <li>[ENH] Consume delete_many from the wal3 garbage collector. by <a href="https://github.com/rescrv"><code>@rescrv</code></a> in <a href="https://redirect.github.com/chroma-core/chroma/pull/5021">chroma-core/chroma#5021</a></li> <li>[ENH]: limit number of concurrent get_all_block_ids() when using buffer_unordered() by <a href="https://github.com/codetheweb"><code>@codetheweb</code></a> in <a href="https://redirect.github.com/chroma-core/chroma/pull/5062">chroma-core/chroma#5062</a></li> <li>[ENH]: use new <code>delete_many()</code> storage method in DeleteUnusedFiles operator by <a href="https://github.com/codetheweb"><code>@codetheweb</code></a> in <a href="https://redirect.github.com/chroma-core/chroma/pull/5061">chroma-core/chroma#5061</a></li> <li>[BUG]: Disable aws stalled stream protection by <a href="https://github.com/tanujnay112"><code>@tanujnay112</code></a> in <a href="https://redirect.github.com/chroma-core/chroma/pull/5063">chroma-core/chroma#5063</a></li> <li>[DOC] Update manage collections docs with correct delete collection info by <a href="https://github.com/jairad26"><code>@jairad26</code></a> in <a href="https://redirect.github.com/chroma-core/chroma/pull/5066">chroma-core/chroma#5066</a></li> <li>[BUG] Improve wal3 robustness with better shutdown handling and error recovery by <a href="https://github.com/rescrv"><code>@rescrv</code></a> in <a href="https://redirect.github.com/chroma-core/chroma/pull/5046">chroma-core/chroma#5046</a></li> <li>[ENH] Do not do any mutations of the manifest from within GC. by <a href="https://github.com/rescrv"><code>@rescrv</code></a> in <a href="https://redirect.github.com/chroma-core/chroma/pull/5050">chroma-core/chroma#5050</a></li> <li>[CHORE]: enable change notifier otel/tracing by <a href="https://github.com/c-gamble"><code>@c-gamble</code></a> in <a href="https://redirect.github.com/chroma-core/chroma/pull/5073">chroma-core/chroma#5073</a></li> <li>[CHORE] Add pprof server to query service by <a href="https://github.com/eculver"><code>@eculver</code></a> in <a href="https://redirect.github.com/chroma-core/chroma/pull/5072">chroma-core/chroma#5072</a></li> <li>[ENH]: Dedup inserts to the same key in foyer by <a href="https://github.com/sanketkedia"><code>@sanketkedia</code></a> in <a href="https://redirect.github.com/chroma-core/chroma/pull/5074">chroma-core/chroma#5074</a></li> <li>[ENH] "Failed to fetch: status: NotFound" be gone. by <a href="https://github.com/rescrv"><code>@rescrv</code></a> in <a href="https://redirect.github.com/chroma-core/chroma/pull/5064">chroma-core/chroma#5064</a></li> <li>[CLN] Remove the the top most spammy log lines from rls/wal3. by <a href="https://github.com/rescrv"><code>@rescrv</code></a> in <a href="https://redirect.github.com/chroma-core/chroma/pull/5071">chroma-core/chroma#5071</a></li> <li>[DOC] Fix badge in readme by <a href="https://github.com/kylediaz"><code>@kylediaz</code></a> in <a href="https://redirect.github.com/chroma-core/chroma/pull/5025">chroma-core/chroma#5025</a></li> <li>[ENH] A tool for patching logs that were deleted before a new manifest was installed. by <a href="https://github.com/rescrv"><code>@rescrv</code></a> in <a href="https://redirect.github.com/chroma-core/chroma/pull/5083">chroma-core/chroma#5083</a></li> <li>[BUG] Add billing errors to JS client by <a href="https://github.com/itaismith"><code>@itaismith</code></a> in <a href="https://redirect.github.com/chroma-core/chroma/pull/5084">chroma-core/chroma#5084</a></li> <li>[CHORE]: Add s3 get metrics and pod name to tracing spans by <a href="https://github.com/tanujnay112"><code>@tanujnay112</code></a> in <a href="https://redirect.github.com/chroma-core/chroma/pull/5086">chroma-core/chroma#5086</a></li> <li>[RELEASE] JS 3.0.8 by <a href="https://github.com/itaismith"><code>@itaismith</code></a> in <a href="https://redirect.github.com/chroma-core/chroma/pull/5087">chroma-core/chroma#5087</a></li> <li>[ENH] A tool to purge the cache. by <a href="https://github.com/rescrv"><code>@rescrv</code></a> in <a href="https://redirect.github.com/chroma-core/chroma/pull/5085">chroma-core/chroma#5085</a></li> <li>[DOC] Update PR template for migration and observability by <a href="https://github.com/HammadB"><code>@HammadB</code></a> in <a href="https://redirect.github.com/chroma-core/chroma/pull/5089">chroma-core/chroma#5089</a></li> <li>[CHORE]: Fix s3 get metric name by <a href="https://github.com/tanujnay112"><code>@tanujnay112</code></a> in <a href="https://redirect.github.com/chroma-core/chroma/pull/5091">chroma-core/chroma#5091</a></li> </ul> <!-- raw HTML omitted --> </blockquote> <p>... (truncated)</p> </details> <details> <summary>Commits</summary> <ul> <li><a href=" |
||
|
393f3714b0
|
chore(python-deps): bump torch from 2.7.1 to 2.8.0 (#3082)
Bumps [torch](https://github.com/pytorch/pytorch) from 2.7.1 to 2.8.0. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/pytorch/pytorch/releases">torch's releases</a>.</em></p> <blockquote> <h1>PyTorch 2.8.0 Release Notes</h1> <ul> <li><a href="https://github.com/pytorch/pytorch/blob/HEAD/#highlights">Highlights</a></li> <li><a href="https://github.com/pytorch/pytorch/blob/HEAD/#backwards-incompatible-changes">Backwards Incompatible Changes</a></li> <li><a href="https://github.com/pytorch/pytorch/blob/HEAD/#deprecations">Deprecations</a></li> <li><a href="https://github.com/pytorch/pytorch/blob/HEAD/#new-features">New Features</a></li> <li><a href="https://github.com/pytorch/pytorch/blob/HEAD/#improvements">Improvements</a></li> <li><a href="https://github.com/pytorch/pytorch/blob/HEAD/#bug-fixes">Bug fixes</a></li> <li><a href="https://github.com/pytorch/pytorch/blob/HEAD/#performance">Performance</a></li> <li><a href="https://github.com/pytorch/pytorch/blob/HEAD/#documentation">Documentation</a></li> <li><a href="https://github.com/pytorch/pytorch/blob/HEAD/#developers">Developers</a></li> </ul> <h1>Highlights</h1> <!-- raw HTML omitted --> </blockquote> <p>... (truncated)</p> </details> <details> <summary>Commits</summary> <ul> <li><a href=" |
||
|
b70e2f1f09
|
fix(dep): update to openai >= 1.99.6 and use new Function location (#3087)
# What does this PR do? closes #3072 ## Test Plan ci |
||
|
4a13ef45e9
|
fix: Implement missing run_moderation method in PromptGuardSafetyImpl (#3101)
# What does this PR do? <!-- Provide a short summary of what this PR does and why. Link to relevant issues if applicable. --> This PR addresses an issue where `PromptGuardSafetyImpl` was an incomplete implementation of an abstract class. The class was missing the required run_moderation method from its parent interface. Currently, running `pre-commit` locally fails with the error below. ``` llama_stack/providers/inline/safety/prompt_guard/__init__.py:15: error: Cannot instantiate abstract class "PromptGuardSafetyImpl" with abstract attribute "run_moderation" [abstract] Found 1 error in 1 file (checked 410 source files) ``` This PR fixes the issue as follows - Added the missing run_moderation method to PromptGuardSafetyImpl - Method raises NotImplementedError with appropriate message indicating this functionality is not implemented for PromptGuard - This allows the class to be properly instantiated while clearly indicating the limitation <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> Signed-off-by: Mustafa Elbehery <melbeher@redhat.com> |
||
|
19123ca957
|
refactor: standardize InferenceRouter model handling (#2965)
Some checks failed
Integration Tests (Replay) / discover-tests (push) Successful in 3s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.12) (push) Failing after 12s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 15s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 19s
Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 15s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 19s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 21s
Python Package Build Test / build (3.13) (push) Failing after 16s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 23s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 29s
Test External API and Providers / test-external (venv) (push) Failing after 20s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 25s
Unit Tests / unit-tests (3.12) (push) Failing after 23s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 17s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 27s
Vector IO Integration Tests / test-matrix (3.13, remote::qdrant) (push) Failing after 21s
Unit Tests / unit-tests (3.13) (push) Failing after 27s
Vector IO Integration Tests / test-matrix (3.13, remote::weaviate) (push) Failing after 23s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 29s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 22s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 25s
Vector IO Integration Tests / test-matrix (3.12, remote::weaviate) (push) Failing after 22s
Vector IO Integration Tests / test-matrix (3.12, remote::qdrant) (push) Failing after 17s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 24s
Pre-commit / pre-commit (push) Successful in 1m19s
|
||
|
803114180b
|
chore(logging)!: use comma as a delimiter (#3095)
Some checks failed
Vector IO Integration Tests / test-matrix (3.13, remote::weaviate) (push) Failing after 14s
Test Llama Stack Build / generate-matrix (push) Successful in 11s
Test Llama Stack Build / build-single-provider (push) Failing after 16s
Python Package Build Test / build (3.12) (push) Failing after 11s
Unit Tests / unit-tests (3.13) (push) Failing after 15s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 23s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 18s
Update ReadTheDocs / update-readthedocs (push) Failing after 12s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 29s
Test External API and Providers / test-external (venv) (push) Failing after 18s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 34s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 26s
Integration Tests (Replay) / discover-tests (push) Successful in 31s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 18s
Unit Tests / unit-tests (3.12) (push) Failing after 30s
Python Package Build Test / build (3.13) (push) Failing after 25s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 16s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 22s
Vector IO Integration Tests / test-matrix (3.13, remote::qdrant) (push) Failing after 32s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 33s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 21s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 23s
Vector IO Integration Tests / test-matrix (3.12, remote::qdrant) (push) Failing after 40s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 40s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 42s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 44s
Vector IO Integration Tests / test-matrix (3.12, remote::weaviate) (push) Failing after 32s
Pre-commit / pre-commit (push) Successful in 1m24s
Test Llama Stack Build / build (push) Failing after 54s
Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 13s
Using commas is much more shell-friendly. A semi-colon is a statement delimiter and must be escaped. This change is backwards incompatible but I imagine not many people are using this. I could be wrong. Looking for feedback. |
||
|
f7adf58b1b
|
docs: Add documentation on how to contribute a Vector DB provider and update testing documentation (#3093)
# What does this PR do? - Adds documentation on how to contribute a Vector DB provider. - Updates the testing section to be a little friendlier to navigate. - Also added new shortcut for search so that `/` and `⌘ K` or `ctrl+K` trigger search <img width="1903" height="1346" alt="Screenshot 2025-08-11 at 10 10 12 AM" src="https://github.com/user-attachments/assets/6995b3b8-a2ab-4200-be72-c5b03a784a29" /> <img width="1915" height="1438" alt="Screenshot 2025-08-11 at 10 10 25 AM" src="https://github.com/user-attachments/assets/1f54d30e-5be1-4f27-b1e9-3c3537dcb8e9" /> <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> ## Test Plan <!-- Describe the tests you ran to verify your changes with result summaries. *Provide clear instructions so the plan can be easily re-executed.* --> Signed-off-by: Francisco Javier Arceo <farceo@redhat.com> |
||
|
b5b5f5b9ae
|
chore: add mypy prompt guard (#2678)
# What does this PR do? <!-- Provide a short summary of what this PR does and why. Link to relevant issues if applicable. --> This PR adds static type coverage to `llama-stack` Part of https://github.com/meta-llama/llama-stack/issues/2647 <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> ## Test Plan <!-- Describe the tests you ran to verify your changes with result summaries. *Provide clear instructions so the plan can be easily re-executed.* --> Signed-off-by: Mustafa Elbehery <melbeher@redhat.com> |
||
|
7448a4a88c
|
chore: Updating UI Sidebar (#3081)
# What does this PR do? This updates the sidebar to look a little more like other popular ones. <img width="1913" height="1352" alt="Screenshot 2025-08-08 at 11 25 31 PM" src="https://github.com/user-attachments/assets/00738412-1101-48ec-8864-cde4a8733ec1" /> ## Test Plan <!-- Describe the tests you ran to verify your changes with result summaries. *Provide clear instructions so the plan can be easily re-executed.* --> Signed-off-by: Francisco Javier Arceo <farceo@redhat.com> |
||
|
8faff92591
|
chore: remove redundant code in unregister_toolgroup (#3092)
# What does this PR do? removes redundant code ## Test Plan ci |
||
|
a4bad6c0b4
|
feat: Add Google Vertex AI inference provider support (#2841)
Some checks failed
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 10s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 12s
Python Package Build Test / build (3.13) (push) Failing after 4s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 10s
Test Llama Stack Build / generate-matrix (push) Successful in 8s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 13s
Test External API and Providers / test-external (venv) (push) Failing after 11s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 17s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 10s
Test Llama Stack Build / build-single-provider (push) Failing after 16s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 8s
Unit Tests / unit-tests (3.12) (push) Failing after 10s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 26s
Vector IO Integration Tests / test-matrix (3.12, remote::qdrant) (push) Failing after 15s
Update ReadTheDocs / update-readthedocs (push) Failing after 9s
Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 7s
Vector IO Integration Tests / test-matrix (3.13, remote::weaviate) (push) Failing after 11s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 23s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 16s
Vector IO Integration Tests / test-matrix (3.12, remote::weaviate) (push) Failing after 18s
Test Llama Stack Build / build (push) Failing after 8s
Vector IO Integration Tests / test-matrix (3.13, remote::qdrant) (push) Failing after 17s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 16s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 8s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 21s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 47s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 49s
Unit Tests / unit-tests (3.13) (push) Failing after 39s
Pre-commit / pre-commit (push) Successful in 1m37s
# What does this PR do? - Add new Vertex AI remote inference provider with litellm integration - Support for Gemini models through Google Cloud Vertex AI platform - Uses Google Cloud Application Default Credentials (ADC) for authentication - Added VertexAI models: gemini-2.5-flash, gemini-2.5-pro, gemini-2.0-flash. - Updated provider registry to include vertexai provider - Updated starter template to support Vertex AI configuration - Added comprehensive documentation and sample configuration <!-- If resolving an issue, uncomment and update the line below --> relates to https://github.com/meta-llama/llama-stack/issues/2747 ## Test Plan <!-- Describe the tests you ran to verify your changes with result summaries. *Provide clear instructions so the plan can be easily re-executed.* --> Signed-off-by: Eran Cohen <eranco@redhat.com> Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com> |