Commit graph

2084 commits

Author SHA1 Message Date
Ashwin Bharambe
51e6f529f3
fix: index non-MCP toolgroups at registration time (#2272)
Two somewhat annoying fixes:

- We will always index tools for non-MCP toolgroups (as we used to), because
there are random assumptions baked into our tests and elsewhere that I
don't want to fix right now.
- We need to handle the funny case of toolgroups like
`builtin::rag/knowledge_search`, where the tool name to use is embedded in
the toolgroup identifier itself (see the sketch below).
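A hedged sketch of what handling that case might look like; the helper and its parsing rules are illustrative assumptions, not the actual fix:

```python
# Split a qualified toolgroup identifier of the form "<toolgroup>/<tool>"
# (illustrative; the real registration code may differ).
def split_toolgroup(name: str) -> tuple[str, str | None]:
    group, _, tool = name.partition("/")
    return group, (tool or None)

assert split_toolgroup("builtin::rag/knowledge_search") == ("builtin::rag", "knowledge_search")
assert split_toolgroup("mcp::my_server") == ("mcp::my_server", None)
```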
2025-05-26 20:33:36 -07:00
Sébastien Han
39b33a3b01
chore: allow to pass CA cert to remote vllm (#2266)
# What does this PR do?

The `tls_verify` option can now receive a path to a certificate file if
the endpoint requires it.
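A minimal sketch of the idea, assuming the provider hands the value straight to its HTTP client (endpoint and path are illustrative):

```python
import httpx

# tls_verify may now be a CA bundle path as well as a boolean (per the PR text).
tls_verify: bool | str = "/etc/ssl/certs/internal-ca.pem"

client = httpx.Client(
    base_url="https://vllm.example.com/v1",
    verify=tls_verify,  # httpx accepts True/False or a path to a CA certificate
)
```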

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-05-26 20:59:03 +02:00
Sébastien Han
7710b2f43b
chore: removed unused class (#2268)
Signed-off-by: Sébastien Han <seb@redhat.com>
2025-05-26 08:41:37 -07:00
Ashwin Bharambe
9623d5d230
fix: match mcp headers in provider data to Responses API shape (#2263) 2025-05-25 14:33:10 -07:00
Ashwin Bharambe
ce33d02443
fix(tools): do not index tools, only index toolgroups (#2261)
When registering a MCP endpoint, we cannot list tools (like we used to)
since the MCP endpoint may be behind an auth wall. Registration can
happen much sooner (via run.yaml).

Instead, we do listing only when the _user_ actually calls listing.
Furthermore, we cache the list in-memory in the server. Currently, the
cache is not invalidated -- we may want to periodically re-list for MCP
servers. Note that they must call `list_tools` before calling
`invoke_tool` -- we use this critically.

This will enable us to list MCP servers in run.yaml
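A minimal sketch of that listing-then-caching flow, with illustrative names and a hypothetical MCP client object (not the actual routing-table code):

```python
_tool_cache: dict[str, list[str]] = {}  # toolgroup_id -> tool names

async def list_tools(toolgroup_id: str, mcp_client) -> list[str]:
    # Only contact the MCP endpoint when the user actually lists tools;
    # the result is cached in-memory and currently never invalidated.
    if toolgroup_id not in _tool_cache:
        _tool_cache[toolgroup_id] = [t.name for t in await mcp_client.list_tools()]
    return _tool_cache[toolgroup_id]

async def invoke_tool(toolgroup_id: str, tool_name: str, args: dict, mcp_client):
    if toolgroup_id not in _tool_cache:
        raise RuntimeError("list_tools must be called before invoke_tool")
    return await mcp_client.call_tool(tool_name, args)
```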

## Test Plan

Existing tests, updated tests accordingly.
2025-05-25 13:27:52 -07:00
raghotham
5a422e236c
chore: make cprint write to stderr (#2250)
Also do sys.exit(1) in case of errors
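A small sketch of the pattern, assuming `termcolor.cprint` (which forwards extra keyword arguments to `print()`):

```python
import sys

from termcolor import cprint

# Diagnostics go to stderr so stdout stays clean for program output.
cprint("Error: something went wrong", "red", file=sys.stderr)
sys.exit(1)
```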
2025-05-24 23:39:57 -07:00
raghotham
c25bd0ad58
fix: use pypi browser agent (#2260)
Getting this error from pypi of late

```
'python-requests/2.32.3 User-Agents are currently blocked from accessing JSON release resources. A cluster is apparently crawling all project/release resources resulting in excess cache misses. Please contact admin@pypi.org if you have information regarding what this software may be.'
```
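A hedged sketch of the workaround: send a non-default User-Agent when fetching JSON release metadata (the exact header value is an assumption):

```python
import requests

headers = {"User-Agent": "Mozilla/5.0 (compatible; release-script)"}
resp = requests.get("https://pypi.org/pypi/llama-stack/json", headers=headers, timeout=30)
resp.raise_for_status()
print(resp.json()["info"]["version"])
```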
2025-05-24 23:26:30 -07:00
Ashwin Bharambe
298721c238
chore: split routing_tables into individual files (#2259) 2025-05-24 23:15:05 -07:00
Ashwin Bharambe
eedf21f19c
chore: split routers into individual files (inference, tool, vector_io, eval_scoring) (#2258) 2025-05-24 22:59:07 -07:00
Ashwin Bharambe
ae7272d8ff
chore: split routers into individual files (datasets) (#2249) 2025-05-24 22:11:43 -07:00
Ashwin Bharambe
a2160dc0af
chore: split routers into individual files (safety)
Reviewers:
bbrowning, leseb, ehhuang, terrytangyuan, raghotham, yanxi0830, hardikjshah

Reviewed By: raghotham

Pull Request: https://github.com/meta-llama/llama-stack/pull/2248
2025-05-24 22:00:32 -07:00
Ashwin Bharambe
c290999c63
fix(telemetry): get rid of annoying sqlite span export error (#2245) 2025-05-24 20:24:34 -07:00
Ashwin Bharambe
3faf1e4a79
feat: enable MCP execution in Responses impl (#2240)
## Test Plan

```
pytest -s -v 'tests/verifications/openai_api/test_responses.py' \
  --provider=stack:together --model meta-llama/Llama-4-Scout-17B-16E-Instruct
```
2025-05-24 14:20:42 -07:00
Ashwin Bharambe
66f09f24ed
fix: disable test_responses_store (#2244)
The test depends on llama's tool-calling ability. In the CI, we run with
a small ollama model.

The fix might be to check for either message or function_call, because
the model is flaky and we aren't really testing that behavior?
2025-05-24 08:18:06 -07:00
raghotham
84751f3e55
fix: skip failing tests (#2243)
as title. trying release 0.2.8
2025-05-24 07:31:08 -07:00
Yuan Tang
a411029d7e
docs: Update CHANGELOG.md (#2241)
# What does this PR do?

This PR adds release notes for recent releases.

---------

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-05-24 07:06:36 -07:00
ehhuang
15b0a67555
feat: add responses input items api (#2239)
# What does this PR do?
TSIA

## Test Plan
added integration and unit tests
2025-05-24 07:05:53 -07:00
Yuan Tang
055f48b6a2
fix(security): Upgrade setuptools to v80.8.0. Fixes CVE-2025-47273 (#2242)
# What does this PR do?

This fixes a high-severity CVE in `setuptools`:
https://github.com/advisories/GHSA-5rjg-fvgr-3xxf

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com>
2025-05-24 06:57:24 -07:00
ehhuang
ca65617a71
feat: start ui server in llama stack run (#2170)
# What does this PR do?
TSIA
`--enable-ui` to enable


## Test Plan
`llama stack run dev --image-type conda --enable-ui`
`localhost:8322` shows UI


`llama stack run dev --image-type conda`
`localhost:8322` does not work
2025-05-23 20:00:09 -07:00
ehhuang
5844c2da68
feat: add list responses API (#2233)
# What does this PR do?
This is not part of the official OpenAI API, but we'll use this for the
logs UI.
In order to support more filtering options, I'm adopting the newly
introduced sql store in place of the kv store.

## Test Plan
Added integration/unit tests.
2025-05-23 13:16:48 -07:00
Ashwin Bharambe
6463ee7633
feat: allow using llama-stack-library-client from verifications (#2238)
Having to run (and re-run) a server while running verifications can be
annoying while you are iterating on code. This makes it so you can use
the library client -- and because it is OpenAI client compatible, it all
works.

## Test Plan

```
pytest -s -v tests/verifications/openai_api/test_responses.py \
   --provider=stack:together \
   --model meta-llama/Llama-4-Scout-17B-16E-Instruct
```
2025-05-23 11:43:41 -07:00
Ashwin Bharambe
558d109ab7
fix: signature change to match OpenAI SDK (#2237) 2025-05-23 10:59:30 -07:00
ehhuang
b054023800
chore: add sqlalchemy to test dependencies (#2236)
2025-05-23 10:33:38 -07:00
Ashwin Bharambe
51945f1e57
feat: accept MCP authorization headers for MCP toolgroups (#2230)
The most interesting MCP servers are those with an authorization wall in
front of them. This PR uses the existing `provider_data` mechanism of
passing provider API keys for passing MCP access tokens (in fact,
arbitrary headers in the style of the OpenAI Responses API) from the
client through to the MCP server.

```
class MCPProviderDataValidator(BaseModel):
    # mcp_endpoint => list of headers to send
    mcp_headers: dict[str, list[str]] | None = None
```

Note how we must stuff the headers for all MCP endpoints into a single
"MCPProviderDataValidator". Unlike existing providers (e.g., Together
and Fireworks for inference) where we could name the provider api keys
clearly (`together_api_key`, `fireworks_api_key`), we cannot name these
keys for MCP. We have a single generic MCP provider which can serve
multiple "toolgroups". So we use a dict to combine all the headers for
all MCP endpoints you may want to use in an agentic call.
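A hedged usage sketch: provider data travels in the `X-LlamaStack-Provider-Data` request header, so the headers for a given MCP endpoint could be passed like this (endpoint URL and token are illustrative):

```python
import json

provider_data = {
    "mcp_headers": {
        "https://mcp.example.com/sse": ["Authorization: Bearer <access-token>"],
    }
}
# Attach this header to requests made through your HTTP or Llama Stack client.
headers = {"X-LlamaStack-Provider-Data": json.dumps(provider_data)}
```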


## Test Plan

See the added integration test for usage.
2025-05-23 08:52:18 -07:00
ehhuang
2708312168
feat(ui): implement chat completion views (#2201)
# What does this PR do?
 Implements table and detail views for chat completions

<img width="1548" alt="image"
src="https://github.com/user-attachments/assets/01061b7f-0d47-4b3b-b5ac-2df8f9035ef6"
/>
<img width="1549" alt="image"
src="https://github.com/user-attachments/assets/738d8612-8258-4c2c-858b-bee39030649f"
/>


## Test Plan
npm run test
2025-05-22 22:05:54 -07:00
Ashwin Bharambe
d8c6ab9bfc
feat: add MCP tool signature to Responses API (#2232) 2025-05-22 16:43:08 -07:00
ehhuang
8feb1827c8
fix: openai provider model id (#2229)
# What does this PR do?
Since https://github.com/meta-llama/llama-stack/pull/2193 switched to
openai sdk, we need to strip 'openai/' from the model_id


## Test Plan
start server with openai provider and send a chat completion call
2025-05-22 14:51:01 -07:00
ehhuang
549812f51e
feat: implement get chat completions APIs (#2200)
# What does this PR do?
* Provides a sqlite implementation of the APIs introduced in
https://github.com/meta-llama/llama-stack/pull/2145.
* Introduces a SqlStore API (llama_stack/providers/utils/sqlstore/api.py)
and its first sqlite implementation.
* Pagination support will be added in a future PR.

## Test Plan
Unit test on sql store:
<img width="1005" alt="image"
src="https://github.com/user-attachments/assets/9b8b7ec8-632b-4667-8127-5583426b2e29"
/>


Integration test:
```
INFERENCE_MODEL="llama3.2:3b-instruct-fp16" llama stack build --template ollama --image-type conda --run
```
```
LLAMA_STACK_CONFIG=http://localhost:5001 INFERENCE_MODEL="llama3.2:3b-instruct-fp16" python -m pytest -v tests/integration/inference/test_openai_completion.py --text-model "llama3.2:3b-instruct-fp16" -k 'inference_store and openai'
```
2025-05-21 22:21:52 -07:00
Jorge Piedrahita Ortiz
633bb9c5b3
feat(providers): sambanova safety provider (#2221)
# What does this PR do?

Adds a SambaNova safety adapter that uses the SambaNova-cloud-served
Meta-Llama-Guard-3-8B, plus minor updates to the SambaNova docs.

## Test Plan
```
pytest -s -v tests/integration/safety/test_safety.py \
  --stack-config=sambanova --safety-shield=sambanova/Meta-Llama-Guard-3-8B
```
2025-05-21 15:33:02 -07:00
Sébastien Han
02e5e8a633
fix: only print routes that match the runtime config (#2226)
# What does this PR do?

We now print only the 'active' routes, not all possible routes. The set is
derived from the distribution server config, looking at enabled APIs and
their respective providers.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-05-21 15:30:29 -07:00
Sébastien Han
37f1e8a7f7
fix: use proper service account for kube auth (#2227)
# What does this PR do?

Not sure why it passed CI earlier...

Strangely, only 24 workflows ran on
https://github.com/meta-llama/llama-stack/pull/2216, so the test never
ran...

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-05-21 15:28:21 -07:00
Varsha
e92301f2d7
feat(sqlite-vec): enable keyword search for sqlite-vec (#1439)
# What does this PR do?
This PR introduces support for keyword-based FTS5 search with BM25
relevance scoring. It changes the existing EmbeddingIndex base class to
support `search_mode` and `query_str` parameters that keyword-based
search implementations can use.
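For flavor, a standalone FTS5/BM25 sketch of the keyword path (schema is illustrative, not the actual sqlite-vec layout; requires an SQLite build with FTS5):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE chunks USING fts5(content)")
conn.executemany(
    "INSERT INTO chunks(content) VALUES (?)",
    [("Sentence 5 from document 0",), ("Completely unrelated text",)],
)
# bm25() returns a rank where lower is better, hence ORDER BY ascending.
rows = conn.execute(
    "SELECT content, bm25(chunks) FROM chunks WHERE chunks MATCH ? ORDER BY bm25(chunks)",
    ("sentence",),
).fetchall()
print(rows)
```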


## Test Plan
run 
```
pytest llama_stack/providers/tests/vector_io/test_sqlite_vec.py -v -s --tb=short --disable-warnings --asyncio-mode=auto
```
Output:
```
pytest llama_stack/providers/tests/vector_io/test_sqlite_vec.py -v -s --tb=short --disable-warnings --asyncio-mode=auto
/Users/vnarsing/miniconda3/envs/stack-client/lib/python3.10/site-packages/pytest_asyncio/plugin.py:207: PytestDeprecationWarning: The configuration option "asyncio_default_fixture_loop_scope" is unset.
The event loop scope for asynchronous fixtures will default to the fixture caching scope. Future versions of pytest-asyncio will default the loop scope for asynchronous fixtures to function scope. Set the default fixture loop scope explicitly in order to avoid unexpected behavior in the future. Valid fixture loop scopes are: "function", "class", "module", "package", "session"

  warnings.warn(PytestDeprecationWarning(_DEFAULT_FIXTURE_LOOP_SCOPE_UNSET))
====================================================== test session starts =======================================================
platform darwin -- Python 3.10.16, pytest-8.3.4, pluggy-1.5.0 -- /Users/vnarsing/miniconda3/envs/stack-client/bin/python
cachedir: .pytest_cache
metadata: {'Python': '3.10.16', 'Platform': 'macOS-14.7.4-arm64-arm-64bit', 'Packages': {'pytest': '8.3.4', 'pluggy': '1.5.0'}, 'Plugins': {'html': '4.1.1', 'metadata': '3.1.1', 'asyncio': '0.25.3', 'anyio': '4.8.0'}}
rootdir: /Users/vnarsing/go/src/github/meta-llama/llama-stack
configfile: pyproject.toml
plugins: html-4.1.1, metadata-3.1.1, asyncio-0.25.3, anyio-4.8.0
asyncio: mode=auto, asyncio_default_fixture_loop_scope=None
collected 7 items                                                                                                                

llama_stack/providers/tests/vector_io/test_sqlite_vec.py::test_add_chunks PASSED
llama_stack/providers/tests/vector_io/test_sqlite_vec.py::test_query_chunks_vector PASSED
llama_stack/providers/tests/vector_io/test_sqlite_vec.py::test_query_chunks_fts PASSED
llama_stack/providers/tests/vector_io/test_sqlite_vec.py::test_chunk_id_conflict PASSED
llama_stack/providers/tests/vector_io/test_sqlite_vec.py::test_register_vector_db PASSED
llama_stack/providers/tests/vector_io/test_sqlite_vec.py::test_unregister_vector_db PASSED
llama_stack/providers/tests/vector_io/test_sqlite_vec.py::test_generate_chunk_id PASSED
```


For reference, with this implementation the FTS table looks like the following:
```
Chunk ID: 9fbc39ce-c729-64a2-260f-c5ec9bb2a33e, Content: Sentence 0 from document 0
Chunk ID: 94062914-3e23-44cf-1e50-9e25821ba882, Content: Sentence 1 from document 0
Chunk ID: e6cfd559-4641-33ba-6ce1-7038226495eb, Content: Sentence 2 from document 0
Chunk ID: 1383af9b-f1f0-f417-4de5-65fe9456cc20, Content: Sentence 3 from document 0
Chunk ID: 2db19b1a-de14-353b-f4e1-085e8463361c, Content: Sentence 4 from document 0
Chunk ID: 9faf986a-f028-7714-068a-1c795e8f2598, Content: Sentence 5 from document 0
Chunk ID: ef593ead-5a4a-392f-7ad8-471a50f033e8, Content: Sentence 6 from document 0
Chunk ID: e161950f-021f-7300-4d05-3166738b94cf, Content: Sentence 7 from document 0
Chunk ID: 90610fc4-67c1-e740-f043-709c5978867a, Content: Sentence 8 from document 0
Chunk ID: 97712879-6fff-98ad-0558-e9f42e6b81d3, Content: Sentence 9 from document 0
Chunk ID: aea70411-51df-61ba-d2f0-cb2b5972c210, Content: Sentence 0 from document 1
Chunk ID: b678a463-7b84-92b8-abb2-27e9a1977e3c, Content: Sentence 1 from document 1
Chunk ID: 27bd63da-909c-1606-a109-75bdb9479882, Content: Sentence 2 from document 1
Chunk ID: a2ad49ad-f9be-5372-e0c7-7b0221d0b53e, Content: Sentence 3 from document 1
Chunk ID: cac53bcd-1965-082a-c0f4-ceee7323fc70, Content: Sentence 4 from document 1
```

Query results:
```
Result 1: Sentence 5 from document 0
Result 2: Sentence 5 from document 1
Result 3: Sentence 5 from document 2
```


---------

Signed-off-by: Varsha Prasad Narsing <varshaprasad96@gmail.com>
2025-05-21 15:24:24 -04:00
Sébastien Han
85b5f3172b
docs: misc cleanup (#2223)
# What does this PR do?

* remove requirements.txt to use pyproject.toml as the source of truth
* update relevant docs

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-05-21 17:35:27 +02:00
Sébastien Han
6a62e783b9
chore: refactor workflow writing (#2225)
# What does this PR do?

Use a composite action to avoid repeating similar steps and to centralize
the defaults.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-05-21 17:31:14 +02:00
Sébastien Han
1862de4be5
chore: clarify cache_ttl to be key_recheck_period (#2220)
# What does this PR do?

The cache_ttl config value is not in fact tied to the lifetime of any of
the keys; it is the interval at which our key cache refresher runs.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-05-21 17:30:23 +02:00
Sébastien Han
c25acedbcd
chore: remove k8s auth in favor of k8s jwks endpoint (#2216)
# What does this PR do?

Kubernetes since 1.20 exposes a JWKS endpoint that we can use with our
recent oauth2 implementation.
The CI test has been kept intact for validation.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-05-21 16:23:54 +02:00
liangwen12year
2890243107
feat(quota): add server‑side per‑client request quotas (requires auth) (#2096)
# What does this PR do?
Unrestricted usage can lead to runaway costs and fragmented client-side
workarounds. This commit introduces a native quota mechanism to the
server, giving operators a unified, centrally managed throttle for
per-client requests, without needing extra proxies or custom client
logic. This helps contain cloud-compute expenses, enables fine-grained
usage control, and simplifies deployment and monitoring of Llama Stack
services. Quotas are fully opt-in and have no effect unless explicitly
configured.

Note that quotas require authentication to be enabled. 'sqlite' is the
only supported quota `type` at this time; any other `type` will be
rejected. The only supported `period` is 'day'.

Highlights:

- Adds `QuotaMiddleware` to enforce per-client request quotas:
  - Uses `Authorization: Bearer <client_id>` (from AuthenticationMiddleware)
  - Tracks usage via a SQLite-based KV store
  - Returns 429 when the quota is exceeded
- Extends `ServerConfig` with a `quota` section (type + config)
- Enforces strict coupling: quotas require authentication or the server
  will fail to start

Behavior changes:

- Quotas are disabled by default unless explicitly configured
- SQLite defaults to `./quotas.db` if no DB path is set
- The server requires authentication when quotas are enabled

To enable per-client request quotas in `run.yaml`, add:
```
server:
  port: 8321
  auth:
    provider_type: "custom"
    config:
      endpoint: "https://auth.example.com/validate"
  quota:
    type: sqlite
    config:
      db_path: ./quotas.db
      limit:
        max_requests: 1000
        period: day
```
Closes #2093
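A minimal sketch of the enforcement idea (not the actual `QuotaMiddleware`; storage here is an in-memory dict instead of the SQLite-backed KV store):

```python
from datetime import date

from starlette.requests import Request
from starlette.responses import JSONResponse

_counts: dict[tuple[str, str], int] = {}
MAX_REQUESTS = 1000

async def enforce_quota(request: Request, call_next):
    client_id = request.headers.get("Authorization", "anonymous")
    key = (client_id, date.today().isoformat())  # one counter per client per day
    _counts[key] = _counts.get(key, 0) + 1
    if _counts[key] > MAX_REQUESTS:
        return JSONResponse({"error": {"message": "quota exceeded"}}, status_code=429)
    return await call_next(request)
```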


Signed-off-by: Wen Liang <wenliang@redhat.com>
Co-authored-by: Wen Liang <wenliang@redhat.com>
2025-05-21 10:58:45 +02:00
Abhishek koserwal
5a3d777b20
feat: add llama stack rm command (#2127)
# What does this PR do?

```
llama stack rm llamastack-test
```

Closes #225

2025-05-21 10:25:51 +02:00
grs
091d8c48f2
feat: add additional auth provider that uses oauth token introspection (#2187)
# What does this PR do?

This adds an alternative option to the oauth_token auth provider that
can be used with existing authorization services which support token
introspection as defined in RFC 7662. This could be useful where token
revocation needs to be handled or where opaque tokens (or other
non-JWT-formatted tokens) are used.
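A hedged sketch of an RFC 7662 introspection call (endpoint and client credentials are illustrative, e.g. a Keycloak realm):

```python
import httpx

def is_token_active(token: str) -> bool:
    resp = httpx.post(
        "https://auth.example.com/realms/demo/protocol/openid-connect/token/introspect",
        data={"token": token},
        auth=("llama-stack", "client-secret"),  # client_id / client_secret
    )
    resp.raise_for_status()
    return bool(resp.json().get("active", False))
```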

## Test Plan
Tested against keycloak

Signed-off-by: Gordon Sim <gsim@redhat.com>
2025-05-20 19:45:11 -07:00
grs
87a4b9cb28
fix: synchronize concurrent coroutines checking & updating key set (#2215)
# What does this PR do?

This PR adds a lock to coordinate concurrent coroutines passing through
the jwt verification. As _refresh_jwks() was setting _jwks to an empty
dict then repopulating it, having multiple coroutines doing this
concurrently risks losing keys. The PR also builds the updated dict as a
separate object and assigns it to _jwks once completed. This avoids
impacting any coroutines using the key set as it is being updated.
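A minimal sketch of the pattern described above (names are illustrative; `fetch_keys` is a hypothetical coroutine returning a kid-to-key dict):

```python
import asyncio

_refresh_lock = asyncio.Lock()
_jwks: dict[str, str] = {}

async def refresh_jwks(fetch_keys) -> None:
    global _jwks
    async with _refresh_lock:
        fresh = await fetch_keys()  # build separately; readers keep the old dict
        _jwks = fresh  # single reference swap, never a half-populated dict
```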

Signed-off-by: Gordon Sim <gsim@redhat.com>
2025-05-20 10:00:44 -07:00
Derek Higgins
3339844fda
feat: Add "instructions" support to responses API (#2205)
# What does this PR do?
Add support for "instructions" to the responses API. Instructions
provide a way to swap out system (or developer) messages in new
responses.
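A hedged usage sketch against an OpenAI-compatible endpoint (base_url and model are illustrative):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1/openai/v1", api_key="none")

response = client.responses.create(
    model="meta-llama/Llama-4-Scout-17B-16E-Instruct",
    instructions="You are a terse assistant.",  # swaps in the system/developer message
    input="Summarize the release notes.",
)
print(response.output_text)
```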


## Test Plan
unit tests added

Signed-off-by: Derek Higgins <derekh@redhat.com>
2025-05-20 09:52:10 -07:00
Jash Gulabrai
1a770cf8ac
fix: Pass model parameter as config name to NeMo Customizer (#2218)
# What does this PR do?
When launching a fine-tuning job, an upcoming version of NeMo Customizer
will expect the `config` name to be formatted as
`namespace/name@version`. Here, `config` is a reference to a model +
additional metadata. There could be multiple `config`s that reference
the same base model.

This PR updates NVIDIA's `supervised_fine_tune` to simply pass the
`model` param as-is to NeMo Customizer. Currently, it expects a
specific, allowlisted llama model (i.e. `meta/Llama3.1-8B-Instruct`) and
converts it to the provider format (`meta/llama-3.1-8b-instruct`).


## Test Plan
From a notebook, I built an image with my changes: 
```
!llama stack build --template nvidia --image-type venv
from llama_stack.distribution.library_client import LlamaStackAsLibraryClient

client = LlamaStackAsLibraryClient("nvidia")
client.initialize()
```
And could successfully launch a job:
```
response = client.post_training.supervised_fine_tune(
    job_uuid="",
    model="meta/llama-3.2-1b-instruct@v1.0.0+A100", # Model passed as-is to Customimzer
    ...
)

job_id = response.job_uuid
print(f"Created job with ID: {job_id}")

Output:
Created job with ID: cust-Jm4oGmbwcvoufaLU4XkrRU
```


---------

Co-authored-by: Jash Gulabrai <jgulabrai@nvidia.com>
2025-05-20 09:51:39 -07:00
Sébastien Han
2eae8568e1
chore: collapse all local hook under the same repo (#2217)
Signed-off-by: Sébastien Han <seb@redhat.com>
2025-05-20 09:51:09 -07:00
Sébastien Han
3f6368d56c
ci: enable ruff output format for github (#2214)
# What does this PR do?

Update output format to enable automatic inline annotations.

![Screenshot 2025-05-20 at 10 55
38](https://github.com/user-attachments/assets/f943aa00-9b60-4cdb-b434-67b2de8b79f2)

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-05-20 09:04:03 -07:00
Francisco Arceo
90d7612f5f
chore: Updated readme (#2219)
# What does this PR do?
chore: Updated readme


Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-05-20 17:06:20 +02:00
Francisco Arceo
ed7b4731aa
fix: Setting default value for metadata_token_count in case the key is not found (#2199)
# What does this PR do?
If a user has previously serialized data into their vector store without
the `metadata_token_count` in the chunk, the `query` method will fail with
a server error. This fixes that edge case by returning 0 when the key is
not present. This solution is suboptimal, but I think it's better to
understate the token size than to recalculate it and add unnecessary
complexity to the retrieval code.
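The fallback amounts to something like this sketch (chunk dict shape is illustrative):

```python
chunk_metadata: dict = {"document_id": "doc-1"}  # older chunk, no token count
token_count = chunk_metadata.get("metadata_token_count", 0)  # default instead of KeyError
```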


Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-05-20 08:03:22 -04:00
Ben Browning
6d20b720b8
feat: Propagate W3C trace context headers from clients (#2153)
# What does this PR do?

This extracts the W3C trace context headers (traceparent and tracestate)
from incoming requests, stuffs them as attributes on the spans we
create, and uses them within the tracing provider implementation to
actually wrap our spans in the proper context.

What this means in practice is that when a client (such as an OpenAI
client) is instrumented to create these traces, we'll continue that
distributed trace within Llama Stack as opposed to creating our own root
span that breaks the distributed trace between client and server.

It's slightly awkward to do this in Llama Stack because our Tracing API
knows nothing about opentelemetry, W3C trace headers, etc. -- that's
knowledge only the specific provider implementation has. That's why the
trace headers get extracted in the server code but are not actually used
until the provider implementation forms the proper context.

This also centralizes how we were adding the `__root__` and
`__root_span__` attributes, as those two were being added in different
parts of the code instead of from a single place.
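A hedged sketch of the extraction step (helper name and handling are illustrative, not the server's actual code):

```python
# W3C trace context headers to capture from the incoming request.
W3C_TRACE_HEADERS = ("traceparent", "tracestate")

def extract_trace_context(headers: dict[str, str]) -> dict[str, str]:
    # e.g. {"traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"}
    return {k: headers[k] for k in W3C_TRACE_HEADERS if k in headers}
```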

Closes #2097

## Test Plan

This was tested manually using the helpful scripts from #2097. I
verified that Llama Stack properly joined the client's span when the
client was instrumented for distributed tracing, and that Llama Stack
properly started its own root span when the incoming request was not
part of an existing trace.

Here's an example of the joined spans:

![Screenshot 2025-05-13 at 8 46
09 AM](https://github.com/user-attachments/assets/dbefda28-9faa-4339-a08d-1441efefc149)

Signed-off-by: Ben Browning <bbrownin@redhat.com>
2025-05-19 18:56:54 -07:00
Sébastien Han
82778ecbb0
fix: remove wrong deprecated warning (#2202)
# What does this PR do?

`--yaml-config` is gone now with
https://github.com/meta-llama/llama-stack/pull/2196.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-05-19 13:02:23 -07:00
Michael Anstis
0cc0731189
fix: Pass external_config_dir to BuildConfig (#2190)
# What does this PR do?

The `external_config_dir` configuration parameter is not being passed to
the `BuildConfig` for `LlamaStackAsLibraryClient`.

This prevents _plugin_ providers from being loaded when `llama-stack` is
used as a library.


## Test Plan
I ran `LlamaStackAsLibraryClient` with a configuration file that
contained `external_config_dir` and related configuration.

It does not work without this change: _external_ providers are not
resolved.

It does work with this change 👍 

2025-05-19 14:01:28 +02:00
ehhuang
047303e339
feat: introduce APIs for retrieving chat completion requests (#2145)
# What does this PR do?
This PR introduces APIs to retrieve past chat completion requests, which
will be used in the LS UI.

Our current `Telemetry` is ill-suited for this purpose as it's untyped
so we'd need to filter by obscure attribute names, making it brittle.

Since these APIs are 'provided by stack' and don't need to be
implemented by inference providers, we introduce a new InferenceProvider
class, containing the existing inference protocol, which is implemented
by inference providers.

The APIs are OpenAI-compliant, with an additional `input_messages`
field.
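A hedged sketch of retrieving them with the OpenAI SDK's stored-completion methods (method names follow the OpenAI client; the extra `input_messages` field comes from this PR):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1/openai/v1", api_key="none")

completions = client.chat.completions.list()
first = client.chat.completions.retrieve(completions.data[0].id)
print(first.input_messages)  # extra field beyond the OpenAI shape
```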


## Test Plan
This PR just adds the APIs and marks them provided_by_stack.
Start stack server -> doesn't crash
2025-05-18 21:43:19 -07:00