llama-stack-mirror

mirror of https://github.com/meta-llama/llama-stack.git synced 2025-12-05 10:23:44 +00:00

Author	SHA1	Message	Date
Eric Huang	275cc53f28	test # What does this PR do? ## Test Plan	2025-10-09 23:35:17 -07:00
Eric Huang	689c9f0762	merge commit for archive created by Sapling	2025-10-09 23:18:10 -07:00
Eric Huang	6cc9102097	test # What does this PR do? ## Test Plan	2025-10-09 23:18:02 -07:00
Eric Huang	fa59fc9f92	merge commit for archive created by Sapling	2025-10-09 23:04:19 -07:00
Eric Huang	9f5fdce86e	test # What does this PR do? ## Test Plan	2025-10-09 23:04:13 -07:00
Eric Huang	0cd78e2ba6	merge commit for archive created by Sapling	2025-10-09 22:46:38 -07:00
Eric Huang	2c3e1a96b6	test # What does this PR do? ## Test Plan	2025-10-09 22:42:13 -07:00
Eric Huang	237ba78995	merge commit for archive created by Sapling	2025-10-09 20:53:29 -07:00
Eric Huang	4a3d1e33f8	test # What does this PR do? ## Test Plan	2025-10-09 20:53:21 -07:00
Ashwin Bharambe	ebae0385bb	fix: update dangling references to llama download command (#3763) ## Summary After removing model management CLI in #3700, this PR updates remaining references to the old `llama download` command to use `huggingface-cli download` instead. ## Changes - Updated error messages in `meta_reference/common.py` to recommend `huggingface-cli download` - Updated error messages in `torchtune/recipes/lora_finetuning_single_device.py` to use `huggingface-cli download` - Updated post-training notebook to use `huggingface-cli download` instead of `llama download` - Fixed typo: "you model" -> "your model" ## Test Plan - Verified error messages provide correct guidance for users - Checked that notebook instructions are up-to-date with current tooling	2025-10-09 18:35:02 -07:00
Ashwin Bharambe	8fe4a216b5	fix(inference): propagate 401/403 errors from remote providers (#3762 ) ## Summary Fixes #2990 Remote provider authentication errors (401/403) were being converted to 500 Internal Server Error, preventing users from understanding why their requests failed. ## The Problem When a request with an invalid API key was sent to a remote provider: - Provider correctly returns 401 with error details - Llama Stack's `translate_exception()` didn't recognize provider SDK exceptions - Fell through to generic 500 error handler - User received: "Internal server error: An unexpected error occurred." ## The Fix Added handler in `translate_exception()` that checks for exceptions with a `status_code` attribute and preserves the original HTTP status code and error message. Before: ```json HTTP 500 {"detail": "Internal server error: An unexpected error occurred."} ``` After: ```json HTTP 401 {"detail": "Error code: 401 - {'error': {'message': 'Invalid API Key', 'type': 'invalid_request_error', 'code': 'invalid_api_key'}}"} ``` ## Tested With - ✅ groq: 401 "Invalid API Key" - ✅ openai: 401 "Incorrect API key provided" - ✅ together: 401 "Invalid API key provided" - ✅ fireworks: 403 "unauthorized" ## Test Plan Automated test script: https://gist.github.com/ashwinb/1199dd7585ffa3f4be67b111cc65f2f3 The test script: 1. Builds separate stacks for each provider 2. Registers models (with validation temporarily disabled for testing) 3. Sends requests with invalid API keys via `x-llamastack-provider-data` header 4. Verifies HTTP status codes are 401/403 (not 500) Results before fix: All providers returned 500 Results after fix: All providers correctly return 401/403 Manual verification: ```bash # 1. Build stack llama stack build --image-type venv --providers inference=remote::groq # 2. Start stack llama stack run # 3. 
Send request with invalid API key curl http://localhost:8321/v1/chat/completions \ -H "Content-Type: application/json" \ -H 'x-llamastack-provider-data: {"groq_api_key": "invalid-key"}' \ -d '{"model": "groq/llama3-70b-8192", "messages": [{"role": "user", "content": "test"}]}' # Expected: HTTP 401 with provider error message (not 500) ``` ## Impact - Works with all remote providers using OpenAI SDK (groq, openai, together, fireworks, etc.) - Works with any provider SDK that follows the pattern of exceptions with `status_code` attribute - No breaking changes - only affects error responses	2025-10-09 18:34:39 -07:00
Eric Huang	ae81baa4cc	merge commit for archive created by Sapling	2025-10-09 17:28:54 -07:00
Eric Huang	972f2395a1	test # What does this PR do? ## Test Plan	2025-10-09 17:28:45 -07:00
Matthew Farrellee	145b2bcf25	feat: make object registration idempotent (#3752) # What does this PR do? Objects (vector dbs, models, scoring functions, etc.) have an identifier and associated object values. We allow exact duplicate registrations. We reject registrations when the identifier exists and the associated object values differ. Note: models are namespaced, i.e. {provider_id}/{identifier}, while other object types are not. ## Test Plan CI with new tests	2025-10-09 17:04:28 -07:00
Eric Huang	c03ce82eda	merge commit for archive created by Sapling	2025-10-09 16:55:49 -07:00
Eric Huang	a4238222a3	test # What does this PR do? ## Test Plan	2025-10-09 16:55:35 -07:00
Sébastien Han	7ee0ee7843	chore!: remove model mgmt from CLI for Hugging Face CLI (#3700 ) This change removes the `llama model` and `llama download` subcommands from the CLI, replacing them with recommendations to use the Hugging Face CLI instead. Rationale for this change: - The model management functionality was largely duplicating what Hugging Face CLI already provides, leading to unnecessary maintenance overhead (except the download source from Meta?) - Maintaining our own implementation required fixing bugs and keeping up with changes in model repositories and download mechanisms - The Hugging Face CLI is more mature, widely adopted, and better maintained - This allows us to focus on the core Llama Stack functionality rather than reimplementing model management tools Changes made: - Removed all model-related CLI commands and their implementations - Updated documentation to recommend using `huggingface-cli` for model downloads - Removed Meta-specific download logic and statements - Simplified the CLI to focus solely on stack management operations Users should now use: - `huggingface-cli download` for downloading models - `huggingface-cli scan-cache` for listing downloaded models This is a breaking change as it removes previously available CLI commands. Signed-off-by: Sébastien Han <seb@redhat.com>	2025-10-09 16:50:33 -07:00
Ashwin Bharambe	841d0c3583	fix(testing): improve api_recorder error messages for missing recordings (#3760 ) Replaces opaque error messages when recordings are not found with somewhat better guidance Before: ``` No recorded response found for request hash: abc123... To record this response, run with LLAMA_STACK_TEST_INFERENCE_MODE=record ``` After: ``` Recording not found for request hash: abc123 Model: gpt-4 \| Request: POST https://api.openai.com/v1/chat/completions Run './scripts/integration-tests.sh --inference-mode record-if-missing' with required API keys to generate. ```	2025-10-09 15:04:16 -07:00
Ashwin Bharambe	a055a32ee4	fix(tests): remove chroma and qdrant from vector io unit tests (#3759 ) These vector databases are already thoroughly tested in integration tests. Unit tests now focus on sqlite_vec, faiss, and pgvector with mocked dependencies, removing the need for external service dependencies. ## Changes: - Deleted test_qdrant.py unit test file - Removed chroma/qdrant fixtures and parametrization from conftest.py - Fixed SqliteKVStoreConfig import to use correct location - Removed chromadb, qdrant-client, pymilvus, milvus-lite, and weaviate-client from unit test dependencies in pyproject.toml	2025-10-09 14:36:34 -07:00
Ashwin Bharambe	f50ce11a3b	feat(tests): make inference_recorder into api_recorder (include tool_invoke) (#3403 ) Renames `inference_recorder.py` to `api_recorder.py` and extends it to support recording/replaying tool invocations in addition to inference calls. This allows us to record web-search, etc. tool calls and thereafter apply recordings for `tests/integration/responses` ## Test Plan ``` export OPENAI_API_KEY=... export TAVILY_SEARCH_API_KEY=... ./scripts/integration-tests.sh --stack-config ci-tests \ --suite responses --inference-mode record-if-missing ```	2025-10-09 14:27:51 -07:00
ehhuang	9e70492078	Merge `a93130e323` into sapling-pr-archive-ehhuang	2025-10-09 13:53:45 -07:00
Eric Huang	a93130e323	test # What does this PR do? ## Test Plan Completes the refactoring started in previous commit by: 1. Fix library client (critical): Add logic to detect Pydantic model parameters and construct them properly from request bodies. The key fix is to NOT exclude any params when converting the body for Pydantic models - we need all fields to pass to the Pydantic constructor. Before: _convert_body excluded all params, leaving body empty for Pydantic construction After: Check for Pydantic params first, skip exclusion, construct model with full body 2. Update remaining providers to use new Pydantic-based signatures: - litellm_openai_mixin: Extract extra fields via __pydantic_extra__ - databricks: Use TYPE_CHECKING import for params type - llama_openai_compat: Use TYPE_CHECKING import for params type - sentence_transformers: Update method signatures to use params 3. Update unit tests to use new Pydantic signature: - test_openai_mixin.py: Use OpenAIChatCompletionRequestParams This fixes test failures where the library client was trying to construct Pydantic models with empty dictionaries. The previous fix had a bug: it called _convert_body() which only keeps fields that match function parameter names. For Pydantic methods with signature: openai_chat_completion(params: OpenAIChatCompletionRequestParams) The signature only has 'params', but the body has 'model', 'messages', etc. So _convert_body() returned an empty dict. Fix: Skip _convert_body() entirely for Pydantic params. Use the raw body directly to construct the Pydantic model (after stripping NOT_GIVENs). This properly fixes the ValidationError where required fields were missing. The streaming code path (_call_streaming) had the same issue as non-streaming: it called _convert_body() which returned empty dict for Pydantic params. Applied the same fix as commit 7476c0ae: - Detect Pydantic model parameters before body conversion - Skip _convert_body() for Pydantic params - Construct Pydantic model directly from raw body (after stripping NOT_GIVENs) This fixes streaming endpoints like openai_chat_completion with stream=True.	2025-10-09 13:53:33 -07:00
Eric Huang	3dfa114aac	merge commit for archive created by Sapling	2025-10-09 13:53:28 -07:00
Eric Huang	a70fc60485	test # What does this PR do? ## Test Plan Completes the refactoring started in previous commit by: 1. Fix library client (critical): Add logic to detect Pydantic model parameters and construct them properly from request bodies. The key fix is to NOT exclude any params when converting the body for Pydantic models - we need all fields to pass to the Pydantic constructor. Before: _convert_body excluded all params, leaving body empty for Pydantic construction After: Check for Pydantic params first, skip exclusion, construct model with full body 2. Update remaining providers to use new Pydantic-based signatures: - litellm_openai_mixin: Extract extra fields via __pydantic_extra__ - databricks: Use TYPE_CHECKING import for params type - llama_openai_compat: Use TYPE_CHECKING import for params type - sentence_transformers: Update method signatures to use params 3. Update unit tests to use new Pydantic signature: - test_openai_mixin.py: Use OpenAIChatCompletionRequestParams This fixes test failures where the library client was trying to construct Pydantic models with empty dictionaries. The previous fix had a bug: it called _convert_body() which only keeps fields that match function parameter names. For Pydantic methods with signature: openai_chat_completion(params: OpenAIChatCompletionRequestParams) The signature only has 'params', but the body has 'model', 'messages', etc. So _convert_body() returned an empty dict. Fix: Skip _convert_body() entirely for Pydantic params. Use the raw body directly to construct the Pydantic model (after stripping NOT_GIVENs). This properly fixes the ValidationError where required fields were missing. The streaming code path (_call_streaming) had the same issue as non-streaming: it called _convert_body() which returned empty dict for Pydantic params. Applied the same fix as commit 7476c0ae: - Detect Pydantic model parameters before body conversion - Skip _convert_body() for Pydantic params - Construct Pydantic model directly from raw body (after stripping NOT_GIVENs) This fixes streaming endpoints like openai_chat_completion with stream=True.	2025-10-09 13:53:18 -07:00
grs	26fd5dbd34	fix: add traces for tool calls and mcp tool listing (#3722) # What does this PR do? Adds traces around tool execution and mcp tool listing for better observability. Closes #3108 ## Test Plan Manually examined traces in jaeger to verify the added information was available. Signed-off-by: Gordon Sim <gsim@redhat.com>	2025-10-09 09:59:09 -07:00
Eric Huang	9e9a827fcd	client sync # What does this PR do? ## Test Plan	2025-10-09 09:32:03 -07:00
Sébastien Han	4b9ebbf6a2	chore: revert "fix: Raising an error message to the user when registering an existing provider." (#3750 ) Reverts llamastack/llama-stack#3624 Causing https://github.com/llamastack/llama-stack/issues/3749	2025-10-09 09:17:37 -04:00
ehhuang	05a62a6ffb	chore: print integration tests command (#3747) # What does this PR do? ## Test Plan <img width="1104" height="60" alt="image" src="https://github.com/user-attachments/assets/d4691a2e-c5ec-4df5-a15a-f86e667fdf8c" />	2025-10-08 15:12:13 -07:00
Eric Huang	d525a438fb	merge commit for archive created by Sapling	2025-10-08 15:02:40 -07:00
Eric Huang	c76bf97ccf	test, recording # What does this PR do? ## Test Plan # What does this PR do? ## Test Plan	2025-10-08 14:57:06 -07:00
ehhuang	6cd8eea4ea	Merge `6b07f43f61` into sapling-pr-archive-ehhuang	2025-10-08 14:43:20 -07:00
Eric Huang	6b07f43f61	chore: print integration tests command # What does this PR do? ## Test Plan	2025-10-08 14:43:08 -07:00
Eric Huang	1ef14650c4	merge commit for archive created by Sapling	2025-10-08 14:30:33 -07:00
Eric Huang	258b6486e8	test, recording # What does this PR do? ## Test Plan # What does this PR do? ## Test Plan	2025-10-08 14:30:23 -07:00
Ashwin Bharambe	16db42e7e5	feat(tests): add --collect-only option to integration test script (#3745 ) Adds --collect-only flag to scripts/integration-tests.sh that skips server startup and passes the flag to pytest for test collection only. When specified, minimal flags are required (no --stack-config or --setup needed). ## Changes - Added `--collect-only` flag that skips server startup - Made `--stack-config` and `--setup` optional when using `--collect-only` - Skip `llama` command check when collecting tests only ## Usage ```bash # Collect tests without starting server ./scripts/integration-tests.sh --subdirs inference --collect-only ```	2025-10-08 14:20:34 -07:00
ehhuang	ee0152fc07	Merge `521009048a` into sapling-pr-archive-ehhuang	2025-10-08 13:54:26 -07:00
Eric Huang	521009048a	test # What does this PR do? ## Test Plan	2025-10-08 13:54:21 -07:00
ehhuang	3c58803efa	Merge `0424e33172` into sapling-pr-archive-ehhuang	2025-10-08 13:46:51 -07:00
Eric Huang	0424e33172	test # What does this PR do? ## Test Plan	2025-10-08 13:46:46 -07:00
ehhuang	02890c22f3	Merge `001bf15bf8` into sapling-pr-archive-ehhuang	2025-10-08 13:38:59 -07:00
Eric Huang	001bf15bf8	test # What does this PR do? ## Test Plan	2025-10-08 13:38:54 -07:00
ehhuang	3e9dd56af8	Merge `f229c433fe` into sapling-pr-archive-ehhuang	2025-10-08 13:29:39 -07:00
Eric Huang	f229c433fe	test # What does this PR do? ## Test Plan	2025-10-08 13:29:34 -07:00
ehhuang	5025e02d81	Merge `1e891489a8` into sapling-pr-archive-ehhuang	2025-10-08 13:23:35 -07:00
Eric Huang	1e891489a8	test # What does this PR do? ## Test Plan	2025-10-08 13:23:21 -07:00
Francisco Arceo	b96640eca3	chore: Removing Weaviate, PGVector, and Milvus from unit tests (#3742) # What does this PR do? Removing Weaviate, PostGres, and Milvus unit tests <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> ## Test Plan <!-- Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed. --> Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>	2025-10-08 12:25:51 -07:00
Ashwin Bharambe	79bed44b04	fix(tests): ensure test isolation in server mode (#3737 ) Propagate test IDs from client to server via HTTP headers to maintain proper test isolation when running with server-based stack configs. Without this, recorded/replayed inference requests in server mode would leak across tests. Changes: - Patch client _prepare_request to inject test ID into provider data header - Sync test context from provider data on server side before storage operations - Set LLAMA_STACK_TEST_STACK_CONFIG_TYPE env var based on stack config - Configure console width for cleaner log output in CI - Add SQLITE_STORE_DIR temp directory for test data isolation	2025-10-08 12:03:36 -07:00
ehhuang	08d46d6363	Merge `ed4e452de0` into sapling-pr-archive-ehhuang	2025-10-08 11:39:41 -07:00
Eric Huang	ed4e452de0	chore!: remove ALL telemetry APIs # What does this PR do? ## Test Plan	2025-10-08 11:39:30 -07:00
grs	96886afaca	fix(responses): fix regression in support for mcp tool require_approval argument (#3731) # What does this PR do? It prevents a tool call message from being added to the chat completions messages without a corresponding tool call result, which is needed when an approval is required first or when the approval request is denied. In both cases the tool call message is popped off the next turn's messages. Closes #3728 ## Test Plan Ran the integration tests. Manual check of both approval and denial against gpt-4o. Signed-off-by: Gordon Sim <gsim@redhat.com>	2025-10-08 10:47:17 -04:00
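
Commit 8fe4a216b5 above describes a handler in `translate_exception()` that checks for a `status_code` attribute on the exception and keeps the provider's HTTP status and message instead of returning a generic 500. The following is a minimal sketch of that idea only, not the project's code: `HttpError`, `FakeProviderAuthError`, and `translate_exception_sketch` are hypothetical names, and the 400-599 range check is an added assumption.

```python
from dataclasses import dataclass


@dataclass
class HttpError:
    """Hypothetical stand-in for the HTTP error a server would return."""

    status_code: int
    detail: str


class FakeProviderAuthError(Exception):
    """Hypothetical provider SDK exception that carries an HTTP status code."""

    def __init__(self, message: str, status_code: int) -> None:
        super().__init__(message)
        self.status_code = status_code


def translate_exception_sketch(exc: Exception) -> HttpError:
    """Keep the provider's status and message when the exception exposes one."""
    status = getattr(exc, "status_code", None)
    # Assumption: only pass through values that look like HTTP error codes.
    if isinstance(status, int) and 400 <= status < 600:
        return HttpError(status_code=status, detail=str(exc))
    # Everything else falls through to the generic 500 response.
    return HttpError(
        status_code=500,
        detail="Internal server error: An unexpected error occurred.",
    )
```

With this shape, an exception carrying `status_code=401` surfaces as a 401 with the provider's message, while an exception without the attribute still maps to the generic 500.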

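The idempotent registration rule from commit 145b2bcf25 above (accept exact duplicates, reject a known identifier with different values, namespace models as `{provider_id}/{identifier}`) might look roughly like this. This is an illustrative sketch under assumptions: `RegistrySketch`, `ConflictError`, and `model_key` are hypothetical names, and plain dicts stand in for the real object types.

```python
class ConflictError(ValueError):
    """Hypothetical error for re-registering an identifier with new values."""


class RegistrySketch:
    """In-memory registry that treats exact duplicate registrations as no-ops."""

    def __init__(self) -> None:
        self._objects: dict[str, dict] = {}

    def register(self, identifier: str, values: dict) -> bool:
        """Return True if newly stored, False if it was an exact duplicate."""
        existing = self._objects.get(identifier)
        if existing is None:
            self._objects[identifier] = dict(values)
            return True
        if existing == values:
            # Same identifier, same values: accepted without changes.
            return False
        raise ConflictError(
            f"'{identifier}' is already registered with different values"
        )


def model_key(provider_id: str, identifier: str) -> str:
    """Models are namespaced by provider; other object types use the bare id."""
    return f"{provider_id}/{identifier}"
```

Registering the same key and values twice succeeds both times; only a change in the associated values raises.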

3027 commits