Renames `inference_recorder.py` to `api_recorder.py` and extends it to
support recording/replaying tool invocations in addition to inference
calls.
This allows us to record tool calls (web search, etc.) and then replay those recordings for `tests/integration/responses`.
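For illustration, a minimal sketch of the record-or-replay idea (the function and directory names below are hypothetical, not the actual `api_recorder.py` code):
```python
import hashlib
import json
from pathlib import Path

RECORDINGS_DIR = Path("tests/integration/responses/recordings")  # assumed location

def _request_key(endpoint: str, payload: dict) -> str:
    """Stable hash of the request so a replay can find its recording."""
    canonical = json.dumps({"endpoint": endpoint, "payload": payload}, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()

def record_or_replay(endpoint: str, payload: dict, call_real_api):
    """Replay a stored response if one exists; otherwise call the real API and record it."""
    path = RECORDINGS_DIR / f"{_request_key(endpoint, payload)}.json"
    if path.exists():  # replay mode
        return json.loads(path.read_text())
    response = call_real_api(endpoint, payload)  # e.g. a web-search tool invocation
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(response))
    return response
```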
## Test Plan
```
export OPENAI_API_KEY=...
export TAVILY_SEARCH_API_KEY=...
./scripts/integration-tests.sh --stack-config ci-tests \
--suite responses --inference-mode record-if-missing
```
# What does this PR do?
Completes the refactoring started in the previous commit by:
1. **Fix library client** (critical): Add logic to detect Pydantic model parameters
and construct them properly from request bodies. The key fix is to NOT exclude
any params when converting the body for Pydantic models - we need all fields
to pass to the Pydantic constructor.
Before: _convert_body excluded all params, leaving body empty for Pydantic construction
After: Check for Pydantic params first, skip exclusion, construct model with full body
2. **Update remaining providers** to use new Pydantic-based signatures:
- litellm_openai_mixin: Extract extra fields via __pydantic_extra__
- databricks: Use TYPE_CHECKING import for params type
- llama_openai_compat: Use TYPE_CHECKING import for params type
- sentence_transformers: Update method signatures to use params
3. **Update unit tests** to use new Pydantic signature:
- test_openai_mixin.py: Use OpenAIChatCompletionRequestParams
This fixes test failures where the library client was trying to construct
Pydantic models with empty dictionaries.
The previous fix had a bug: it called _convert_body() which only keeps fields
that match function parameter names. For Pydantic methods with the signature
`openai_chat_completion(params: OpenAIChatCompletionRequestParams)`,
the signature only has 'params', but the body has 'model', 'messages', etc.
So _convert_body() returned an empty dict.
Fix: Skip _convert_body() entirely for Pydantic params. Use the raw body
directly to construct the Pydantic model (after stripping NOT_GIVENs).
This properly fixes the ValidationError where required fields were missing.
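A condensed sketch of the non-streaming fix, assuming a `NOT_GIVEN` sentinel and a `_convert_body()` helper shaped like the description above (the real library-client code differs in detail):
```python
import inspect
from pydantic import BaseModel

NOT_GIVEN = object()  # stand-in for the client's NOT_GIVEN sentinel

def _convert_body(func, body: dict) -> dict:
    """Old path: keep only fields whose names match the function's parameters."""
    names = set(inspect.signature(func).parameters)
    return {k: v for k, v in body.items() if k in names and v is not NOT_GIVEN}

def prepare_kwargs(func, body: dict) -> dict:
    params = list(inspect.signature(func).parameters.values())
    # Pydantic case, e.g. openai_chat_completion(params: OpenAIChatCompletionRequestParams):
    # the signature only has 'params', so _convert_body() would return an empty dict.
    if len(params) == 1 and isinstance(params[0].annotation, type) and issubclass(params[0].annotation, BaseModel):
        clean = {k: v for k, v in body.items() if v is not NOT_GIVEN}  # strip NOT_GIVENs only
        return {params[0].name: params[0].annotation(**clean)}  # construct from the full body
    return _convert_body(func, body)
```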
The streaming code path (_call_streaming) had the same issue as non-streaming:
it called _convert_body() which returned empty dict for Pydantic params.
Applied the same fix as commit 7476c0ae:
- Detect Pydantic model parameters before body conversion
- Skip _convert_body() for Pydantic params
- Construct Pydantic model directly from raw body (after stripping NOT_GIVENs)
This fixes streaming endpoints like openai_chat_completion with stream=True.
# What does this PR do?
Adds traces around tool execution and MCP tool listing for better observability.
Closes #3108
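As a rough illustration of the shape of the change (llama-stack has its own telemetry helpers; the OpenTelemetry API and the `invoke_tool` signature below are assumptions used only to sketch the idea):
```python
from opentelemetry import trace

tracer = trace.get_tracer(__name__)

async def invoke_tool_with_trace(tool_runtime, tool_name: str, kwargs: dict):
    # Wrap the tool invocation in a span so the tool name and argument count
    # show up alongside the request trace in Jaeger.
    with tracer.start_as_current_span("execute_tool") as span:
        span.set_attribute("tool.name", tool_name)
        span.set_attribute("tool.num_args", len(kwargs))
        return await tool_runtime.invoke_tool(tool_name=tool_name, kwargs=kwargs)
```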
## Test Plan
Manually examined traces in Jaeger to verify the added information was available.
Signed-off-by: Gordon Sim <gsim@redhat.com>
Adds a `--collect-only` flag to `scripts/integration-tests.sh` that skips server startup and passes the flag to pytest for test collection only. When specified, only minimal flags are required (no `--stack-config` or `--setup` needed).
## Changes
- Added `--collect-only` flag that skips server startup
- Made `--stack-config` and `--setup` optional when using
`--collect-only`
- Skip `llama` command check when collecting tests only
## Usage
```bash
# Collect tests without starting server
./scripts/integration-tests.sh --subdirs inference --collect-only
```
# What does this PR do?
Removing Weaviate, Postgres, and Milvus unit tests
## Test Plan
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Propagate test IDs from client to server via HTTP headers to maintain proper test isolation when running with server-based stack configs. Without this, recorded/replayed inference requests in server mode would leak across tests.
Changes:
- Patch client _prepare_request to inject the test ID into the provider data
header (sketched after this list)
- Sync test context from provider data on server side before storage
operations
- Set LLAMA_STACK_TEST_STACK_CONFIG_TYPE env var based on stack config
- Configure console width for cleaner log output in CI
- Add SQLITE_STORE_DIR temp directory for test data isolation
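A rough sketch of the client-side patch mentioned above (the header name follows the provider-data convention used elsewhere in the project; the `__test_id` key and dict-like `request.headers` access are assumptions):
```python
import functools
import json

TEST_ID_HEADER = "X-LlamaStack-Provider-Data"

def patch_prepare_request(client, test_id: str) -> None:
    original = client._prepare_request

    @functools.wraps(original)
    def wrapper(request):
        # Merge the test ID into the provider-data header so the server can
        # restore the test context before any storage operation.
        provider_data = json.loads(request.headers.get(TEST_ID_HEADER, "{}"))
        provider_data["__test_id"] = test_id  # illustrative key name
        request.headers[TEST_ID_HEADER] = json.dumps(provider_data)
        return original(request)

    client._prepare_request = wrapper
```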
# What does this PR do?
It prevents a tool call message from being added to the chat completion messages without a corresponding tool call result, which is needed when an approval is required first or when the approval request is denied. In both of these cases the tool call message is popped off the next turn's messages.
Closes #3728
## Test Plan
Ran the integration tests
Manual check of both approval and denial against gpt-4o
Signed-off-by: Gordon Sim <gsim@redhat.com>
# What does this PR do?
- The watsonx.ai provider now uses the LiteLLM mixin instead of using
IBM's library, which does not seem to be working (see #3165 for
context).
- The watsonx.ai provider now lists all the models available by calling the
watsonx.ai server instead of having a hard-coded list of known models (that
list gets out of date quickly).
- An edge case in
[llama_stack/core/routers/inference.py](https://github.com/llamastack/llama-stack/pull/3674/files#diff-a34bc966ed9befd9f13d4883c23705dff49be0ad6211c850438cdda6113f3455)
is addressed that was causing my manual tests to fail.
- Fixes `b64_encode_openai_embeddings_response`, which was trying to enumerate
over a dictionary and then reference elements of the dictionary using `.field`
instead of `["field"]` (see the sketch after this list). That method is called
by the LiteLLM mixin for embedding models, so it is needed to get the
watsonx.ai embedding models to work.
- A unit test along the lines of the one in #3348 is added. A more
comprehensive plan for automatically testing the end-to-end
functionality for inference providers would be a good idea, but is out
of scope for this PR.
- Updates to the watsonx distribution. Some were in response to the
switch to LiteLLM (e.g., updating the Python packages needed). Others
seem to be things that were already broken that I found along the way
(e.g., a reference to a watsonx specific doc template that doesn't seem
to exist).
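A simplified sketch of the corrected access pattern in `b64_encode_openai_embeddings_response` (not the actual implementation; it only illustrates the list iteration and `["field"]` lookups described above):
```python
import base64
import struct

def b64_encode_openai_embeddings_response(response: dict) -> list[dict]:
    encoded = []
    # Iterate over the list under the "data" key rather than the dict itself,
    # and use ["field"] lookups instead of .field attribute access.
    for i, item in enumerate(response["data"]):
        floats = item["embedding"]
        packed = struct.pack(f"{len(floats)}f", *floats)
        encoded.append({
            "object": "embedding",
            "index": i,
            "embedding": base64.b64encode(packed).decode("utf-8"),
        })
    return encoded
```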
Closes #3165
Also, it is related to a line item in #3387 but doesn't really address
that goal (because it uses the LiteLLM mixin, not the OpenAI one). I
tried the OpenAI one and it doesn't work with watsonx.ai, presumably
because the watsonx.ai service is not OpenAI compatible. It works with
LiteLLM because LiteLLM has a provider implementation for watsonx.ai.
## Test Plan
The test script below goes back and forth between the OpenAI and watsonx
providers. The idea is that the OpenAI provider shows how it should work
and then the watsonx provider output shows that it is also working with
watsonx. Note that the result from the MCP test is not as good (the
Llama 3.3 70b model does not choose tools as wisely as gpt-4o), but it
is still working and providing a valid response. For more details on
setup and the MCP server being used for testing, see [the AI Alliance
sample
notebook](https://github.com/The-AI-Alliance/llama-stack-examples/blob/main/notebooks/01-responses/)
that these examples are drawn from.
```python
#!/usr/bin/env python3

import json
from llama_stack_client import LlamaStackClient
from litellm import completion
import http.client


def print_response(response):
    """Print response in a nicely formatted way"""
    print(f"ID: {response.id}")
    print(f"Status: {response.status}")
    print(f"Model: {response.model}")
    print(f"Created at: {response.created_at}")
    print(f"Output items: {len(response.output)}")

    for i, output_item in enumerate(response.output):
        if len(response.output) > 1:
            print(f"\n--- Output Item {i+1} ---")
        print(f"Output type: {output_item.type}")

        if output_item.type in ("text", "message"):
            print(f"Response content: {output_item.content[0].text}")
        elif output_item.type == "file_search_call":
            print(f" Tool Call ID: {output_item.id}")
            print(f" Tool Status: {output_item.status}")
            # 'queries' is a list, so we join it for clean printing
            print(f" Queries: {', '.join(output_item.queries)}")
            # Display results if they exist, otherwise note they are empty
            print(f" Results: {output_item.results if output_item.results else 'None'}")
        elif output_item.type == "mcp_list_tools":
            print_mcp_list_tools(output_item)
        elif output_item.type == "mcp_call":
            print_mcp_call(output_item)
        else:
            print(f"Response content: {output_item.content}")


def print_mcp_call(mcp_call):
    """Print MCP call in a nicely formatted way"""
    print(f"\n🛠️ MCP Tool Call: {mcp_call.name}")
    print(f" Server: {mcp_call.server_label}")
    print(f" ID: {mcp_call.id}")
    print(f" Arguments: {mcp_call.arguments}")

    if mcp_call.error:
        print(f"Error: {mcp_call.error}")
    elif mcp_call.output:
        print("Output:")
        # Try to format JSON output nicely
        try:
            parsed_output = json.loads(mcp_call.output)
            print(json.dumps(parsed_output, indent=4))
        except (json.JSONDecodeError, TypeError):
            # If not valid JSON, print as-is
            print(f" {mcp_call.output}")
    else:
        print(" ⏳ No output yet")


def print_mcp_list_tools(mcp_list_tools):
    """Print MCP list tools in a nicely formatted way"""
    print(f"\n🔧 MCP Server: {mcp_list_tools.server_label}")
    print(f" ID: {mcp_list_tools.id}")
    print(f" Available Tools: {len(mcp_list_tools.tools)}")
    print("=" * 80)

    for i, tool in enumerate(mcp_list_tools.tools, 1):
        print(f"\n{i}. {tool.name}")
        print(f" Description: {tool.description}")

        # Parse and display input schema
        schema = tool.input_schema
        if schema and 'properties' in schema:
            properties = schema['properties']
            required = schema.get('required', [])
            print(" Parameters:")
            for param_name, param_info in properties.items():
                param_type = param_info.get('type', 'unknown')
                param_desc = param_info.get('description', 'No description')
                required_marker = " (required)" if param_name in required else " (optional)"
                print(f" • {param_name} ({param_type}){required_marker}")
                if param_desc:
                    print(f" {param_desc}")

        if i < len(mcp_list_tools.tools):
            print("-" * 40)


def main():
    """Main function to run all the tests"""
    # Configuration
    LLAMA_STACK_URL = "http://localhost:8321/"
    LLAMA_STACK_MODEL_IDS = [
        "openai/gpt-3.5-turbo",
        "openai/gpt-4o",
        "llama-openai-compat/Llama-3.3-70B-Instruct",
        "watsonx/meta-llama/llama-3-3-70b-instruct"
    ]

    # Using gpt-4o for this demo, but feel free to try one of the others or add more to run.yaml.
    OPENAI_MODEL_ID = LLAMA_STACK_MODEL_IDS[1]
    WATSONX_MODEL_ID = LLAMA_STACK_MODEL_IDS[-1]
    NPS_MCP_URL = "http://localhost:3005/sse/"

    print("=== Llama Stack Testing Script ===")
    print(f"Using OpenAI model: {OPENAI_MODEL_ID}")
    print(f"Using WatsonX model: {WATSONX_MODEL_ID}")
    print(f"MCP URL: {NPS_MCP_URL}")
    print()

    # Initialize client
    print("Initializing LlamaStackClient...")
    client = LlamaStackClient(base_url="http://localhost:8321")

    # Test 1: List models
    print("\n=== Test 1: List Models ===")
    try:
        models = client.models.list()
        print(f"Found {len(models)} models")
    except Exception as e:
        print(f"Error listing models: {e}")
        raise e

    # Test 2: Basic chat completion with OpenAI
    print("\n=== Test 2: Basic Chat Completion (OpenAI) ===")
    try:
        chat_completion_response = client.chat.completions.create(
            model=OPENAI_MODEL_ID,
            messages=[{"role": "user", "content": "What is the capital of France?"}]
        )
        print("OpenAI Response:")
        for chunk in chat_completion_response.choices[0].message.content:
            print(chunk, end="", flush=True)
        print()
    except Exception as e:
        print(f"Error with OpenAI chat completion: {e}")
        raise e

    # Test 3: Basic chat completion with WatsonX
    print("\n=== Test 3: Basic Chat Completion (WatsonX) ===")
    try:
        chat_completion_response_wxai = client.chat.completions.create(
            model=WATSONX_MODEL_ID,
            messages=[{"role": "user", "content": "What is the capital of France?"}],
        )
        print("WatsonX Response:")
        for chunk in chat_completion_response_wxai.choices[0].message.content:
            print(chunk, end="", flush=True)
        print()
    except Exception as e:
        print(f"Error with WatsonX chat completion: {e}")
        raise e

    # Test 4: Tool calling with OpenAI
    print("\n=== Test 4: Tool Calling (OpenAI) ===")
    tools = [
        {
            "type": "function",
            "function": {
                "name": "get_current_weather",
                "description": "Get the current weather for a specific location",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "The city and state, e.g., San Francisco, CA",
                        },
                        "unit": {
                            "type": "string",
                            "enum": ["celsius", "fahrenheit"]
                        },
                    },
                    "required": ["location"],
                },
            },
        }
    ]
    messages = [
        {"role": "user", "content": "What's the weather like in Boston, MA?"}
    ]

    try:
        print("--- Initial API Call ---")
        response = client.chat.completions.create(
            model=OPENAI_MODEL_ID,
            messages=messages,
            tools=tools,
            tool_choice="auto",  # "auto" is the default
        )
        print("OpenAI tool calling response received")
    except Exception as e:
        print(f"Error with OpenAI tool calling: {e}")
        raise e

    # Test 5: Tool calling with WatsonX
    print("\n=== Test 5: Tool Calling (WatsonX) ===")
    try:
        wxai_response = client.chat.completions.create(
            model=WATSONX_MODEL_ID,
            messages=messages,
            tools=tools,
            tool_choice="auto",  # "auto" is the default
        )
        print("WatsonX tool calling response received")
    except Exception as e:
        print(f"Error with WatsonX tool calling: {e}")
        raise e

    # Test 6: Streaming with WatsonX
    print("\n=== Test 6: Streaming Response (WatsonX) ===")
    try:
        chat_completion_response_wxai_stream = client.chat.completions.create(
            model=WATSONX_MODEL_ID,
            messages=[{"role": "user", "content": "What is the capital of France?"}],
            stream=True
        )
        print("Model response: ", end="")
        for chunk in chat_completion_response_wxai_stream:
            # Each 'chunk' is a ChatCompletionChunk object.
            # We want the content from the 'delta' attribute.
            if hasattr(chunk, 'choices') and chunk.choices is not None:
                content = chunk.choices[0].delta.content
                # The first few chunks might have None content, so we check for it.
                if content is not None:
                    print(content, end="", flush=True)
        print()
    except Exception as e:
        print(f"Error with streaming: {e}")
        raise e

    # Test 7: MCP with OpenAI
    print("\n=== Test 7: MCP Integration (OpenAI) ===")
    try:
        mcp_llama_stack_client_response = client.responses.create(
            model=OPENAI_MODEL_ID,
            input="Tell me about some parks in Rhode Island, and let me know if there are any upcoming events at them.",
            tools=[
                {
                    "type": "mcp",
                    "server_url": NPS_MCP_URL,
                    "server_label": "National Parks Service tools",
                    "allowed_tools": ["search_parks", "get_park_events"],
                }
            ]
        )
        print_response(mcp_llama_stack_client_response)
    except Exception as e:
        print(f"Error with MCP (OpenAI): {e}")
        raise e

    # Test 8: MCP with WatsonX
    print("\n=== Test 8: MCP Integration (WatsonX) ===")
    try:
        mcp_llama_stack_client_response = client.responses.create(
            model=WATSONX_MODEL_ID,
            input="What is the capital of France?"
        )
        print_response(mcp_llama_stack_client_response)
    except Exception as e:
        print(f"Error with MCP (WatsonX): {e}")
        raise e

    # Test 9: MCP with Llama 3.3
    print("\n=== Test 9: MCP Integration (Llama 3.3) ===")
    try:
        mcp_llama_stack_client_response = client.responses.create(
            model=WATSONX_MODEL_ID,
            input="Tell me about some parks in Rhode Island, and let me know if there are any upcoming events at them.",
            tools=[
                {
                    "type": "mcp",
                    "server_url": NPS_MCP_URL,
                    "server_label": "National Parks Service tools",
                    "allowed_tools": ["search_parks", "get_park_events"],
                }
            ]
        )
        print_response(mcp_llama_stack_client_response)
    except Exception as e:
        print(f"Error with MCP (Llama 3.3): {e}")
        raise e

    # Test 10: Embeddings
    print("\n=== Test 10: Embeddings ===")
    try:
        conn = http.client.HTTPConnection("localhost:8321")
        payload = json.dumps({
            "model": "watsonx/ibm/granite-embedding-278m-multilingual",
            "input": "Hello, world!",
        })
        headers = {
            'Content-Type': 'application/json',
            'Accept': 'application/json'
        }
        conn.request("POST", "/v1/openai/v1/embeddings", payload, headers)
        res = conn.getresponse()
        data = res.read()
        print(data.decode("utf-8"))
    except Exception as e:
        print(f"Error with Embeddings: {e}")
        raise e

    print("\n=== Testing Complete ===")


if __name__ == "__main__":
    main()
```
---------
Signed-off-by: Bill Murdock <bmurdock@redhat.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Bumps [actions/stale](https://github.com/actions/stale) from 10.0.0 to
10.1.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/actions/stale/releases">actions/stale's
releases</a>.</em></p>
<blockquote>
<h2>v10.1.0</h2>
<h2>What's Changed</h2>
<ul>
<li>Add <code>only-issue-types</code> option to filter issues by type by
<a href="https://github.com/Bibo-Joshi"><code>@Bibo-Joshi</code></a> in
<a
href="https://redirect.github.com/actions/stale/pull/1255">actions/stale#1255</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a
href="https://github.com/Bibo-Joshi"><code>@Bibo-Joshi</code></a> made
their first contribution in <a
href="https://redirect.github.com/actions/stale/pull/1255">actions/stale#1255</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/actions/stale/compare/v10...v10.1.0">https://github.com/actions/stale/compare/v10...v10.1.0</a></p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="5f858e3efb"><code>5f858e3</code></a>
Add <code>only-issue-types</code> option to filter issues by type (<a
href="https://redirect.github.com/actions/stale/issues/1255">#1255</a>)</li>
<li>See full diff in <a
href="3a9db7e6a4...5f858e3efb">compare
view</a></li>
</ul>
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
When the user wants to change the attributes (which could include model name, dimensions, etc.) of an already registered provider, they will get an error message asking that they first unregister the provider before registering a new one.
# What does this PR do?
This PR updates the register function to raise an error when the user attempts to register a provider that was already registered, asking them to unregister the existing provider first.
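A minimal sketch of the behaviour, assuming a simple in-memory registry (class and error names are illustrative, not the actual registry code):
```python
class ProviderAlreadyRegisteredError(ValueError):
    pass

class Registry:
    def __init__(self) -> None:
        self._providers: dict[str, dict] = {}

    def register(self, provider_id: str, attributes: dict) -> None:
        # Refuse to silently overwrite an existing registration.
        if provider_id in self._providers:
            raise ProviderAlreadyRegisteredError(
                f"Provider '{provider_id}' is already registered; "
                "unregister it before registering it with new attributes."
            )
        self._providers[provider_id] = attributes

    def unregister(self, provider_id: str) -> None:
        self._providers.pop(provider_id, None)
```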
Closes #2313
## Test Plan
Tested the change with /tests/unit/registry/test_registry.py
---------
Co-authored-by: Omar Abdelwahab <omara@fb.com>
# What does this PR do?
Users can simply set env vars at the beginning of the command: `FOO=BAR llama stack run ...`
## Test Plan
Run:
`TELEMETRY_SINKS=coneol uv run --with llama-stack llama stack build --distro=starter --image-type=venv --run`
---
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with
[ReviewStack](https://reviewstack.dev/llamastack/llama-stack/pull/3711).
* #3714
* __->__ #3711
# What does this PR do?
Removes some dead code found by vulture; checked with Claude that there are no remaining references or imports for these.
## Test Plan
CI
# What does this PR do?
(Used Claude to solve #3715; coded with Claude but tested by me)
## From Claude's summary:
**Problem**: The `NVIDIAInferenceAdapter` class was missing the
`alias_to_provider_id_map` attribute, which caused the error:
`ERROR 'NVIDIAInferenceAdapter' object has no attribute
'alias_to_provider_id_map'`
**Root Cause**: The `NVIDIAInferenceAdapter` only inherited from
`OpenAIMixin`, but some parts of the system expected it to have the
`alias_to_provider_id_map` attribute, which is provided by the
`ModelRegistryHelper` class.
**Solution**:
1. **Added ModelRegistryHelper import**: Imported the
`ModelRegistryHelper` class from
`llama_stack.providers.utils.inference.model_registry`
2. **Updated inheritance**: Changed the class declaration to inherit
from both `OpenAIMixin` and `ModelRegistryHelper`
3. **Added proper initialization**: Added an `__init__` method that
properly initializes the `ModelRegistryHelper` with empty model entries
(since NVIDIA uses dynamic model discovery) and the allowed models from
the configuration
**Key Changes**:
* Added `from llama_stack.providers.utils.inference.model_registry
import ModelRegistryHelper`
* Changed class declaration from `class
NVIDIAInferenceAdapter(OpenAIMixin):` to `class
NVIDIAInferenceAdapter(OpenAIMixin, ModelRegistryHelper):`
* Added `__init__` method that calls `ModelRegistryHelper.__init__(self,
model_entries=[], allowed_models=config.allowed_models)`
The inheritance order is important - `OpenAIMixin` comes first to ensure
its `check_model_availability()` method takes precedence over the
`ModelRegistryHelper` version, as mentioned in the class documentation.
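A condensed sketch of the resulting class (simplified; the `OpenAIMixin` import path and the exact `__init__` body are assumptions, see the PR diff for the real code):
```python
from llama_stack.providers.utils.inference.model_registry import ModelRegistryHelper
from llama_stack.providers.utils.inference.openai_mixin import OpenAIMixin  # assumed import path

class NVIDIAInferenceAdapter(OpenAIMixin, ModelRegistryHelper):
    def __init__(self, config) -> None:
        # Empty model_entries because NVIDIA discovers models dynamically;
        # allowed_models still comes from the provider configuration.
        ModelRegistryHelper.__init__(self, model_entries=[], allowed_models=config.allowed_models)
        self.config = config
```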
This fix ensures that the `NVIDIAInferenceAdapter` has the required
`alias_to_provider_id_map` attribute while maintaining all existing
functionality.
## Test Plan
Launched the llama-stack server successfully; see logs:
```
NVIDIA_API_KEY=dummy NVIDIA_BASE_URL=http://localhost:8912 llama stack run /home/nvidia/.llama/distributions/starter/starter-run.yaml --image-type venv &
[2] 3753042
(venv) nvidia@nv-meta-H100-testing-gpu01:~/kai/llama-stack$ WARNING 2025-10-07 00:29:09,848 root:266 uncategorized: Unknown logging category:
openai::conversations. Falling back to default 'root' level: 20
WARNING 2025-10-07 00:29:09,932 root:266 uncategorized: Unknown logging category: cli.
Falling back to default 'root' level: 20
INFO 2025-10-07 00:29:09,937 llama_stack.core.utils.config_resolution:45 core:
Using file path: /home/nvidia/.llama/distributions/starter/starter-run.yaml
INFO 2025-10-07 00:29:09,937 llama_stack.cli.stack.run:136 cli: Using run
configuration: /home/nvidia/.llama/distributions/starter/starter-run.yaml
Using virtual environment: /home/nvidia/kai/venv
Virtual environment already activated
+ '[' -n /home/nvidia/.llama/distributions/starter/starter-run.yaml ']'
+ yaml_config_arg=/home/nvidia/.llama/distributions/starter/starter-run.yaml
+ llama stack run /home/nvidia/.llama/distributions/starter/starter-run.yaml --port 8321
WARNING 2025-10-07 00:29:11,432 root:266 uncategorized: Unknown logging category:
openai::conversations. Falling back to default 'root' level: 20
WARNING 2025-10-07 00:29:11,593 root:266 uncategorized: Unknown logging category: cli.
Falling back to default 'root' level: 20
INFO 2025-10-07 00:29:11,603 llama_stack.core.utils.config_resolution:45 core:
Using file path: /home/nvidia/.llama/distributions/starter/starter-run.yaml
INFO 2025-10-07 00:29:11,604 llama_stack.cli.stack.run:136 cli: Using run
configuration: /home/nvidia/.llama/distributions/starter/starter-run.yaml
INFO 2025-10-07 00:29:11,624 llama_stack.cli.stack.run:155 cli: No image type or
image name provided. Assuming environment packages.
INFO 2025-10-07 00:29:11,625 llama_stack.core.utils.config_resolution:45 core:
Using file path: /home/nvidia/.llama/distributions/starter/starter-run.yaml
INFO 2025-10-07 00:29:11,644 llama_stack.cli.stack.run:230 cli: HTTPS enabled with
certificates:
Key: None
Cert: None
INFO 2025-10-07 00:29:11,645 llama_stack.cli.stack.run:232 cli: Listening on ['::',
'0.0.0.0']:8321
INFO 2025-10-07 00:29:11,816 llama_stack.core.utils.config_resolution:45 core:
Using file path: /home/nvidia/.llama/distributions/starter/starter-run.yaml
INFO 2025-10-07 00:29:11,836 llama_stack.core.server.server:480 core::server: Run
configuration:
INFO 2025-10-07 00:29:11,845 llama_stack.core.server.server:483 core::server: apis:
- agents
- batches
- datasetio
- eval
- files
- inference
- post_training
- safety
- scoring
- telemetry
- tool_runtime
- vector_io
benchmarks: []
datasets: []
image_name: starter
inference_store:
db_path: /home/nvidia/.llama/distributions/starter/inference_store.db
type: sqlite
metadata_store:
db_path: /home/nvidia/.llama/distributions/starter/registry.db
type: sqlite
models: []
providers:
agents:
- config:
persistence_store:
db_path: /home/nvidia/.llama/distributions/starter/agents_store.db
type: sqlite
responses_store:
db_path: /home/nvidia/.llama/distributions/starter/responses_store.db
type: sqlite
provider_id: meta-reference
provider_type: inline::meta-reference
batches:
- config:
kvstore:
db_path: /home/nvidia/.llama/distributions/starter/batches.db
type: sqlite
provider_id: reference
provider_type: inline::reference
datasetio:
- config:
kvstore:
db_path:
/home/nvidia/.llama/distributions/starter/huggingface_datasetio.db
type: sqlite
provider_id: huggingface
provider_type: remote::huggingface
- config:
kvstore:
db_path:
/home/nvidia/.llama/distributions/starter/localfs_datasetio.db
type: sqlite
provider_id: localfs
provider_type: inline::localfs
eval:
- config:
kvstore:
db_path:
/home/nvidia/.llama/distributions/starter/meta_reference_eval.db
type: sqlite
provider_id: meta-reference
provider_type: inline::meta-reference
files:
- config:
metadata_store:
db_path: /home/nvidia/.llama/distributions/starter/files_metadata.db
type: sqlite
storage_dir: /home/nvidia/.llama/distributions/starter/files
provider_id: meta-reference-files
provider_type: inline::localfs
inference:
- config:
api_key: '********'
url: https://api.fireworks.ai/inference/v1
provider_id: fireworks
provider_type: remote::fireworks
- config:
api_key: '********'
url: https://api.together.xyz/v1
provider_id: together
provider_type: remote::together
- config: {}
provider_id: bedrock
provider_type: remote::bedrock
- config:
api_key: '********'
append_api_version: true
url: http://localhost:8912
provider_id: nvidia
provider_type: remote::nvidia
- config:
api_key: '********'
base_url: https://api.openai.com/v1
provider_id: openai
provider_type: remote::openai
- config:
api_key: '********'
provider_id: anthropic
provider_type: remote::anthropic
- config:
api_key: '********'
provider_id: gemini
provider_type: remote::gemini
- config:
api_key: '********'
url: https://api.groq.com
provider_id: groq
provider_type: remote::groq
- config:
api_key: '********'
url: https://api.sambanova.ai/v1
provider_id: sambanova
provider_type: remote::sambanova
- config: {}
provider_id: sentence-transformers
provider_type: inline::sentence-transformers
post_training:
- config:
checkpoint_format: meta
provider_id: torchtune-cpu
provider_type: inline::torchtune-cpu
safety:
- config:
excluded_categories: []
provider_id: llama-guard
provider_type: inline::llama-guard
- config: {}
provider_id: code-scanner
provider_type: inline::code-scanner
scoring:
- config: {}
provider_id: basic
provider_type: inline::basic
- config: {}
provider_id: llm-as-judge
provider_type: inline::llm-as-judge
- config:
openai_api_key: '********'
provider_id: braintrust
provider_type: inline::braintrust
telemetry:
- config:
service_name: "\u200B"
sinks: sqlite
sqlite_db_path: /home/nvidia/.llama/distributions/starter/trace_store.db
provider_id: meta-reference
provider_type: inline::meta-reference
tool_runtime:
- config:
api_key: '********'
max_results: 3
provider_id: brave-search
provider_type: remote::brave-search
- config:
api_key: '********'
max_results: 3
provider_id: tavily-search
provider_type: remote::tavily-search
- config: {}
provider_id: rag-runtime
provider_type: inline::rag-runtime
- config: {}
provider_id: model-context-protocol
provider_type: remote::model-context-protocol
vector_io:
- config:
kvstore:
db_path: /home/nvidia/.llama/distributions/starter/faiss_store.db
type: sqlite
provider_id: faiss
provider_type: inline::faiss
- config:
db_path: /home/nvidia/.llama/distributions/starter/sqlite_vec.db
kvstore:
db_path:
/home/nvidia/.llama/distributions/starter/sqlite_vec_registry.db
type: sqlite
provider_id: sqlite-vec
provider_type: inline::sqlite-vec
scoring_fns: []
server:
port: 8321
shields: []
tool_groups:
- provider_id: tavily-search
toolgroup_id: builtin::websearch
- provider_id: rag-runtime
toolgroup_id: builtin::rag
vector_dbs: []
version: 2
INFO 2025-10-07 00:29:12,138
llama_stack.providers.remote.inference.nvidia.nvidia:49 inference::nvidia:
Initializing NVIDIAInferenceAdapter(http://localhost:8912)...
INFO 2025-10-07 00:29:12,921
llama_stack.providers.utils.inference.inference_store:74 inference: Write
queue disabled for SQLite to avoid concurrency issues
INFO 2025-10-07 00:29:13,524
llama_stack.providers.utils.responses.responses_store:96 openai_responses:
Write queue disabled for SQLite to avoid concurrency issues
ERROR 2025-10-07 00:29:13,679 llama_stack.providers.utils.inference.openai_mixin:439
providers::utils: FireworksInferenceAdapter.list_provider_model_ids() failed
with: API key is not set. Please provide a valid API key in the provider data
header, e.g. x-llamastack-provider-data: {"fireworks_api_key": "<API_KEY>"},
or in the provider config.
WARNING 2025-10-07 00:29:13,681 llama_stack.core.routing_tables.models:36
core::routing_tables: Model refresh failed for provider fireworks: API key is
not set. Please provide a valid API key in the provider data header, e.g.
x-llamastack-provider-data: {"fireworks_api_key": "<API_KEY>"}, or in the
provider config.
ERROR 2025-10-07 00:29:13,682 llama_stack.providers.utils.inference.openai_mixin:439
providers::utils: TogetherInferenceAdapter.list_provider_model_ids() failed
with: Pass Together API Key in the header X-LlamaStack-Provider-Data as {
"together_api_key": <your api key>}
WARNING 2025-10-07 00:29:13,684 llama_stack.core.routing_tables.models:36
core::routing_tables: Model refresh failed for provider together: Pass
Together API Key in the header X-LlamaStack-Provider-Data as {
"together_api_key": <your api key>}
Handling connection for 8912
INFO 2025-10-07 00:29:14,047 llama_stack.providers.utils.inference.openai_mixin:448
providers::utils: NVIDIAInferenceAdapter.list_provider_model_ids() returned 3
models
ERROR 2025-10-07 00:29:14,062 llama_stack.providers.utils.inference.openai_mixin:439
providers::utils: OpenAIInferenceAdapter.list_provider_model_ids() failed
with: API key is not set. Please provide a valid API key in the provider data
header, e.g. x-llamastack-provider-data: {"openai_api_key": "<API_KEY>"}, or
in the provider config.
WARNING 2025-10-07 00:29:14,063 llama_stack.core.routing_tables.models:36
core::routing_tables: Model refresh failed for provider openai: API key is not
set. Please provide a valid API key in the provider data header, e.g.
x-llamastack-provider-data: {"openai_api_key": "<API_KEY>"}, or in the
provider config.
ERROR 2025-10-07 00:29:14,099 llama_stack.providers.utils.inference.openai_mixin:439
providers::utils: AnthropicInferenceAdapter.list_provider_model_ids() failed
with: "Could not resolve authentication method. Expected either api_key or
auth_token to be set. Or for one of the `X-Api-Key` or `Authorization` headers
to be explicitly omitted"
WARNING 2025-10-07 00:29:14,100 llama_stack.core.routing_tables.models:36
core::routing_tables: Model refresh failed for provider anthropic: "Could not
resolve authentication method. Expected either api_key or auth_token to be
set. Or for one of the `X-Api-Key` or `Authorization` headers to be explicitly
omitted"
ERROR 2025-10-07 00:29:14,102 llama_stack.providers.utils.inference.openai_mixin:439
providers::utils: GeminiInferenceAdapter.list_provider_model_ids() failed
with: API key is not set. Please provide a valid API key in the provider data
header, e.g. x-llamastack-provider-data: {"gemini_api_key": "<API_KEY>"}, or
in the provider config.
WARNING 2025-10-07 00:29:14,103 llama_stack.core.routing_tables.models:36
core::routing_tables: Model refresh failed for provider gemini: API key is not
set. Please provide a valid API key in the provider data header, e.g.
x-llamastack-provider-data: {"gemini_api_key": "<API_KEY>"}, or in the
provider config.
ERROR 2025-10-07 00:29:14,105 llama_stack.providers.utils.inference.openai_mixin:439
providers::utils: GroqInferenceAdapter.list_provider_model_ids() failed with:
API key is not set. Please provide a valid API key in the provider data
header, e.g. x-llamastack-provider-data: {"groq_api_key": "<API_KEY>"}, or in
the provider config.
WARNING 2025-10-07 00:29:14,106 llama_stack.core.routing_tables.models:36
core::routing_tables: Model refresh failed for provider groq: API key is not
set. Please provide a valid API key in the provider data header, e.g.
x-llamastack-provider-data: {"groq_api_key": "<API_KEY>"}, or in the provider
config.
ERROR 2025-10-07 00:29:14,107 llama_stack.providers.utils.inference.openai_mixin:439
providers::utils: SambaNovaInferenceAdapter.list_provider_model_ids() failed
with: API key is not set. Please provide a valid API key in the provider data
header, e.g. x-llamastack-provider-data: {"sambanova_api_key": "<API_KEY>"},
or in the provider config.
WARNING 2025-10-07 00:29:14,109 llama_stack.core.routing_tables.models:36
core::routing_tables: Model refresh failed for provider sambanova: API key is
not set. Please provide a valid API key in the provider data header, e.g.
x-llamastack-provider-data: {"sambanova_api_key": "<API_KEY>"}, or in the
provider config.
INFO 2025-10-07 00:29:14,454 uvicorn.error:84 uncategorized: Started server process
[3753046]
INFO 2025-10-07 00:29:14,455 uvicorn.error:48 uncategorized: Waiting for
application startup.
INFO 2025-10-07 00:29:14,457 llama_stack.core.server.server:170 core::server:
Starting up
INFO 2025-10-07 00:29:14,458 llama_stack.core.stack:415 core: starting registry
refresh task
ERROR 2025-10-07 00:29:14,459 llama_stack.providers.utils.inference.openai_mixin:439
providers::utils: FireworksInferenceAdapter.list_provider_model_ids() failed
with: API key is not set. Please provide a valid API key in the provider data
header, e.g. x-llamastack-provider-data: {"fireworks_api_key": "<API_KEY>"},
or in the provider config.
WARNING 2025-10-07 00:29:14,461 llama_stack.core.routing_tables.models:36
core::routing_tables: Model refresh failed for provider fireworks: API key is
not set. Please provide a valid API key in the provider data header, e.g.
x-llamastack-provider-data: {"fireworks_api_key": "<API_KEY>"}, or in the
provider config.
ERROR 2025-10-07 00:29:14,462 llama_stack.providers.utils.inference.openai_mixin:439
providers::utils: TogetherInferenceAdapter.list_provider_model_ids() failed
with: Pass Together API Key in the header X-LlamaStack-Provider-Data as {
"together_api_key": <your api key>}
WARNING 2025-10-07 00:29:14,463 llama_stack.core.routing_tables.models:36
core::routing_tables: Model refresh failed for provider together: Pass
Together API Key in the header X-LlamaStack-Provider-Data as {
"together_api_key": <your api key>}
ERROR 2025-10-07 00:29:14,465 llama_stack.providers.utils.inference.openai_mixin:439
providers::utils: OpenAIInferenceAdapter.list_provider_model_ids() failed
with: API key is not set. Please provide a valid API key in the provider data
header, e.g. x-llamastack-provider-data: {"openai_api_key": "<API_KEY>"}, or
in the provider config.
WARNING 2025-10-07 00:29:14,466 llama_stack.core.routing_tables.models:36
core::routing_tables: Model refresh failed for provider openai: API key is not
set. Please provide a valid API key in the provider data header, e.g.
x-llamastack-provider-data: {"openai_api_key": "<API_KEY>"}, or in the
provider config.
INFO 2025-10-07 00:29:14,500 uvicorn.error:62 uncategorized: Application startup
complete.
ERROR 2025-10-07 00:29:14,502 llama_stack.providers.utils.inference.openai_mixin:439
providers::utils: AnthropicInferenceAdapter.list_provider_model_ids() failed
with: "Could not resolve authentication method. Expected either api_key or
auth_token to be set. Or for one of the `X-Api-Key` or `Authorization` headers
to be explicitly omitted"
WARNING 2025-10-07 00:29:14,503 llama_stack.core.routing_tables.models:36
core::routing_tables: Model refresh failed for provider anthropic: "Could not
resolve authentication method. Expected either api_key or auth_token to be
set. Or for one of the `X-Api-Key` or `Authorization` headers to be explicitly
omitted"
ERROR 2025-10-07 00:29:14,504 llama_stack.providers.utils.inference.openai_mixin:439
providers::utils: GeminiInferenceAdapter.list_provider_model_ids() failed
with: API key is not set. Please provide a valid API key in the provider data
header, e.g. x-llamastack-provider-data: {"gemini_api_key": "<API_KEY>"}, or
in the provider config.
WARNING 2025-10-07 00:29:14,506 llama_stack.core.routing_tables.models:36
core::routing_tables: Model refresh failed for provider gemini: API key is not
set. Please provide a valid API key in the provider data header, e.g.
x-llamastack-provider-data: {"gemini_api_key": "<API_KEY>"}, or in the
provider config.
ERROR 2025-10-07 00:29:14,507 llama_stack.providers.utils.inference.openai_mixin:439
providers::utils: GroqInferenceAdapter.list_provider_model_ids() failed with:
API key is not set. Please provide a valid API key in the provider data
header, e.g. x-llamastack-provider-data: {"groq_api_key": "<API_KEY>"}, or in
the provider config.
WARNING 2025-10-07 00:29:14,508 llama_stack.core.routing_tables.models:36
core::routing_tables: Model refresh failed for provider groq: API key is not
set. Please provide a valid API key in the provider data header, e.g.
x-llamastack-provider-data: {"groq_api_key": "<API_KEY>"}, or in the provider
config.
ERROR 2025-10-07 00:29:14,510 llama_stack.providers.utils.inference.openai_mixin:439
providers::utils: SambaNovaInferenceAdapter.list_provider_model_ids() failed
with: API key is not set. Please provide a valid API key in the provider data
header, e.g. x-llamastack-provider-data: {"sambanova_api_key": "<API_KEY>"},
or in the provider config.
WARNING 2025-10-07 00:29:14,511 llama_stack.core.routing_tables.models:36
core::routing_tables: Model refresh failed for provider sambanova: API key is
not set. Please provide a valid API key in the provider data header, e.g.
x-llamastack-provider-data: {"sambanova_api_key": "<API_KEY>"}, or in the
provider config.
INFO 2025-10-07 00:29:14,513 uvicorn.error:216 uncategorized: Uvicorn running on
http://['::', '0.0.0.0']:8321 (Press CTRL+C to quit)
```
Tested listing models with curl; it also works:
```
curl http://localhost:8321/v1/models
{"data":[{"identifier":"bedrock/meta.llama3-1-8b-instruct-v1:0","provider_resource_id":"meta.llama3-1-8b-instruct-v1:0","provider_id":"bedrock","type":"model","metadata":{},"model_type":"llm"},{"identifier":"bedrock/meta.llama3-1-70b-instruct-v1:0","provider_resource_id":"meta.llama3-1-70b-instruct-v1:0","provider_id":"bedrock","type":"model","metadata":{},"model_type":"llm"},{"identifier":"bedrock/meta.llama3-1-405b-instruct-v1:0","provider_resource_id":"meta.llama3-1-405b-instruct-v1:0","provider_id":"bedrock","type":"model","metadata":{},"model_type":"llm"},{"identifier":"nvidia/bigcode/starcoder2-7b","provider_resource_id":"bigcode/starcoder2-7b","provider_id":"nvidia","type":"model","metadata":{},"model_type":"llm"},{"identifier":"nvidia/meta/llama-3.3-70b-instruct","provider_resource_id":"meta/llama-3.3-70b-instruct","provider_id":"nvidia","type":"model","metadata":{},"model_type":"llm"},{"identifier":"nvidia/nvidia/llama-3.2-nv-embedqa-1b-v2","provider_resource_id":"nvidia/llama-3.2-nv-embedqa-1b-v2","provider_id":"nvidia","type":"model","metadata":{"embedding_dimension":2048,"context_length":8192},"model_type":"embedding"},{"identifier":"sentence-transformers/all-MiniLM-L6-v2","provider_resource_id":"all-MiniLM-L6-v2","provider_id":"sentence-transformers","type":"model","metadata":{"embedding_dimension":384},"model_type":"embedding"}]}%
```
---------
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
# What does this PR do?
We have been running into flaky unit test failures:
5217035494
Fixed by:
1. Shutting down properly by cancelling any stale file batch tasks running in the background (see the sketch after this list).
2. Using unique_kvstore_config so the tests don't share the same DB path, maintaining test isolation.
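A generic asyncio pattern for item 1 (the provider's actual attribute and method names will differ):
```python
import asyncio

class FileBatchesProvider:
    def __init__(self) -> None:
        self._background_tasks: set[asyncio.Task] = set()

    def start_batch(self, coro) -> asyncio.Task:
        # Track every background batch task so it can be cancelled later.
        task = asyncio.create_task(coro)
        self._background_tasks.add(task)
        task.add_done_callback(self._background_tasks.discard)
        return task

    async def shutdown(self) -> None:
        # Cancel any still-running batch tasks so nothing leaks across tests.
        for task in list(self._background_tasks):
            task.cancel()
        await asyncio.gather(*self._background_tasks, return_exceptions=True)
```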
## Test Plan
Ran unit tests locally and in CI.
- Allow use of unavailable models on startup
- Add a has_model method to ModelsRoutingTable for checking pre-registered models
- Update check_model_availability to check model_store before provider APIs (see the sketch below)
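A hedged sketch of the lookup-order change (method names follow the bullets above; the surrounding classes are heavily simplified):
```python
class ModelsRoutingTable:
    def __init__(self, registered_models: dict[str, object]) -> None:
        self._models = registered_models

    def has_model(self, model_id: str) -> bool:
        """True if the model was pre-registered, even if its provider is unreachable."""
        return model_id in self._models

async def check_model_availability(model_id: str, model_store: ModelsRoutingTable, provider) -> bool:
    # Consult the local model_store first so startup does not fail just because
    # a provider (e.g. an unreachable vLLM endpoint) cannot be queried yet.
    if model_store.has_model(model_id):
        return True
    return await provider.check_model_availability(model_id)
```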
# What does this PR do?
## Test Plan
Start Llama Stack and point it at an unavailable vLLM:
```
VLLM_URL=https://my-unavailable-vllm/v1 MILVUS_DB_PATH=./milvus.db INFERENCE_MODEL=vllm uv run --with llama-stack llama stack build --distro starter --image-type venv --run
```
Llama Stack will start without crashing, only logging the error.
```
- provider_id: rag-runtime
toolgroup_id: builtin::rag
vector_dbs: []
version: 2
INFO 2025-10-07 06:40:41,804 llama_stack.providers.utils.inference.inference_store:74 inference: Write queue disabled for SQLite to avoid concurrency issues
INFO 2025-10-07 06:40:42,066 llama_stack.providers.utils.responses.responses_store:96 openai_responses: Write queue disabled for SQLite to avoid concurrency issues
ERROR 2025-10-07 06:40:58,882 llama_stack.providers.utils.inference.openai_mixin:436 providers::utils: VLLMInferenceAdapter.list_provider_model_ids() failed with: Request timed out.
WARNING 2025-10-07 06:40:58,883 llama_stack.core.routing_tables.models:36 core::routing_tables: Model refresh failed for provider vllm: Request timed out.
[...]
INFO 2025-10-07 06:40:59,036 uvicorn.error:216 uncategorized: Uvicorn running on http://['::', '0.0.0.0']:8321 (Press CTRL+C to quit)
INFO 2025-10-07 06:41:04,064 openai._base_client:1618 uncategorized: Retrying request to /models in 0.398814 seconds
INFO 2025-10-07 06:41:09,497 openai._base_client:1618 uncategorized: Retrying request to /models in 0.781908 seconds
ERROR 2025-10-07 06:41:15,282 llama_stack.providers.utils.inference.openai_mixin:436 providers::utils: VLLMInferenceAdapter.list_provider_model_ids() failed with: Request timed out.
WARNING 2025-10-07 06:41:15,283 llama_stack.core.routing_tables.models:36 core::routing_tables: Model refresh failed for provider vllm: Request timed out.
```