Commit graph

1733 commits

Author SHA1 Message Date
Swapna Lekkala
b5c951fa4b clean up 2025-10-10 09:39:59 -07:00
Swapna Lekkala
e09401805f feat: Add responses and safety impl with extra body 2025-10-10 09:39:26 -07:00
Ashwin Bharambe
548ccff368
fix(mypy): fix wrong attribute access (#3770) 2025-10-10 09:30:43 -07:00
grs
8bf07f91cb
feat: reuse previous mcp tool listings where possible (#3710)
# What does this PR do?
This PR checks whether, if a previous response is linked, there are
mcp_list_tools objects that can be reused instead of listing the tools
explicitly every time.

 Closes #3106 

## Test Plan
Tested manually.
Added unit tests to cover new behaviour.

---------

Signed-off-by: Gordon Sim <gsim@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-10-10 09:28:25 -07:00
Matthew Farrellee
0066d986c5
feat: use SecretStr for inference provider auth credentials (#3724)
# What does this PR do?

use SecretStr for OpenAIMixin providers

- RemoteInferenceProviderConfig now has auth_credential: SecretStr
- the default alias is api_key (most common name)
- some providers override to use api_token (RunPod, vLLM, Databricks)
- some providers exclude it (Ollama, TGI, Vertex AI)

addresses #3517 

## Test Plan

ci w/ new tests
2025-10-10 07:32:50 -07:00
Ashwin Bharambe
e039b61d26
feat(responses)!: add in_progress, failed, content part events (#3765)
## Summary
- add schema + runtime support for response.in_progress /
response.failed / response.incomplete
- stream content parts with proper indexes and reasoning slots
- align tests + docs with the richer event payloads

## Testing
- uv run pytest
tests/unit/providers/agents/meta_reference/test_openai_responses.py::test_create_openai_response_with_string_input
- uv run pytest
tests/unit/providers/agents/meta_reference/test_response_conversion_utils.py
2025-10-10 07:27:34 -07:00
Akram Ben Aissi
a548169b99
fix: allow skipping model availability check for vLLM (#3739)
# What does this PR do?
<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->
Allows model check to fail gracefully instead of crashing on startup.


<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->

set VLLM_URL to your VLLM server 

```
(base) akram@Mac llama-stack % LAMA_STACK_LOGGING="all=debug" VLLM_ENABLE_MODEL_DISCOVERY=false  MILVUS_DB_PATH=./milvus.db INFERENCE_MODEL=vllm uv run --with llama-stack llama stack build --distro starter  --image-type venv --run
```



```

INFO     2025-10-08 20:11:24,637 llama_stack.providers.utils.inference.inference_store:74 inference: Write queue disabled for SQLite to avoid concurrency issues
INFO     2025-10-08 20:11:24,866 llama_stack.providers.utils.responses.responses_store:96 openai_responses: Write queue disabled for SQLite to avoid concurrency issues
ERROR    2025-10-08 20:11:26,160 llama_stack.providers.utils.inference.openai_mixin:439 providers::utils: VLLMInferenceAdapter.list_provider_model_ids() failed with: <a
         href="https://oauth.akram.a1ey.p3.openshiftapps.com:443/oauth/authorize?approval_prompt=force&amp;client_id=system%3Aserviceaccount%3Arhoai-30-genai%3Adefault&amp;redirect_uri=ht
         tps%3A%2F%2Fvllm-rhoai-30-genai.apps.rosa.akram.a1ey.p3.openshiftapps.com%2Foauth%2Fcallback&amp;response_type=code&amp;scope=user%3Ainfo+user%3Acheck-access&amp;state=9fba207425
         5851c718aca717a5887d76%3A%2Fmodels">Found</a>.
         
[...]
INFO     2025-10-08 20:11:26,295 uvicorn.error:84 uncategorized: Started server process [83144]
INFO     2025-10-08 20:11:26,296 uvicorn.error:48 uncategorized: Waiting for application startup.
INFO     2025-10-08 20:11:26,297 llama_stack.core.server.server:170 core::server: Starting up
INFO     2025-10-08 20:11:26,297 llama_stack.core.stack:399 core: starting registry refresh task
INFO     2025-10-08 20:11:26,311 uvicorn.error:62 uncategorized: Application startup complete.
INFO     2025-10-08 20:11:26,312 uvicorn.error:216 uncategorized: Uvicorn running on http://['::', '0.0.0.0']:8321 (Press CTRL+C to quit)
ERROR    2025-10-08 20:11:26,791 llama_stack.providers.utils.inference.openai_mixin:439 providers::utils: VLLMInferenceAdapter.list_provider_model_ids() failed with: <a
         href="https://oauth.akram.a1ey.p3.openshiftapps.com:443/oauth/authorize?approval_prompt=force&amp;client_id=system%3Aserviceaccount%3Arhoai-30-genai%3Adefault&amp;redirect_uri=ht
         tps%3A%2F%2Fvllm-rhoai-30-genai.apps.rosa.akram.a1ey.p3.openshiftapps.com%2Foauth%2Fcallback&amp;response_type=code&amp;scope=user%3Ainfo+user%3Acheck-access&amp;state=8ef0cba3e1
         71a4f8b04cb445cfb91a4c%3A%2Fmodels">Found</a>.

```
2025-10-10 07:23:13 -07:00
Ashwin Bharambe
aaf5036235
feat(responses): add usage types to inference and responses APIs (#3764)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 4s
Python Package Build Test / build (3.12) (push) Failing after 2s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Vector IO Integration Tests / test-matrix (push) Failing after 6s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
Test External API and Providers / test-external (venv) (push) Failing after 6s
Unit Tests / unit-tests (3.13) (push) Failing after 4s
Python Package Build Test / build (3.13) (push) Failing after 23s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 27s
API Conformance Tests / check-schema-compatibility (push) Successful in 36s
UI Tests / ui-tests (22) (push) Successful in 55s
Pre-commit / pre-commit (push) Successful in 2m7s
## Summary
Adds OpenAI-compatible usage tracking types to enable reporting token
consumption for both streaming and non-streaming responses.

## Type Definitions
**Chat Completion Usage** (inference API):
```python
class OpenAIChatCompletionUsage(BaseModel):
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int
    prompt_tokens_details: OpenAIChatCompletionUsagePromptTokensDetails | None
    completion_tokens_details: OpenAIChatCompletionUsageCompletionTokensDetails | None
```

**Response Usage** (responses API):
```python
class OpenAIResponseUsage(BaseModel):
    input_tokens: int
    output_tokens: int
    total_tokens: int
    input_tokens_details: OpenAIResponseUsageInputTokensDetails | None
    output_tokens_details: OpenAIResponseUsageOutputTokensDetails | None
```

This matches OpenAI's usage reporting format and enables PR #3766 to
implement usage tracking in streaming responses.

Co-authored-by: Claude <noreply@anthropic.com>
2025-10-10 09:22:59 -04:00
Ashwin Bharambe
ebae0385bb
fix: update dangling references to llama download command (#3763)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 4s
Test Llama Stack Build / generate-matrix (push) Successful in 3s
Test Llama Stack Build / build-single-provider (push) Failing after 3s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 3s
Python Package Build Test / build (3.13) (push) Failing after 1s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 3s
Vector IO Integration Tests / test-matrix (push) Failing after 5s
Python Package Build Test / build (3.12) (push) Failing after 3s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Test Llama Stack Build / build (push) Failing after 3s
API Conformance Tests / check-schema-compatibility (push) Successful in 10s
Unit Tests / unit-tests (3.13) (push) Failing after 3s
Unit Tests / unit-tests (3.12) (push) Failing after 5s
UI Tests / ui-tests (22) (push) Successful in 40s
Pre-commit / pre-commit (push) Successful in 2m14s
## Summary
After removing model management CLI in #3700, this PR updates remaining
references to the old `llama download` command to use `huggingface-cli
download` instead.

## Changes
- Updated error messages in `meta_reference/common.py` to recommend
`huggingface-cli download`
- Updated error messages in
`torchtune/recipes/lora_finetuning_single_device.py` to use
`huggingface-cli download`
- Updated post-training notebook to use `huggingface-cli download`
instead of `llama download`
- Fixed typo: "you model" -> "your model"

## Test Plan
- Verified error messages provide correct guidance for users
- Checked that notebook instructions are up-to-date with current tooling
2025-10-09 18:35:02 -07:00
Ashwin Bharambe
8fe4a216b5
fix(inference): propagate 401/403 errors from remote providers (#3762)
## Summary
Fixes #2990

Remote provider authentication errors (401/403) were being converted to
500 Internal Server Error, preventing users from understanding why their
requests failed.

## The Problem
When a request with an invalid API key was sent to a remote provider:
- Provider correctly returns 401 with error details
- Llama Stack's `translate_exception()` didn't recognize provider SDK
exceptions
- Fell through to generic 500 error handler
- User received: "Internal server error: An unexpected error occurred."

## The Fix
Added handler in `translate_exception()` that checks for exceptions with
a `status_code` attribute and preserves the original HTTP status code
and error message.

**Before:**
```json
HTTP 500
{"detail": "Internal server error: An unexpected error occurred."}
```

**After:**
```json
HTTP 401
{"detail": "Error code: 401 - {'error': {'message': 'Invalid API Key', 'type': 'invalid_request_error', 'code': 'invalid_api_key'}}"}
```

## Tested With
-  groq: 401 "Invalid API Key"  
-  openai: 401 "Incorrect API key provided"
-  together: 401 "Invalid API key provided"
-  fireworks: 403 "unauthorized"

## Test Plan

**Automated test script:**
https://gist.github.com/ashwinb/1199dd7585ffa3f4be67b111cc65f2f3

The test script:
1. Builds separate stacks for each provider
2. Registers models (with validation temporarily disabled for testing)
3. Sends requests with invalid API keys via `x-llamastack-provider-data`
header
4. Verifies HTTP status codes are 401/403 (not 500)

**Results before fix:** All providers returned 500  
**Results after fix:** All providers correctly return 401/403

**Manual verification:**
```bash
# 1. Build stack
llama stack build --image-type venv --providers inference=remote::groq

# 2. Start stack
llama stack run

# 3. Send request with invalid API key
curl http://localhost:8321/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H 'x-llamastack-provider-data: {"groq_api_key": "invalid-key"}' \
  -d '{"model": "groq/llama3-70b-8192", "messages": [{"role": "user", "content": "test"}]}'

# Expected: HTTP 401 with provider error message (not 500)
```

## Impact
- Works with all remote providers using OpenAI SDK (groq, openai,
together, fireworks, etc.)
- Works with any provider SDK that follows the pattern of exceptions
with `status_code` attribute
- No breaking changes - only affects error responses
2025-10-09 18:34:39 -07:00
Matthew Farrellee
145b2bcf25
feat: make object registration idempotent (#3752)
# What does this PR do?

objects (vector dbs, models, scoring functions, etc) have an identifier
and associated object values.

we allow exact duplicate registrations.

we reject registrations when the identifier exists and the associated
object values differ.

note: model are namespaced, i.e. {provider_id}/{identifier}, while other
object types are not

## Test Plan

ci w/ new tests
2025-10-09 17:04:28 -07:00
Sébastien Han
7ee0ee7843
chore!: remove model mgmt from CLI for Hugging Face CLI (#3700)
This change removes the `llama model` and `llama download` subcommands
from the CLI, replacing them with recommendations to use the Hugging
Face CLI instead.

Rationale for this change:
- The model management functionality was largely duplicating what
Hugging Face CLI already provides, leading to unnecessary maintenance
overhead (except the download source from Meta?)
- Maintaining our own implementation required fixing bugs and keeping up
with changes in model repositories and download mechanisms
- The Hugging Face CLI is more mature, widely adopted, and better
maintained
- This allows us to focus on the core Llama Stack functionality rather
than reimplementing model management tools

Changes made:
- Removed all model-related CLI commands and their implementations
- Updated documentation to recommend using `huggingface-cli` for model
downloads
- Removed Meta-specific download logic and statements
- Simplified the CLI to focus solely on stack management operations

Users should now use:
- `huggingface-cli download` for downloading models
- `huggingface-cli scan-cache` for listing downloaded models

This is a breaking change as it removes previously available CLI
commands.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-10-09 16:50:33 -07:00
Ashwin Bharambe
841d0c3583
fix(testing): improve api_recorder error messages for missing recordings (#3760)
Replaces opaque error messages when recordings are not found with
somewhat better guidance

Before:
```
No recorded response found for request hash: abc123...
To record this response, run with LLAMA_STACK_TEST_INFERENCE_MODE=record
```

After:
```
Recording not found for request hash: abc123
Model: gpt-4 | Request: POST https://api.openai.com/v1/chat/completions

Run './scripts/integration-tests.sh --inference-mode record-if-missing' with required API keys to generate.
```
2025-10-09 15:04:16 -07:00
Ashwin Bharambe
f50ce11a3b
feat(tests): make inference_recorder into api_recorder (include tool_invoke) (#3403)
Renames `inference_recorder.py` to `api_recorder.py` and extends it to
support recording/replaying tool invocations in addition to inference
calls.

This allows us to record web-search, etc. tool calls and thereafter
apply recordings for `tests/integration/responses`

## Test Plan

```
export OPENAI_API_KEY=...
export TAVILY_SEARCH_API_KEY=...

./scripts/integration-tests.sh --stack-config ci-tests \
   --suite responses --inference-mode record-if-missing
```
2025-10-09 14:27:51 -07:00
grs
26fd5dbd34
fix: add traces for tool calls and mcp tool listing (#3722)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 0s
Python Package Build Test / build (3.13) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 4s
Python Package Build Test / build (3.12) (push) Failing after 4s
Vector IO Integration Tests / test-matrix (push) Failing after 5s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 7s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Unit Tests / unit-tests (3.13) (push) Failing after 5s
API Conformance Tests / check-schema-compatibility (push) Successful in 15s
UI Tests / ui-tests (22) (push) Successful in 42s
Pre-commit / pre-commit (push) Successful in 1m24s
# What does this PR do?
Adds traces around tool execution and mcp tool listing for better
observability.

Closes #3108 

## Test Plan
Manually examined traces in jaeger to verify the added information was
available.

Signed-off-by: Gordon Sim <gsim@redhat.com>
2025-10-09 09:59:09 -07:00
Sébastien Han
4b9ebbf6a2
chore: revert "fix: Raising an error message to the user when registering an existing provider." (#3750)
Reverts llamastack/llama-stack#3624
Causing https://github.com/llamastack/llama-stack/issues/3749
2025-10-09 09:17:37 -04:00
Ashwin Bharambe
79bed44b04
fix(tests): ensure test isolation in server mode (#3737)
Propagate test IDs from client to server via HTTP headers to maintain
proper test isolation when running with server-based stack configs.
Without
this, recorded/replayed inference requests in server mode would leak
across
tests.

Changes:
- Patch client _prepare_request to inject test ID into provider data
header
- Sync test context from provider data on server side before storage
operations
- Set LLAMA_STACK_TEST_STACK_CONFIG_TYPE env var based on stack config
- Configure console width for cleaner log output in CI
- Add SQLITE_STORE_DIR temp directory for test data isolation
2025-10-08 12:03:36 -07:00
grs
96886afaca
fix(responses): fix regression in support for mcp tool require_approval argument (#3731)
# What does this PR do?

It prevents a tool call message being added to the chat completions
message without a corresponding tool call result, which is needed in the
case that an approval is required first or if the approval request is
denied. In both these cases the tool call messages is popped of the next
turn messages.

Closes #3728

## Test Plan
Ran the integration tests
Manual check of both approval and denial against gpt-4o

Signed-off-by: Gordon Sim <gsim@redhat.com>
2025-10-08 10:47:17 -04:00
Bill Murdock
5d711d4bcb
fix: Update watsonx.ai provider to use LiteLLM mixin and list all models (#3674)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 3s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
Python Package Build Test / build (3.13) (push) Failing after 2s
Python Package Build Test / build (3.12) (push) Failing after 3s
Vector IO Integration Tests / test-matrix (push) Failing after 7s
Test Llama Stack Build / generate-matrix (push) Successful in 6s
Test Llama Stack Build / build-single-provider (push) Failing after 4s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 5s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 6s
Unit Tests / unit-tests (3.13) (push) Failing after 4s
API Conformance Tests / check-schema-compatibility (push) Successful in 12s
Test Llama Stack Build / build (push) Failing after 3s
Unit Tests / unit-tests (3.12) (push) Failing after 5s
UI Tests / ui-tests (22) (push) Successful in 32s
Pre-commit / pre-commit (push) Successful in 1m29s
# What does this PR do?

- The watsonx.ai provider now uses the LiteLLM mixin instead of using
IBM's library, which does not seem to be working (see #3165 for
context).
- The watsonx.ai provider now lists all the models available by calling
the watsonx.ai server instead of having a hard coded list of known
models. (That list gets out of date quickly)
- An edge case in
[llama_stack/core/routers/inference.py](https://github.com/llamastack/llama-stack/pull/3674/files#diff-a34bc966ed9befd9f13d4883c23705dff49be0ad6211c850438cdda6113f3455)
is addressed that was causing my manual tests to fail.
- Fixes `b64_encode_openai_embeddings_response` which was trying to
enumerate over a dictionary and then reference elements of the
dictionary using .field instead of ["field"]. That method is called by
the LiteLLM mixin for embedding models, so it is needed to get the
watsonx.ai embedding models to work.
- A unit test along the lines of the one in #3348 is added. A more
comprehensive plan for automatically testing the end-to-end
functionality for inference providers would be a good idea, but is out
of scope for this PR.
- Updates to the watsonx distribution. Some were in response to the
switch to LiteLLM (e.g., updating the Python packages needed). Others
seem to be things that were already broken that I found along the way
(e.g., a reference to a watsonx specific doc template that doesn't seem
to exist).

Closes #3165

Also it is related to a line-item in #3387 but doesn't really address
that goal (because it uses the LiteLLM mixin, not the OpenAI one). I
tried the OpenAI one and it doesn't work with watsonx.ai, presumably
because the watsonx.ai service is not OpenAI compatible. It works with
LiteLLM because LiteLLM has a provider implementation for watsonx.ai.

## Test Plan

The test script below goes back and forth between the OpenAI and watsonx
providers. The idea is that the OpenAI provider shows how it should work
and then the watsonx provider output shows that it is also working with
watsonx. Note that the result from the MCP test is not as good (the
Llama 3.3 70b model does not choose tools as wisely as gpt-4o), but it
is still working and providing a valid response. For more details on
setup and the MCP server being used for testing, see [the AI Alliance
sample
notebook](https://github.com/The-AI-Alliance/llama-stack-examples/blob/main/notebooks/01-responses/)
that these examples are drawn from.

```python
#!/usr/bin/env python3

import json
from llama_stack_client import LlamaStackClient
from litellm import completion
import http.client


def print_response(response):
    """Print response in a nicely formatted way"""
    print(f"ID: {response.id}")
    print(f"Status: {response.status}")
    print(f"Model: {response.model}")
    print(f"Created at: {response.created_at}")
    print(f"Output items: {len(response.output)}")
    
    for i, output_item in enumerate(response.output):
        if len(response.output) > 1:
            print(f"\n--- Output Item {i+1} ---")
        print(f"Output type: {output_item.type}")
        
        if output_item.type in ("text", "message"):
            print(f"Response content: {output_item.content[0].text}")
        elif output_item.type == "file_search_call":
            print(f"  Tool Call ID: {output_item.id}")
            print(f"  Tool Status: {output_item.status}")
            # 'queries' is a list, so we join it for clean printing
            print(f"  Queries: {', '.join(output_item.queries)}")
            # Display results if they exist, otherwise note they are empty
            print(f"  Results: {output_item.results if output_item.results else 'None'}")
        elif output_item.type == "mcp_list_tools":
            print_mcp_list_tools(output_item)
        elif output_item.type == "mcp_call":
            print_mcp_call(output_item)
        else:
            print(f"Response content: {output_item.content}")


def print_mcp_call(mcp_call):
    """Print MCP call in a nicely formatted way"""
    print(f"\n🛠️  MCP Tool Call: {mcp_call.name}")
    print(f"   Server: {mcp_call.server_label}")
    print(f"   ID: {mcp_call.id}")
    print(f"   Arguments: {mcp_call.arguments}")
    
    if mcp_call.error:
        print("Error: {mcp_call.error}")
    elif mcp_call.output:
        print("Output:")
        # Try to format JSON output nicely
        try:
            parsed_output = json.loads(mcp_call.output)
            print(json.dumps(parsed_output, indent=4))
        except:
            # If not valid JSON, print as-is
            print(f"   {mcp_call.output}")
    else:
        print("    No output yet")


def print_mcp_list_tools(mcp_list_tools):
    """Print MCP list tools in a nicely formatted way"""
    print(f"\n🔧 MCP Server: {mcp_list_tools.server_label}")
    print(f"   ID: {mcp_list_tools.id}")
    print(f"   Available Tools: {len(mcp_list_tools.tools)}")
    print("=" * 80)
    
    for i, tool in enumerate(mcp_list_tools.tools, 1):
        print(f"\n{i}. {tool.name}")
        print(f"   Description: {tool.description}")
        
        # Parse and display input schema
        schema = tool.input_schema
        if schema and 'properties' in schema:
            properties = schema['properties']
            required = schema.get('required', [])
            
            print("   Parameters:")
            for param_name, param_info in properties.items():
                param_type = param_info.get('type', 'unknown')
                param_desc = param_info.get('description', 'No description')
                required_marker = " (required)" if param_name in required else " (optional)"
                print(f"     • {param_name} ({param_type}){required_marker}")
                if param_desc:
                    print(f"       {param_desc}")
        
        if i < len(mcp_list_tools.tools):
            print("-" * 40)


def main():
    """Main function to run all the tests"""
    
    # Configuration
    LLAMA_STACK_URL = "http://localhost:8321/"
    LLAMA_STACK_MODEL_IDS = [
        "openai/gpt-3.5-turbo",
        "openai/gpt-4o",
        "llama-openai-compat/Llama-3.3-70B-Instruct",
        "watsonx/meta-llama/llama-3-3-70b-instruct"
    ]
    
    # Using gpt-4o for this demo, but feel free to try one of the others or add more to run.yaml.
    OPENAI_MODEL_ID = LLAMA_STACK_MODEL_IDS[1]
    WATSONX_MODEL_ID = LLAMA_STACK_MODEL_IDS[-1]
    NPS_MCP_URL = "http://localhost:3005/sse/"
    
    print("=== Llama Stack Testing Script ===")
    print(f"Using OpenAI model: {OPENAI_MODEL_ID}")
    print(f"Using WatsonX model: {WATSONX_MODEL_ID}")
    print(f"MCP URL: {NPS_MCP_URL}")
    print()
    
    # Initialize client
    print("Initializing LlamaStackClient...")
    client = LlamaStackClient(base_url="http://localhost:8321")
    
    # Test 1: List models
    print("\n=== Test 1: List Models ===")
    try:
        models = client.models.list()
        print(f"Found {len(models)} models")
    except Exception as e:
        print(f"Error listing models: {e}")
        raise e
    
    # Test 2: Basic chat completion with OpenAI
    print("\n=== Test 2: Basic Chat Completion (OpenAI) ===")
    try:
        chat_completion_response = client.chat.completions.create(
            model=OPENAI_MODEL_ID,
            messages=[{"role": "user", "content": "What is the capital of France?"}]
        )
        
        print("OpenAI Response:")
        for chunk in chat_completion_response.choices[0].message.content:
            print(chunk, end="", flush=True)
        print()
    except Exception as e:
        print(f"Error with OpenAI chat completion: {e}")
        raise e
    
    # Test 3: Basic chat completion with WatsonX
    print("\n=== Test 3: Basic Chat Completion (WatsonX) ===")
    try:
        chat_completion_response_wxai = client.chat.completions.create(
            model=WATSONX_MODEL_ID,
            messages=[{"role": "user", "content": "What is the capital of France?"}],
        )
        
        print("WatsonX Response:")
        for chunk in chat_completion_response_wxai.choices[0].message.content:
            print(chunk, end="", flush=True)
        print()
    except Exception as e:
        print(f"Error with WatsonX chat completion: {e}")
        raise e
    
    # Test 4: Tool calling with OpenAI
    print("\n=== Test 4: Tool Calling (OpenAI) ===")
    tools = [
        {
            "type": "function",
            "function": {
                "name": "get_current_weather",
                "description": "Get the current weather for a specific location",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "The city and state, e.g., San Francisco, CA",
                        },
                        "unit": {
                            "type": "string",
                            "enum": ["celsius", "fahrenheit"]
                        },
                    },
                    "required": ["location"],
                },
            },
        }
    ]
    
    messages = [
        {"role": "user", "content": "What's the weather like in Boston, MA?"}
    ]
    
    try:
        print("--- Initial API Call ---")
        response = client.chat.completions.create(
            model=OPENAI_MODEL_ID,
            messages=messages,
            tools=tools,
            tool_choice="auto",  # "auto" is the default
        )
        print("OpenAI tool calling response received")
    except Exception as e:
        print(f"Error with OpenAI tool calling: {e}")
        raise e
    
    # Test 5: Tool calling with WatsonX
    print("\n=== Test 5: Tool Calling (WatsonX) ===")
    try:
        wxai_response = client.chat.completions.create(
            model=WATSONX_MODEL_ID,
            messages=messages,
            tools=tools,
            tool_choice="auto",  # "auto" is the default
        )
        print("WatsonX tool calling response received")
    except Exception as e:
        print(f"Error with WatsonX tool calling: {e}")
        raise e
    
    # Test 6: Streaming with WatsonX
    print("\n=== Test 6: Streaming Response (WatsonX) ===")
    try:
        chat_completion_response_wxai_stream = client.chat.completions.create(
            model=WATSONX_MODEL_ID,
            messages=[{"role": "user", "content": "What is the capital of France?"}],
            stream=True
        )
        print("Model response: ", end="")
        for chunk in chat_completion_response_wxai_stream:
            # Each 'chunk' is a ChatCompletionChunk object.
            # We want the content from the 'delta' attribute.
            if hasattr(chunk, 'choices') and chunk.choices is not None:
                content = chunk.choices[0].delta.content
                # The first few chunks might have None content, so we check for it.
                if content is not None:
                    print(content, end="", flush=True)
        print()
    except Exception as e:
        print(f"Error with streaming: {e}")
        raise e
    
    # Test 7: MCP with OpenAI
    print("\n=== Test 7: MCP Integration (OpenAI) ===")
    try:
        mcp_llama_stack_client_response = client.responses.create(
            model=OPENAI_MODEL_ID,
            input="Tell me about some parks in Rhode Island, and let me know if there are any upcoming events at them.",
            tools=[
                {
                    "type": "mcp",
                    "server_url": NPS_MCP_URL,
                    "server_label": "National Parks Service tools",
                    "allowed_tools": ["search_parks", "get_park_events"],
                }
            ]
        )
        print_response(mcp_llama_stack_client_response)
    except Exception as e:
        print(f"Error with MCP (OpenAI): {e}")
        raise e
    
    # Test 8: MCP with WatsonX
    print("\n=== Test 8: MCP Integration (WatsonX) ===")
    try:
        mcp_llama_stack_client_response = client.responses.create(
            model=WATSONX_MODEL_ID,
            input="What is the capital of France?"
        )
        print_response(mcp_llama_stack_client_response)
    except Exception as e:
        print(f"Error with MCP (WatsonX): {e}")
        raise e
    
    # Test 9: MCP with Llama 3.3
    print("\n=== Test 9: MCP Integration (Llama 3.3) ===")
    try:
        mcp_llama_stack_client_response = client.responses.create(
            model=WATSONX_MODEL_ID,
            input="Tell me about some parks in Rhode Island, and let me know if there are any upcoming events at them.",
            tools=[
                {
                    "type": "mcp",
                    "server_url": NPS_MCP_URL,
                    "server_label": "National Parks Service tools",
                    "allowed_tools": ["search_parks", "get_park_events"],
                }
            ]
        )
        print_response(mcp_llama_stack_client_response)
    except Exception as e:
        print(f"Error with MCP (Llama 3.3): {e}")
        raise e
    
    # Test 10: Embeddings
    print("\n=== Test 10: Embeddings ===")
    try:
        conn = http.client.HTTPConnection("localhost:8321")
        payload = json.dumps({
            "model": "watsonx/ibm/granite-embedding-278m-multilingual",
            "input": "Hello, world!",
        })
        headers = {
            'Content-Type': 'application/json',
            'Accept': 'application/json'
        }
        conn.request("POST", "/v1/openai/v1/embeddings", payload, headers)
        res = conn.getresponse()
        data = res.read()
        print(data.decode("utf-8"))
    except Exception as e:
        print(f"Error with Embeddings: {e}")
        raise e

    print("\n=== Testing Complete ===")


if __name__ == "__main__":
    main()
```

---------

Signed-off-by: Bill Murdock <bmurdock@redhat.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-10-08 07:29:43 -04:00
Omar Abdelwahab
702fcd1abf
fix: Raising an error message to the user when registering an existing provider. (#3624)
When the user wants to change the attributes (which could include model
name, dimensions,...etc) of an already registered provider, they will
get an error message asking that they first unregister the provider
before registering a new one.

# What does this PR do?
This PR updated the register function to raise an error to the user when
they attempt to register a provider that was already registered asking
them to un-register the existing provider first.

<!-- If resolving an issue, uncomment and update the line below -->
#2313

## Test Plan
Tested the change with /tests/unit/registry/test_registry.py

---------

Co-authored-by: Omar Abdelwahab <omara@fb.com>
2025-10-08 12:09:23 +02:00
ehhuang
0cde3d956d
chore: require valid logging category (#3712)
# What does this PR do?
grep'd and audited all usage of 'get_logger' with help of Claude.

## Test Plan
CI
2025-10-08 11:10:33 +02:00
ehhuang
a3f5072776
chore!: remove --env from llama stack run (#3711)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 1s
Installer CI / lint (push) Failing after 2s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Installer CI / smoke-test-on-dev (push) Failing after 2s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
Test Llama Stack Build / generate-matrix (push) Successful in 3s
Vector IO Integration Tests / test-matrix (push) Failing after 4s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 2s
Test Llama Stack Build / build-single-provider (push) Failing after 4s
Python Package Build Test / build (3.12) (push) Failing after 2s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 3s
Python Package Build Test / build (3.13) (push) Failing after 1s
API Conformance Tests / check-schema-compatibility (push) Successful in 10s
Unit Tests / unit-tests (3.12) (push) Failing after 3s
Test Llama Stack Build / build (push) Failing after 3s
Test External API and Providers / test-external (venv) (push) Failing after 3s
Unit Tests / unit-tests (3.13) (push) Failing after 3s
UI Tests / ui-tests (22) (push) Successful in 40s
Pre-commit / pre-commit (push) Successful in 1m18s
# What does this PR do?
user can simply set env vars in the beginning of the command.`FOO=BAR
llama stack run ...`

## Test Plan
Run
TELEMETRY_SINKS=coneol uv run --with llama-stack llama stack build
--distro=starter --image-type=venv --run




---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with
[ReviewStack](https://reviewstack.dev/llamastack/llama-stack/pull/3711).
* #3714
* __->__ #3711
2025-10-07 20:58:15 -07:00
slekkala1
1ac320b7e6
chore: remove dead code (#3729)
# What does this PR do?
Removing some dead code, found by vulture and checked by claude that
there are no references or imports for these


## Test Plan
CI
2025-10-07 20:26:02 -07:00
ehhuang
b6e9f41041
chore: Revert "fix: fix nvidia provider (#3716)" (#3730)
This reverts commit c940fe7938.

@wukaixingxp I stamped to fast. Let's wait for @mattf's review.
2025-10-07 19:16:51 -07:00
Kai Wu
c940fe7938
fix: fix nvidia provider (#3716)
# What does this PR do?
(Used claude to solve #3715, coded with claude but tested by me)
## From claude summary:
<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->
**Problem**: The `NVIDIAInferenceAdapter` class was missing the
`alias_to_provider_id_map` attribute, which caused the error:

`ERROR 'NVIDIAInferenceAdapter' object has no attribute
'alias_to_provider_id_map'`

**Root Cause**: The `NVIDIAInferenceAdapter` only inherited from
`OpenAIMixin`, but some parts of the system expected it to have the
`alias_to_provider_id_map` attribute, which is provided by the
`ModelRegistryHelper` class.

**Solution**:

1. **Added ModelRegistryHelper import**: Imported the
`ModelRegistryHelper` class from
`llama_stack.providers.utils.inference.model_registry`
2. **Updated inheritance**: Changed the class declaration to inherit
from both `OpenAIMixin` and `ModelRegistryHelper`
3. **Added proper initialization**: Added an `__init__` method that
properly initializes the `ModelRegistryHelper` with empty model entries
(since NVIDIA uses dynamic model discovery) and the allowed models from
the configuration

**Key Changes**:

* Added `from llama_stack.providers.utils.inference.model_registry
import ModelRegistryHelper`
* Changed class declaration from `class
NVIDIAInferenceAdapter(OpenAIMixin):` to `class
NVIDIAInferenceAdapter(OpenAIMixin, ModelRegistryHelper):`
* Added `__init__` method that calls `ModelRegistryHelper.__init__(self,
model_entries=[], allowed_models=config.allowed_models)`

The inheritance order is important - `OpenAIMixin` comes first to ensure
its `check_model_availability()` method takes precedence over the
`ModelRegistryHelper` version, as mentioned in the class documentation.

This fix ensures that the `NVIDIAInferenceAdapter` has the required
`alias_to_provider_id_map` attribute while maintaining all existing
functionality.<!-- If resolving an issue, uncomment and update the line
below -->
<!-- Closes #[issue-number] -->

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->
Launching llama-stack server successfully, see logs:
```
NVIDIA_API_KEY=dummy NVIDIA_BASE_URL=http://localhost:8912 llama stack run /home/nvidia/.llama/distributions/starter/starter-run.yaml --image-type venv &
[2] 3753042
(venv) nvidia@nv-meta-H100-testing-gpu01:~/kai/llama-stack$ WARNING  2025-10-07 00:29:09,848 root:266 uncategorized: Unknown logging category:
         openai::conversations. Falling back to default 'root' level: 20
WARNING  2025-10-07 00:29:09,932 root:266 uncategorized: Unknown logging category: cli.
         Falling back to default 'root' level: 20
INFO     2025-10-07 00:29:09,937 llama_stack.core.utils.config_resolution:45 core:
         Using file path: /home/nvidia/.llama/distributions/starter/starter-run.yaml
INFO     2025-10-07 00:29:09,937 llama_stack.cli.stack.run:136 cli: Using run
         configuration: /home/nvidia/.llama/distributions/starter/starter-run.yaml
Using virtual environment: /home/nvidia/kai/venv
Virtual environment already activated
+ '[' -n /home/nvidia/.llama/distributions/starter/starter-run.yaml ']'
+ yaml_config_arg=/home/nvidia/.llama/distributions/starter/starter-run.yaml
+ llama stack run /home/nvidia/.llama/distributions/starter/starter-run.yaml --port 8321
WARNING  2025-10-07 00:29:11,432 root:266 uncategorized: Unknown logging category:
         openai::conversations. Falling back to default 'root' level: 20
WARNING  2025-10-07 00:29:11,593 root:266 uncategorized: Unknown logging category: cli.
         Falling back to default 'root' level: 20
INFO     2025-10-07 00:29:11,603 llama_stack.core.utils.config_resolution:45 core:
         Using file path: /home/nvidia/.llama/distributions/starter/starter-run.yaml
INFO     2025-10-07 00:29:11,604 llama_stack.cli.stack.run:136 cli: Using run
         configuration: /home/nvidia/.llama/distributions/starter/starter-run.yaml
INFO     2025-10-07 00:29:11,624 llama_stack.cli.stack.run:155 cli: No image type or
         image name provided. Assuming environment packages.
INFO     2025-10-07 00:29:11,625 llama_stack.core.utils.config_resolution:45 core:
         Using file path: /home/nvidia/.llama/distributions/starter/starter-run.yaml
INFO     2025-10-07 00:29:11,644 llama_stack.cli.stack.run:230 cli: HTTPS enabled with
         certificates:
           Key: None
           Cert: None
INFO     2025-10-07 00:29:11,645 llama_stack.cli.stack.run:232 cli: Listening on ['::',
         '0.0.0.0']:8321
INFO     2025-10-07 00:29:11,816 llama_stack.core.utils.config_resolution:45 core:
         Using file path: /home/nvidia/.llama/distributions/starter/starter-run.yaml
INFO     2025-10-07 00:29:11,836 llama_stack.core.server.server:480 core::server: Run
         configuration:
INFO     2025-10-07 00:29:11,845 llama_stack.core.server.server:483 core::server: apis:
         - agents
         - batches
         - datasetio
         - eval
         - files
         - inference
         - post_training
         - safety
         - scoring
         - telemetry
         - tool_runtime
         - vector_io
         benchmarks: []
         datasets: []
         image_name: starter
         inference_store:
           db_path: /home/nvidia/.llama/distributions/starter/inference_store.db
           type: sqlite
         metadata_store:
           db_path: /home/nvidia/.llama/distributions/starter/registry.db
           type: sqlite
         models: []
         providers:
           agents:
           - config:
               persistence_store:
                 db_path: /home/nvidia/.llama/distributions/starter/agents_store.db
                 type: sqlite
               responses_store:
                 db_path: /home/nvidia/.llama/distributions/starter/responses_store.db
                 type: sqlite
             provider_id: meta-reference
             provider_type: inline::meta-reference
           batches:
           - config:
               kvstore:
                 db_path: /home/nvidia/.llama/distributions/starter/batches.db
                 type: sqlite
             provider_id: reference
             provider_type: inline::reference
           datasetio:
           - config:
               kvstore:
                 db_path:
         /home/nvidia/.llama/distributions/starter/huggingface_datasetio.db
                 type: sqlite
             provider_id: huggingface
             provider_type: remote::huggingface
           - config:
               kvstore:
                 db_path:
         /home/nvidia/.llama/distributions/starter/localfs_datasetio.db
                 type: sqlite
             provider_id: localfs
             provider_type: inline::localfs
           eval:
           - config:
               kvstore:
                 db_path:
         /home/nvidia/.llama/distributions/starter/meta_reference_eval.db
                 type: sqlite
             provider_id: meta-reference
             provider_type: inline::meta-reference
           files:
           - config:
               metadata_store:
                 db_path: /home/nvidia/.llama/distributions/starter/files_metadata.db
                 type: sqlite
               storage_dir: /home/nvidia/.llama/distributions/starter/files
             provider_id: meta-reference-files
             provider_type: inline::localfs
           inference:
           - config:
               api_key: '********'
               url: https://api.fireworks.ai/inference/v1
             provider_id: fireworks
             provider_type: remote::fireworks
           - config:
               api_key: '********'
               url: https://api.together.xyz/v1
             provider_id: together
             provider_type: remote::together
           - config: {}
             provider_id: bedrock
             provider_type: remote::bedrock
           - config:
               api_key: '********'
               append_api_version: true
               url: http://localhost:8912
             provider_id: nvidia
             provider_type: remote::nvidia
           - config:
               api_key: '********'
               base_url: https://api.openai.com/v1
             provider_id: openai
             provider_type: remote::openai
           - config:
               api_key: '********'
             provider_id: anthropic
             provider_type: remote::anthropic
           - config:
               api_key: '********'
             provider_id: gemini
             provider_type: remote::gemini
           - config:
               api_key: '********'
               url: https://api.groq.com
             provider_id: groq
             provider_type: remote::groq
           - config:
               api_key: '********'
               url: https://api.sambanova.ai/v1
             provider_id: sambanova
             provider_type: remote::sambanova
           - config: {}
             provider_id: sentence-transformers
             provider_type: inline::sentence-transformers
           post_training:
           - config:
               checkpoint_format: meta
             provider_id: torchtune-cpu
             provider_type: inline::torchtune-cpu
           safety:
           - config:
               excluded_categories: []
             provider_id: llama-guard
             provider_type: inline::llama-guard
           - config: {}
             provider_id: code-scanner
             provider_type: inline::code-scanner
           scoring:
           - config: {}
             provider_id: basic
             provider_type: inline::basic
           - config: {}
             provider_id: llm-as-judge
             provider_type: inline::llm-as-judge
           - config:
               openai_api_key: '********'
             provider_id: braintrust
             provider_type: inline::braintrust
           telemetry:
           - config:
               service_name: "\u200B"
               sinks: sqlite
               sqlite_db_path: /home/nvidia/.llama/distributions/starter/trace_store.db
             provider_id: meta-reference
             provider_type: inline::meta-reference
           tool_runtime:
           - config:
               api_key: '********'
               max_results: 3
             provider_id: brave-search
             provider_type: remote::brave-search
           - config:
               api_key: '********'
               max_results: 3
             provider_id: tavily-search
             provider_type: remote::tavily-search
           - config: {}
             provider_id: rag-runtime
             provider_type: inline::rag-runtime
           - config: {}
             provider_id: model-context-protocol
             provider_type: remote::model-context-protocol
           vector_io:
           - config:
               kvstore:
                 db_path: /home/nvidia/.llama/distributions/starter/faiss_store.db
                 type: sqlite
             provider_id: faiss
             provider_type: inline::faiss
           - config:
               db_path: /home/nvidia/.llama/distributions/starter/sqlite_vec.db
               kvstore:
                 db_path:
         /home/nvidia/.llama/distributions/starter/sqlite_vec_registry.db
                 type: sqlite
             provider_id: sqlite-vec
             provider_type: inline::sqlite-vec
         scoring_fns: []
         server:
           port: 8321
         shields: []
         tool_groups:
         - provider_id: tavily-search
           toolgroup_id: builtin::websearch
         - provider_id: rag-runtime
           toolgroup_id: builtin::rag
         vector_dbs: []
         version: 2
INFO     2025-10-07 00:29:12,138
         llama_stack.providers.remote.inference.nvidia.nvidia:49 inference::nvidia:
         Initializing NVIDIAInferenceAdapter(http://localhost:8912)...
INFO     2025-10-07 00:29:12,921
         llama_stack.providers.utils.inference.inference_store:74 inference: Write
         queue disabled for SQLite to avoid concurrency issues
INFO     2025-10-07 00:29:13,524
         llama_stack.providers.utils.responses.responses_store:96 openai_responses:
         Write queue disabled for SQLite to avoid concurrency issues
ERROR    2025-10-07 00:29:13,679 llama_stack.providers.utils.inference.openai_mixin:439
         providers::utils: FireworksInferenceAdapter.list_provider_model_ids() failed
         with: API key is not set. Please provide a valid API key in the provider data
         header, e.g. x-llamastack-provider-data: {"fireworks_api_key": "<API_KEY>"},
         or in the provider config.
WARNING  2025-10-07 00:29:13,681 llama_stack.core.routing_tables.models:36
         core::routing_tables: Model refresh failed for provider fireworks: API key is
         not set. Please provide a valid API key in the provider data header, e.g.
         x-llamastack-provider-data: {"fireworks_api_key": "<API_KEY>"}, or in the
         provider config.
ERROR    2025-10-07 00:29:13,682 llama_stack.providers.utils.inference.openai_mixin:439
         providers::utils: TogetherInferenceAdapter.list_provider_model_ids() failed
         with: Pass Together API Key in the header X-LlamaStack-Provider-Data as {
         "together_api_key": <your api key>}
WARNING  2025-10-07 00:29:13,684 llama_stack.core.routing_tables.models:36
         core::routing_tables: Model refresh failed for provider together: Pass
         Together API Key in the header X-LlamaStack-Provider-Data as {
         "together_api_key": <your api key>}
Handling connection for 8912
INFO     2025-10-07 00:29:14,047 llama_stack.providers.utils.inference.openai_mixin:448
         providers::utils: NVIDIAInferenceAdapter.list_provider_model_ids() returned 3
         models
ERROR    2025-10-07 00:29:14,062 llama_stack.providers.utils.inference.openai_mixin:439
         providers::utils: OpenAIInferenceAdapter.list_provider_model_ids() failed
         with: API key is not set. Please provide a valid API key in the provider data
         header, e.g. x-llamastack-provider-data: {"openai_api_key": "<API_KEY>"}, or
         in the provider config.
WARNING  2025-10-07 00:29:14,063 llama_stack.core.routing_tables.models:36
         core::routing_tables: Model refresh failed for provider openai: API key is not
         set. Please provide a valid API key in the provider data header, e.g.
         x-llamastack-provider-data: {"openai_api_key": "<API_KEY>"}, or in the
         provider config.
ERROR    2025-10-07 00:29:14,099 llama_stack.providers.utils.inference.openai_mixin:439
         providers::utils: AnthropicInferenceAdapter.list_provider_model_ids() failed
         with: "Could not resolve authentication method. Expected either api_key or
         auth_token to be set. Or for one of the `X-Api-Key` or `Authorization` headers
         to be explicitly omitted"
WARNING  2025-10-07 00:29:14,100 llama_stack.core.routing_tables.models:36
         core::routing_tables: Model refresh failed for provider anthropic: "Could not
         resolve authentication method. Expected either api_key or auth_token to be
         set. Or for one of the `X-Api-Key` or `Authorization` headers to be explicitly
         omitted"
ERROR    2025-10-07 00:29:14,102 llama_stack.providers.utils.inference.openai_mixin:439
         providers::utils: GeminiInferenceAdapter.list_provider_model_ids() failed
         with: API key is not set. Please provide a valid API key in the provider data
         header, e.g. x-llamastack-provider-data: {"gemini_api_key": "<API_KEY>"}, or
         in the provider config.
WARNING  2025-10-07 00:29:14,103 llama_stack.core.routing_tables.models:36
         core::routing_tables: Model refresh failed for provider gemini: API key is not
         set. Please provide a valid API key in the provider data header, e.g.
         x-llamastack-provider-data: {"gemini_api_key": "<API_KEY>"}, or in the
         provider config.
ERROR    2025-10-07 00:29:14,105 llama_stack.providers.utils.inference.openai_mixin:439
         providers::utils: GroqInferenceAdapter.list_provider_model_ids() failed with:
         API key is not set. Please provide a valid API key in the provider data
         header, e.g. x-llamastack-provider-data: {"groq_api_key": "<API_KEY>"}, or in
         the provider config.
WARNING  2025-10-07 00:29:14,106 llama_stack.core.routing_tables.models:36
         core::routing_tables: Model refresh failed for provider groq: API key is not
         set. Please provide a valid API key in the provider data header, e.g.
         x-llamastack-provider-data: {"groq_api_key": "<API_KEY>"}, or in the provider
         config.
ERROR    2025-10-07 00:29:14,107 llama_stack.providers.utils.inference.openai_mixin:439
         providers::utils: SambaNovaInferenceAdapter.list_provider_model_ids() failed
         with: API key is not set. Please provide a valid API key in the provider data
         header, e.g. x-llamastack-provider-data: {"sambanova_api_key": "<API_KEY>"},
         or in the provider config.
WARNING  2025-10-07 00:29:14,109 llama_stack.core.routing_tables.models:36
         core::routing_tables: Model refresh failed for provider sambanova: API key is
         not set. Please provide a valid API key in the provider data header, e.g.
         x-llamastack-provider-data: {"sambanova_api_key": "<API_KEY>"}, or in the
         provider config.
INFO     2025-10-07 00:29:14,454 uvicorn.error:84 uncategorized: Started server process
         [3753046]
INFO     2025-10-07 00:29:14,455 uvicorn.error:48 uncategorized: Waiting for
         application startup.
INFO     2025-10-07 00:29:14,457 llama_stack.core.server.server:170 core::server:
         Starting up
INFO     2025-10-07 00:29:14,458 llama_stack.core.stack:415 core: starting registry
         refresh task
ERROR    2025-10-07 00:29:14,459 llama_stack.providers.utils.inference.openai_mixin:439
         providers::utils: FireworksInferenceAdapter.list_provider_model_ids() failed
         with: API key is not set. Please provide a valid API key in the provider data
         header, e.g. x-llamastack-provider-data: {"fireworks_api_key": "<API_KEY>"},
         or in the provider config.
WARNING  2025-10-07 00:29:14,461 llama_stack.core.routing_tables.models:36
         core::routing_tables: Model refresh failed for provider fireworks: API key is
         not set. Please provide a valid API key in the provider data header, e.g.
         x-llamastack-provider-data: {"fireworks_api_key": "<API_KEY>"}, or in the
         provider config.
ERROR    2025-10-07 00:29:14,462 llama_stack.providers.utils.inference.openai_mixin:439
         providers::utils: TogetherInferenceAdapter.list_provider_model_ids() failed
         with: Pass Together API Key in the header X-LlamaStack-Provider-Data as {
         "together_api_key": <your api key>}
WARNING  2025-10-07 00:29:14,463 llama_stack.core.routing_tables.models:36
         core::routing_tables: Model refresh failed for provider together: Pass
         Together API Key in the header X-LlamaStack-Provider-Data as {
         "together_api_key": <your api key>}
ERROR    2025-10-07 00:29:14,465 llama_stack.providers.utils.inference.openai_mixin:439
         providers::utils: OpenAIInferenceAdapter.list_provider_model_ids() failed
         with: API key is not set. Please provide a valid API key in the provider data
         header, e.g. x-llamastack-provider-data: {"openai_api_key": "<API_KEY>"}, or
         in the provider config.
WARNING  2025-10-07 00:29:14,466 llama_stack.core.routing_tables.models:36
         core::routing_tables: Model refresh failed for provider openai: API key is not
         set. Please provide a valid API key in the provider data header, e.g.
         x-llamastack-provider-data: {"openai_api_key": "<API_KEY>"}, or in the
         provider config.
INFO     2025-10-07 00:29:14,500 uvicorn.error:62 uncategorized: Application startup
         complete.
ERROR    2025-10-07 00:29:14,502 llama_stack.providers.utils.inference.openai_mixin:439
         providers::utils: AnthropicInferenceAdapter.list_provider_model_ids() failed
         with: "Could not resolve authentication method. Expected either api_key or
         auth_token to be set. Or for one of the `X-Api-Key` or `Authorization` headers
         to be explicitly omitted"
WARNING  2025-10-07 00:29:14,503 llama_stack.core.routing_tables.models:36
         core::routing_tables: Model refresh failed for provider anthropic: "Could not
         resolve authentication method. Expected either api_key or auth_token to be
         set. Or for one of the `X-Api-Key` or `Authorization` headers to be explicitly
         omitted"
ERROR    2025-10-07 00:29:14,504 llama_stack.providers.utils.inference.openai_mixin:439
         providers::utils: GeminiInferenceAdapter.list_provider_model_ids() failed
         with: API key is not set. Please provide a valid API key in the provider data
         header, e.g. x-llamastack-provider-data: {"gemini_api_key": "<API_KEY>"}, or
         in the provider config.
WARNING  2025-10-07 00:29:14,506 llama_stack.core.routing_tables.models:36
         core::routing_tables: Model refresh failed for provider gemini: API key is not
         set. Please provide a valid API key in the provider data header, e.g.
         x-llamastack-provider-data: {"gemini_api_key": "<API_KEY>"}, or in the
         provider config.
ERROR    2025-10-07 00:29:14,507 llama_stack.providers.utils.inference.openai_mixin:439
         providers::utils: GroqInferenceAdapter.list_provider_model_ids() failed with:
         API key is not set. Please provide a valid API key in the provider data
         header, e.g. x-llamastack-provider-data: {"groq_api_key": "<API_KEY>"}, or in
         the provider config.
WARNING  2025-10-07 00:29:14,508 llama_stack.core.routing_tables.models:36
         core::routing_tables: Model refresh failed for provider groq: API key is not
         set. Please provide a valid API key in the provider data header, e.g.
         x-llamastack-provider-data: {"groq_api_key": "<API_KEY>"}, or in the provider
         config.
ERROR    2025-10-07 00:29:14,510 llama_stack.providers.utils.inference.openai_mixin:439
         providers::utils: SambaNovaInferenceAdapter.list_provider_model_ids() failed
         with: API key is not set. Please provide a valid API key in the provider data
         header, e.g. x-llamastack-provider-data: {"sambanova_api_key": "<API_KEY>"},
         or in the provider config.
WARNING  2025-10-07 00:29:14,511 llama_stack.core.routing_tables.models:36
         core::routing_tables: Model refresh failed for provider sambanova: API key is
         not set. Please provide a valid API key in the provider data header, e.g.
         x-llamastack-provider-data: {"sambanova_api_key": "<API_KEY>"}, or in the
         provider config.
INFO     2025-10-07 00:29:14,513 uvicorn.error:216 uncategorized: Uvicorn running on
         http://['::', '0.0.0.0']:8321 (Press CTRL+C to quit)
```

tested with curl model, it also works:
```
curl http://localhost:8321/v1/models
{"data":[{"identifier":"bedrock/meta.llama3-1-8b-instruct-v1:0","provider_resource_id":"meta.llama3-1-8b-instruct-v1:0","provider_id":"bedrock","type":"model","metadata":{},"model_type":"llm"},{"identifier":"bedrock/meta.llama3-1-70b-instruct-v1:0","provider_resource_id":"meta.llama3-1-70b-instruct-v1:0","provider_id":"bedrock","type":"model","metadata":{},"model_type":"llm"},{"identifier":"bedrock/meta.llama3-1-405b-instruct-v1:0","provider_resource_id":"meta.llama3-1-405b-instruct-v1:0","provider_id":"bedrock","type":"model","metadata":{},"model_type":"llm"},{"identifier":"nvidia/bigcode/starcoder2-7b","provider_resource_id":"bigcode/starcoder2-7b","provider_id":"nvidia","type":"model","metadata":{},"model_type":"llm"},{"identifier":"nvidia/meta/llama-3.3-70b-instruct","provider_resource_id":"meta/llama-3.3-70b-instruct","provider_id":"nvidia","type":"model","metadata":{},"model_type":"llm"},{"identifier":"nvidia/nvidia/llama-3.2-nv-embedqa-1b-v2","provider_resource_id":"nvidia/llama-3.2-nv-embedqa-1b-v2","provider_id":"nvidia","type":"model","metadata":{"embedding_dimension":2048,"context_length":8192},"model_type":"embedding"},{"identifier":"sentence-transformers/all-MiniLM-L6-v2","provider_resource_id":"all-MiniLM-L6-v2","provider_id":"sentence-transformers","type":"model","metadata":{"embedding_dimension":384},"model_type":"embedding"}]}%
```

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-10-07 18:23:12 -07:00
slekkala1
c2d97a9db9
chore: fix flaky unit test and add proper shutdown for file batches (#3725)
# What does this PR do?
Have been running into flaky unit test failures:
5217035494
Fixing below
1. Shutting down properly by cancelling any stale file batches tasks
running in background.
2. Also, use unique_kvstore_config, so the test dont use same db path
and maintain test isolation
## Test Plan
Ran unit test locally and CI
2025-10-07 14:23:14 -07:00
Akram Ben Aissi
1970b4aa4b
fix: improve model availability checks: Allows use of unavailable models on startup (#3717)
Some checks failed
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 3s
Python Package Build Test / build (3.12) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 4s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 4s
Python Package Build Test / build (3.13) (push) Failing after 2s
Vector IO Integration Tests / test-matrix (push) Failing after 5s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
API Conformance Tests / check-schema-compatibility (push) Successful in 10s
Unit Tests / unit-tests (3.13) (push) Failing after 4s
Test External API and Providers / test-external (venv) (push) Failing after 7s
UI Tests / ui-tests (22) (push) Successful in 39s
Pre-commit / pre-commit (push) Successful in 1m28s
- Allows use of unavailable models on startup
- Add has_model method to ModelsRoutingTable for checking pre-registered
models
- Update check_model_availability to check model_store before provider
APIs

# What does this PR do?
<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->


Start llama stack and point unavailable vLLM

```
VLLM_URL=https://my-unavailable-vllm/v1 MILVUS_DB_PATH=./milvus.db INFERENCE_MODEL=vllm uv run --with llama-stack llama stack build --distro starter --image-type venv --run
```

llama stack will start without crashing but only notifying error. 

```


         - provider_id: rag-runtime
           toolgroup_id: builtin::rag
         vector_dbs: []
         version: 2

INFO     2025-10-07 06:40:41,804 llama_stack.providers.utils.inference.inference_store:74 inference: Write queue disabled for SQLite to avoid concurrency issues
INFO     2025-10-07 06:40:42,066 llama_stack.providers.utils.responses.responses_store:96 openai_responses: Write queue disabled for SQLite to avoid concurrency issues
ERROR    2025-10-07 06:40:58,882 llama_stack.providers.utils.inference.openai_mixin:436 providers::utils: VLLMInferenceAdapter.list_provider_model_ids() failed with: Request timed out.
WARNING  2025-10-07 06:40:58,883 llama_stack.core.routing_tables.models:36 core::routing_tables: Model refresh failed for provider vllm: Request timed out.
[...]
INFO     2025-10-07 06:40:59,036 uvicorn.error:216 uncategorized: Uvicorn running on http://['::', '0.0.0.0']:8321 (Press CTRL+C to quit)
INFO     2025-10-07 06:41:04,064 openai._base_client:1618 uncategorized: Retrying request to /models in 0.398814 seconds
INFO     2025-10-07 06:41:09,497 openai._base_client:1618 uncategorized: Retrying request to /models in 0.781908 seconds
ERROR    2025-10-07 06:41:15,282 llama_stack.providers.utils.inference.openai_mixin:436 providers::utils: VLLMInferenceAdapter.list_provider_model_ids() failed with: Request timed out.
WARNING  2025-10-07 06:41:15,283 llama_stack.core.routing_tables.models:36 core::routing_tables: Model refresh failed for provider vllm: Request timed out.
```
2025-10-07 14:27:24 -04:00
Francisco Arceo
d5b136ac66
feat: Enabling Annotations in Responses (#3698)
# What does this PR do?
Implements annotations for `file_search` tool.

Also adds some logs and tests.

## How does this work? 
1. **Citation Markers**: Models insert `<|file-id|>` tokens during
generation with instructions from search results
2. **Post-Processing**: Extract markers using regex to calculate
character positions and create `AnnotationFileCitation` objects
3. **File Mapping**: Store filename metadata during vector store
operations for proper citation display

## Example 
This is the updated `quickstart.py` script, which uses the `extra_body`
to register the embedding model.

```python
import io, requests
from openai import OpenAI

url="https://www.paulgraham.com/greatwork.html"
model = "gpt-4o-mini"
client = OpenAI(base_url="http://localhost:8321/v1/openai/v1", api_key="none")

vs = client.vector_stores.create(
    name="my_citations_db",
    extra_body={
        "embedding_model": "ollama/nomic-embed-text:latest",
        "embedding_dimension": 768,
    }
)
response = requests.get(url)
pseudo_file = io.BytesIO(str(response.content).encode('utf-8'))
file_id = client.files.create(file=(url, pseudo_file, "text/html"), purpose="assistants").id
client.vector_stores.files.create(vector_store_id=vs.id, file_id=file_id)

resp = client.responses.create(
    model=model,
    input="How do you do great work? Use our existing knowledge_search tool.",
    tools=[{"type": "file_search", "vector_store_ids": [vs.id]}],
    include=["file_search_call.results"],
)

print(resp)
```

<details>
<summary> Example of the full response </summary>

```python
INFO:httpx:HTTP Request: POST http://localhost:8321/v1/openai/v1/vector_stores "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST http://localhost:8321/v1/openai/v1/files "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST http://localhost:8321/v1/openai/v1/vector_stores/vs_0f6f7e35-f48b-4850-8604-8117d9a50e0a/files "HTTP/1.1 200 OK"
INFO:httpx:HTTP Request: POST http://localhost:8321/v1/openai/v1/responses "HTTP/1.1 200 OK"
Response(id='resp-28f5793d-3272-4de3-81f6-8cbf107d5bcd', created_at=1759797954.0, error=None, incomplete_details=None, instructions=None, metadata=None, model='gpt-4o-mini', object='response', output=[ResponseFileSearchToolCall(id='call_xWtvEQETN5GNiRLLiBIDKntg', queries=['how to do great work tips'], status='completed', type='file_search_call', results=[Result(attributes={}, file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='file-a98ada68681c4fbeba2201e9c7213fc3', score=1.3722624322210302, text='\\\'re looking where few have looked before.<br /><br />One sign that you\\\'re suited for some kind of work is when you like\\neven the parts that other people find tedious or frightening.<br /><br />But fields aren\\\'t people; you don\\\'t owe them any loyalty. If in the\\ncourse of working on one thing you discover another that\\\'s more\\nexciting, don\\\'t be afraid to switch.<br /><br />If you\\\'re making something for people, make sure it\\\'s something\\nthey actually want. The best way to do this is to make something\\nyou yourself want. Write the story you want to read; build the tool\\nyou want to use. Since your friends probably have similar interests,\\nthis will also get you your initial audience.<br /><br />This <i>should</i> follow from the excitingness rule. Obviously the most\\nexciting story to write will be the one you want to read. The reason\\nI mention this case explicitly is that so many people get it wrong.\\nInstead of making what they want, they try to make what some\\nimaginary, more sophisticated audience wants. And once you go down\\nthat route, you\\\'re lost.\\n<font color=#dddddd>[<a href="#f6n"><font color=#dddddd>6</font></a>]</font><br /><br />There are a lot of forces that will lead you astray when you\\\'re\\ntrying to figure out what to work on. Pretentiousness, fashion,\\nfear, money, politics, other people\\\'s wishes, eminent frauds. But\\nif you stick to what you find genuinely interesting, you\\\'ll be proof\\nagainst all of them. If you\\\'re interested, you\\\'re not astray.<br /><br /><br /><br /><br /><br />\\nFollowing your interests may sound like a rather passive strategy,\\nbut in practice it usually means following them past all sorts of\\nobstacles. You usually have to risk rejection and failure. So it\\ndoes take a good deal of boldness.<br /><br />But while you need boldness, you don\\\'t usually need much planning.\\nIn most cases the recipe for doing great work is simply: work hard\\non excitingly ambitious projects, and something good will come of\\nit. Instead of making a plan and then executing it, you just try\\nto preserve certain invariants.<br /><br />The trouble with planning is that it only works for achievements\\nyou can describe in advance. You can win a gold medal or get rich\\nby deciding to as a child and then tenaciously pursuing that goal,\\nbut you can\\\'t discover natural selection that way.<br /><br />I think for most people who want to do great work, the right strategy\\nis not to plan too much. At each stage do whatever seems most\\ninteresting and gives you the best options for the future. I call\\nthis approach "staying upwind." This is how most people who\\\'ve done\\ngreat work seem to have done it.<br /><br /><br /><br /><br /><br />\\nEven when you\\\'ve found something exciting to work on, working on\\nit is not always straightforward. There will be times when some new\\nidea makes you leap out of bed in the morning and get straight to\\nwork. But there will also be plenty of times when things aren\\\'t\\nlike that.<br /><br />You don\\\'t just put out your sail and get blown forward by inspiration.\\nThere are headwinds and currents and hidden shoals. So there\\\'s a\\ntechnique to working, just as there is to sailing.<br /><br />For example, while you must work hard, it\\\'s possible to work too\\nhard, and if'), Result(attributes={}, file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='file-a98ada68681c4fbeba2201e9c7213fc3', score=1.2532794607643494, text=' with anyone who\\\'s genuinely interested. If they\\\'re\\nreally good at their work, then they probably have a hobbyist\\\'s\\ninterest in it, and hobbyists always want to talk about their\\nhobbies.<br /><br />It may take some effort to find the people who are really good,\\nthough. Doing great work has such prestige that in some places,\\nparticularly universities, there\\\'s a polite fiction that everyone\\nis engaged in it. And that is far from true. People within universities\\ncan\\\'t say so openly, but the quality of the work being done in\\ndifferent departments varies immensely. Some departments have people\\ndoing great work; others have in the past; others never have.<br /><br /><br /><br /><br /><br />\\nSeek out the best colleagues. There are a lot of projects that can\\\'t\\nbe done alone, and even if you\\\'re working on one that can be, it\\\'s\\ngood to have other people to encourage you and to bounce ideas off.<br /><br />Colleagues don\\\'t just affect your work, though; they also affect\\nyou. So work with people you want to become like, because you will.<br /><br />Quality is more important than quantity in colleagues. It\\\'s better\\nto have one or two great ones than a building full of pretty good\\nones. In fact it\\\'s not merely better, but necessary, judging from\\nhistory: the degree to which great work happens in clusters suggests\\nthat one\\\'s colleagues often make the difference between doing great\\nwork and not.<br /><br />How do you know when you have sufficiently good colleagues? In my\\nexperience, when you do, you know. Which means if you\\\'re unsure,\\nyou probably don\\\'t. But it may be possible to give a more concrete\\nanswer than that. Here\\\'s an attempt: sufficiently good colleagues\\noffer <i>surprising</i> insights. They can see and do things that you\\ncan\\\'t. So if you have a handful of colleagues good enough to keep\\nyou on your toes in this sense, you\\\'re probably over the threshold.<br /><br />Most of us can benefit from collaborating with colleagues, but some\\nprojects require people on a larger scale, and starting one of those\\nis not for everyone. If you want to run a project like that, you\\\'ll\\nhave to become a manager, and managing well takes aptitude and\\ninterest like any other kind of work. If you don\\\'t have them, there\\nis no middle path: you must either force yourself to learn management\\nas a second language, or avoid such projects.\\n<font color=#dddddd>[<a href="#f27n"><font color=#dddddd>27</font></a>]</font><br /><br /><br /><br /><br /><br />\\nHusband your morale. It\\\'s the basis of everything when you\\\'re working\\non ambitious projects. You have to nurture and protect it like a\\nliving organism.<br /><br />Morale starts with your view of life. You\\\'re more likely to do great\\nwork if you\\\'re an optimist, and more likely to if you think of\\nyourself as lucky than if you think of yourself as a victim.<br /><br />Indeed, work can to some extent protect you from your problems. If\\nyou choose work that\\\'s pure, its very difficulties will serve as a\\nrefuge from the difficulties of everyday life. If this is escapism,\\nit\\\'s a very productive form of it, and one that has been used by\\nsome of the greatest minds in history.<br /><br />Morale compounds via work: high morale helps you do good work, which\\nincreases your morale and helps you do even'), Result(attributes={}, file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='file-a98ada68681c4fbeba2201e9c7213fc3', score=1.1973485818164222, text=' your\\nability and interest can take you. And you can only answer that by\\ntrying.<br /><br />Many more people could try to do great work than do. What holds\\nthem back is a combination of modesty and fear. It seems presumptuous\\nto try to be Newton or Shakespeare. It also seems hard; surely if\\nyou tried something like that, you\\\'d fail. Presumably the calculation\\nis rarely explicit. Few people consciously decide not to try to do\\ngreat work. But that\\\'s what\\\'s going on subconsciously; they shy\\naway from the question.<br /><br />So I\\\'m going to pull a sneaky trick on you. Do you want to do great\\nwork, or not? Now you have to decide consciously. Sorry about that.\\nI wouldn\\\'t have done it to a general audience. But we already know\\nyou\\\'re interested.<br /><br />Don\\\'t worry about being presumptuous. You don\\\'t have to tell anyone.\\nAnd if it\\\'s too hard and you fail, so what? Lots of people have\\nworse problems than that. In fact you\\\'ll be lucky if it\\\'s the worst\\nproblem you have.<br /><br />Yes, you\\\'ll have to work hard. But again, lots of people have to\\nwork hard. And if you\\\'re working on something you find very\\ninteresting, which you necessarily will if you\\\'re on the right path,\\nthe work will probably feel less burdensome than a lot of your\\npeers\\\'.<br /><br />The discoveries are out there, waiting to be made. Why not by you?<br /><br /><br /><br /><br /><br /><br /><br /><br /><br />\\n<b>Notes</b><br /><br />[<a name="f1n"><font color=#000000>1</font></a>]\\nI don\\\'t think you could give a precise definition of what\\ncounts as great work. Doing great work means doing something important\\nso well that you expand people\\\'s ideas of what\\\'s possible. But\\nthere\\\'s no threshold for importance. It\\\'s a matter of degree, and\\noften hard to judge at the time anyway. So I\\\'d rather people focused\\non developing their interests rather than worrying about whether\\nthey\\\'re important or not. Just try to do something amazing, and\\nleave it to future generations to say if you succeeded.<br /><br />[<a name="f2n"><font color=#000000>2</font></a>]\\nA lot of standup comedy is based on noticing anomalies in\\neveryday life. "Did you ever notice...?" New ideas come from doing\\nthis about nontrivial things. Which may help explain why people\\\'s\\nreaction to a new idea is often the first half of laughing: Ha!<br /><br />[<a name="f3n"><font color=#000000>3</font></a>]\\nThat second qualifier is critical. If you\\\'re excited about\\nsomething most authorities discount, but you can\\\'t give a more\\nprecise explanation than "they don\\\'t get it," then you\\\'re starting\\nto drift into the territory of cranks.<br /><br />[<a name="f4n"><font color=#000000>4</font></a>]\\nFinding something to work on is not simply a matter of finding\\na match between the current version of you and a list of known\\nproblems. You\\\'ll often have to coevolve with the problem. That\\\'s\\nwhy it can sometimes be so hard to figure out what to work on. The\\nsearch space is huge. It\\\'s the cartesian product of all possible\\nt'), Result(attributes={}, file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='file-a98ada68681c4fbeba2201e9c7213fc3', score=1.1764591706535943, text='\\noptimistic, and even though one of the sources of their optimism\\nis ignorance, in this case ignorance can sometimes beat knowledge.<br /><br />Try to finish what you start, though, even if it turns out to be\\nmore work than you expected. Finishing things is not just an exercise\\nin tidiness or self-discipline. In many projects a lot of the best\\nwork happens in what was meant to be the final stage.<br /><br />Another permissible lie is to exaggerate the importance of what\\nyou\\\'re working on, at least in your own mind. If that helps you\\ndiscover something new, it may turn out not to have been a lie after\\nall.\\n<font color=#dddddd>[<a href="#f7n"><font color=#dddddd>7</font></a>]</font><br /><br /><br /><br /><br /><br />\\nSince there are two senses of starting work &mdash; per day and per\\nproject &mdash; there are also two forms of procrastination. Per-project\\nprocrastination is far the more dangerous. You put off starting\\nthat ambitious project from year to year because the time isn\\\'t\\nquite right. When you\\\'re procrastinating in units of years, you can\\nget a lot not done.\\n<font color=#dddddd>[<a href="#f8n"><font color=#dddddd>8</font></a>]</font><br /><br />One reason per-project procrastination is so dangerous is that it\\nusually camouflages itself as work. You\\\'re not just sitting around\\ndoing nothing; you\\\'re working industriously on something else. So\\nper-project procrastination doesn\\\'t set off the alarms that per-day\\nprocrastination does. You\\\'re too busy to notice it.<br /><br />The way to beat it is to stop occasionally and ask yourself: Am I\\nworking on what I most want to work on? When you\\\'re young it\\\'s ok\\nif the answer is sometimes no, but this gets increasingly dangerous\\nas you get older.\\n<font color=#dddddd>[<a href="#f9n"><font color=#dddddd>9</font></a>]</font><br /><br /><br /><br /><br /><br />\\nGreat work usually entails spending what would seem to most people\\nan unreasonable amount of time on a problem. You can\\\'t think of\\nthis time as a cost, or it will seem too high. You have to find the\\nwork sufficiently engaging as it\\\'s happening.<br /><br />There may be some jobs where you have to work diligently for years\\nat things you hate before you get to the good part, but this is not\\nhow great work happens. Great work happens by focusing consistently\\non something you\\\'re genuinely interested in. When you pause to take\\nstock, you\\\'re surprised how far you\\\'ve come.<br /><br />The reason we\\\'re surprised is that we underestimate the cumulative\\neffect of work. Writing a page a day doesn\\\'t sound like much, but\\nif you do it every day you\\\'ll write a book a year. That\\\'s the key:\\nconsistency. People who do great things don\\\'t get a lot done every\\nday. They get something done, rather than nothing.<br /><br />If you do work that compounds, you\\\'ll get exponential growth. Most\\npeople who do this do it unconsciously, but it\\\'s worth stopping to\\nthink about. Learning, for example, is an instance of this phenomenon:\\nthe more you learn about something, the easier it is to learn more.\\nGrowing an audience is another: the more fans you have, the more\\nnew fans they\\\'ll bring you.<br /><br />'), Result(attributes={}, file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='file-a98ada68681c4fbeba2201e9c7213fc3', score=1.174069664815369, text='\\ninside.<br /><br /><br /><br /><br /><br />Let\\\'s talk a little more about the complicated business of figuring\\nout what to work on. The main reason it\\\'s hard is that you can\\\'t\\ntell what most kinds of work are like except by doing them. Which\\nmeans the four steps overlap: you may have to work at something for\\nyears before you know how much you like it or how good you are at\\nit. And in the meantime you\\\'re not doing, and thus not learning\\nabout, most other kinds of work. So in the worst case you choose\\nlate based on very incomplete information.\\n<font color=#dddddd>[<a href="#f4n"><font color=#dddddd>4</font></a>]</font><br /><br />The nature of ambition exacerbates this problem. Ambition comes in\\ntwo forms, one that precedes interest in the subject and one that\\ngrows out of it. Most people who do great work have a mix, and the\\nmore you have of the former, the harder it will be to decide what\\nto do.<br /><br />The educational systems in most countries pretend it\\\'s easy. They\\nexpect you to commit to a field long before you could know what\\nit\\\'s really like. And as a result an ambitious person on an optimal\\ntrajectory will often read to the system as an instance of breakage.<br /><br />It would be better if they at least admitted it &mdash; if they admitted\\nthat the system not only can\\\'t do much to help you figure out what\\nto work on, but is designed on the assumption that you\\\'ll somehow\\nmagically guess as a teenager. They don\\\'t tell you, but I will:\\nwhen it comes to figuring out what to work on, you\\\'re on your own.\\nSome people get lucky and do guess correctly, but the rest will\\nfind themselves scrambling diagonally across tracks laid down on\\nthe assumption that everyone does.<br /><br />What should you do if you\\\'re young and ambitious but don\\\'t know\\nwhat to work on? What you should <i>not</i> do is drift along passively,\\nassuming the problem will solve itself. You need to take action.\\nBut there is no systematic procedure you can follow. When you read\\nbiographies of people who\\\'ve done great work, it\\\'s remarkable how\\nmuch luck is involved. They discover what to work on as a result\\nof a chance meeting, or by reading a book they happen to pick up.\\nSo you need to make yourself a big target for luck, and the way to\\ndo that is to be curious. Try lots of things, meet lots of people,\\nread lots of books, ask lots of questions.\\n<font color=#dddddd>[<a href="#f5n"><font color=#dddddd>5</font></a>]</font><br /><br />When in doubt, optimize for interestingness. Fields change as you\\nlearn more about them. What mathematicians do, for example, is very\\ndifferent from what you do in high school math classes. So you need\\nto give different types of work a chance to show you what they\\\'re\\nlike. But a field should become <i>increasingly</i> interesting as you\\nlearn more about it. If it doesn\\\'t, it\\\'s probably not for you.<br /><br />Don\\\'t worry if you find you\\\'re interested in different things than\\nother people. The stranger your tastes in interestingness, the\\nbetter. Strange tastes are often strong ones, and a strong taste\\nfor work means you\\\'ll be productive. And you\\\'re more likely to find\\nnew things if you'), Result(attributes={}, file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='file-a98ada68681c4fbeba2201e9c7213fc3', score=1.158095578895721, text='. Don\\\'t copy the manner of\\nan eminent 50 year old professor if you\\\'re 18, for example, or the\\nidiom of a Renaissance poem hundreds of years later.<br /><br />Some of the features of things you admire are flaws they succeeded\\ndespite. Indeed, the features that are easiest to imitate are the\\nmost likely to be the flaws.<br /><br />This is particularly true for behavior. Some talented people are\\njerks, and this sometimes makes it seem to the inexperienced that\\nbeing a jerk is part of being talented. It isn\\\'t; being talented\\nis merely how they get away with it.<br /><br />One of the most powerful kinds of copying is to copy something from\\none field into another. History is so full of chance discoveries\\nof this type that it\\\'s probably worth giving chance a hand by\\ndeliberately learning about other kinds of work. You can take ideas\\nfrom quite distant fields if you let them be metaphors.<br /><br />Negative examples can be as inspiring as positive ones. In fact you\\ncan sometimes learn more from things done badly than from things\\ndone well; sometimes it only becomes clear what\\\'s needed when it\\\'s\\nmissing.<br /><br /><br /><br /><br /><br />\\nIf a lot of the best people in your field are collected in one\\nplace, it\\\'s usually a good idea to visit for a while. It will\\nincrease your ambition, and also, by showing you that these people\\nare human, increase your self-confidence.\\n<font color=#dddddd>[<a href="#f26n"><font color=#dddddd>26</font></a>]</font><br /><br />If you\\\'re earnest you\\\'ll probably get a warmer welcome than you\\nmight expect. Most people who are very good at something are happy\\nto talk about it with anyone who\\\'s genuinely interested. If they\\\'re\\nreally good at their work, then they probably have a hobbyist\\\'s\\ninterest in it, and hobbyists always want to talk about their\\nhobbies.<br /><br />It may take some effort to find the people who are really good,\\nthough. Doing great work has such prestige that in some places,\\nparticularly universities, there\\\'s a polite fiction that everyone\\nis engaged in it. And that is far from true. People within universities\\ncan\\\'t say so openly, but the quality of the work being done in\\ndifferent departments varies immensely. Some departments have people\\ndoing great work; others have in the past; others never have.<br /><br /><br /><br /><br /><br />\\nSeek out the best colleagues. There are a lot of projects that can\\\'t\\nbe done alone, and even if you\\\'re working on one that can be, it\\\'s\\ngood to have other people to encourage you and to bounce ideas off.<br /><br />Colleagues don\\\'t just affect your work, though; they also affect\\nyou. So work with people you want to become like, because you will.<br /><br />Quality is more important than quantity in colleagues. It\\\'s better\\nto have one or two great ones than a building full of pretty good\\nones. In fact it\\\'s not merely better, but necessary, judging from\\nhistory: the degree to which great work happens in clusters suggests\\nthat one\\\'s colleagues often make the difference between doing great\\nwork and not.<br /><br />How do you know when you have sufficiently good colleagues? In my\\nexperience, when you do, you know. Which means if you\\\'re unsure,\\nyou probably don\\\'t. But it may be possible to give a more concrete\\nanswer than that. Here\\\'s an attempt: sufficiently good'), Result(attributes={}, file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='file-a98ada68681c4fbeba2201e9c7213fc3', score=1.1566747762241967, text=',\\nbut in practice it usually means following them past all sorts of\\nobstacles. You usually have to risk rejection and failure. So it\\ndoes take a good deal of boldness.<br /><br />But while you need boldness, you don\\\'t usually need much planning.\\nIn most cases the recipe for doing great work is simply: work hard\\non excitingly ambitious projects, and something good will come of\\nit. Instead of making a plan and then executing it, you just try\\nto preserve certain invariants.<br /><br />The trouble with planning is that it only works for achievements\\nyou can describe in advance. You can win a gold medal or get rich\\nby deciding to as a child and then tenaciously pursuing that goal,\\nbut you can\\\'t discover natural selection that way.<br /><br />I think for most people who want to do great work, the right strategy\\nis not to plan too much. At each stage do whatever seems most\\ninteresting and gives you the best options for the future. I call\\nthis approach "staying upwind." This is how most people who\\\'ve done\\ngreat work seem to have done it.<br /><br /><br /><br /><br /><br />\\nEven when you\\\'ve found something exciting to work on, working on\\nit is not always straightforward. There will be times when some new\\nidea makes you leap out of bed in the morning and get straight to\\nwork. But there will also be plenty of times when things aren\\\'t\\nlike that.<br /><br />You don\\\'t just put out your sail and get blown forward by inspiration.\\nThere are headwinds and currents and hidden shoals. So there\\\'s a\\ntechnique to working, just as there is to sailing.<br /><br />For example, while you must work hard, it\\\'s possible to work too\\nhard, and if you do that you\\\'ll find you get diminishing returns:\\nfatigue will make you stupid, and eventually even damage your health.\\nThe point at which work yields diminishing returns depends on the\\ntype. Some of the hardest types you might only be able to do for\\nfour or five hours a day.<br /><br />Ideally those hours will be contiguous. To the extent you can, try\\nto arrange your life so you have big blocks of time to work in.\\nYou\\\'ll shy away from hard tasks if you know you might be interrupted.<br /><br />It will probably be harder to start working than to keep working.\\nYou\\\'ll often have to trick yourself to get over that initial\\nthreshold. Don\\\'t worry about this; it\\\'s the nature of work, not a\\nflaw in your character. Work has a sort of activation energy, both\\nper day and per project. And since this threshold is fake in the\\nsense that it\\\'s higher than the energy required to keep going, it\\\'s\\nok to tell yourself a lie of corresponding magnitude to get over\\nit.<br /><br />It\\\'s usually a mistake to lie to yourself if you want to do great\\nwork, but this is one of the rare cases where it isn\\\'t. When I\\\'m\\nreluctant to start work in the morning, I often trick myself by\\nsaying "I\\\'ll just read over what I\\\'ve got so far." Five minutes\\nlater I\\\'ve found something that seems mistaken or incomplete, and\\nI\\\'m off.<br /><br />Similar techniques work for starting new projects. It\\\'s ok to lie\\nto yourself about how much work a project will entail, for example.\\nLots of great things began with someone saying "How hard could it\\nbe?"<br /><br />This is one case where the young have an advantage. They\\\'re more'), Result(attributes={}, file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='file-a98ada68681c4fbeba2201e9c7213fc3', score=1.1349744395573516, text=' audience\\nin the traditional sense. Either way it doesn\\\'t need to be big.\\nThe value of an audience doesn\\\'t grow anything like linearly with\\nits size. Which is bad news if you\\\'re famous, but good news if\\nyou\\\'re just starting out, because it means a small but dedicated\\naudience can be enough to sustain you. If a handful of people\\ngenuinely love what you\\\'re doing, that\\\'s enough.<br /><br />To the extent you can, avoid letting intermediaries come between\\nyou and your audience. In some types of work this is inevitable,\\nbut it\\\'s so liberating to escape it that you might be better off\\nswitching to an adjacent type if that will let you go direct.\\n<font color=#dddddd>[<a href="#f28n"><font color=#dddddd>28</font></a>]</font><br /><br />The people you spend time with will also have a big effect on your\\nmorale. You\\\'ll find there are some who increase your energy and\\nothers who decrease it, and the effect someone has is not always\\nwhat you\\\'d expect. Seek out the people who increase your energy and\\navoid those who decrease it. Though of course if there\\\'s someone\\nyou need to take care of, that takes precedence.<br /><br />Don\\\'t marry someone who doesn\\\'t understand that you need to work,\\nor sees your work as competition for your attention. If you\\\'re\\nambitious, you need to work; it\\\'s almost like a medical condition;\\nso someone who won\\\'t let you work either doesn\\\'t understand you,\\nor does and doesn\\\'t care.<br /><br />Ultimately morale is physical. You think with your body, so it\\\'s\\nimportant to take care of it. That means exercising regularly,\\neating and sleeping well, and avoiding the more dangerous kinds of\\ndrugs. Running and walking are particularly good forms of exercise\\nbecause they\\\'re good for thinking.\\n<font color=#dddddd>[<a href="#f29n"><font color=#dddddd>29</font></a>]</font><br /><br />People who do great work are not necessarily happier than everyone\\nelse, but they\\\'re happier than they\\\'d be if they didn\\\'t. In fact,\\nif you\\\'re smart and ambitious, it\\\'s dangerous <i>not</i> to be productive.\\nPeople who are smart and ambitious but don\\\'t achieve much tend to\\nbecome bitter.<br /><br /><br /><br /><br /><br />\\nIt\\\'s ok to want to impress other people, but choose the right people.\\nThe opinion of people you respect is signal. Fame, which is the\\nopinion of a much larger group you might or might not respect, just\\nadds noise.<br /><br />The prestige of a type of work is at best a trailing indicator and\\nsometimes completely mistaken. If you do anything well enough,\\nyou\\\'ll make it prestigious. So the question to ask about a type of\\nwork is not how much prestige it has, but how well it could be done.<br /><br />Competition can be an effective motivator, but don\\\'t let it choose\\nthe problem for you; don\\\'t let yourself get drawn into chasing\\nsomething just because others are. In fact, don\\\'t let competitors\\nmake you do anything much more specific than work harder.<br /><br />Curiosity is the best guide. Your curiosity never lies, and it knows\\nmore than you do about what\\\'s worth paying attention to.<br /><br /><br /><br /><br /><br />\\nNotice how often that word has come up. If you asked an oracle the\\nsecret to doing great work and the oracle replied'), Result(attributes={}, file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='file-a98ada68681c4fbeba2201e9c7213fc3', score=1.123214818076958, text='b\'<html><head><meta name="Keywords" content="" /><title>How to Do Great Work</title><!-- <META NAME="ROBOTS" CONTENT="NOODP"> -->\\n<link rel="shortcut icon" href="http://ycombinator.com/arc/arc.png">\\n</head><body bgcolor="#ffffff" background="https://s.turbifycdn.com/aah/paulgraham/bel-6.gif" text="#000000" link="#000099" vlink="#464646"><table border="0" cellspacing="0" cellpadding="0"><tr valign="top"><td><map name=118ab66adb24b4f><area shape=rect coords="0,0,67,21" href="index.html"><area shape=rect coords="0,21,67,42" href="articles.html"><area shape=rect coords="0,42,67,63" href="http://www.amazon.com/gp/product/0596006624"><area shape=rect coords="0,63,67,84" href="books.html"><area shape=rect coords="0,84,67,105" href="http://ycombinator.com"><area shape=rect coords="0,105,67,126" href="arc.html"><area shape=rect coords="0,126,67,147" href="bel.html"><area shape=rect coords="0,147,67,168" href="lisp.html"><area shape=rect coords="0,168,67,189" href="antispam.html"><area shape=rect coords="0,189,67,210" href="kedrosky.html"><area shape=rect coords="0,210,67,231" href="faq.html"><area shape=rect coords="0,231,67,252" href="raq.html"><area shape=rect coords="0,252,67,273" href="quo.html"><area shape=rect coords="0,273,67,294" href="rss.html"><area shape=rect coords="0,294,67,315" href="bio.html"><area shape=rect coords="0,315,67,336" href="https://twitter.com/paulg"><area shape=rect coords="0,336,67,357" href="https://mas.to/@paulg"></map><img src="https://s.turbifycdn.com/aah/paulgraham/bel-7.gif" width="69" height="357" usemap=#118ab66adb24b4f border="0" hspace="0" vspace="0" ismap /></td><td><img src="https://sep.turbifycdn.com/ca/Img/trans_1x1.gif" height="1" width="26" border="0" /></td><td><a href="index.html"><img src="https://s.turbifycdn.com/aah/paulgraham/bel-8.gif" width="410" height="45" border="0" hspace="0" vspace="0" /></a><br /><br /><table border="0" cellspacing="0" cellpadding="0" width="435"><tr valign="top"><td width="435"><img src="https://s.turbifycdn.com/aah/paulgraham/how-to-do-great-work-2.gif" width="185" height="18" border="0" hspace="0" vspace="0" alt="How to Do Great Work" /><br /><br /><font size="2" face="verdana">July 2023<br /><br />If you collected lists of techniques for doing great work in a lot\\nof different fields, what would the intersection look like? I decided\\nto find out'), Result(attributes={}, file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='file-a98ada68681c4fbeba2201e9c7213fc3', score=1.1193194369249235, text=' dangerous kinds of\\ndrugs. Running and walking are particularly good forms of exercise\\nbecause they\\\'re good for thinking.\\n<font color=#dddddd>[<a href="#f29n"><font color=#dddddd>29</font></a>]</font><br /><br />People who do great work are not necessarily happier than everyone\\nelse, but they\\\'re happier than they\\\'d be if they didn\\\'t. In fact,\\nif you\\\'re smart and ambitious, it\\\'s dangerous <i>not</i> to be productive.\\nPeople who are smart and ambitious but don\\\'t achieve much tend to\\nbecome bitter.<br /><br /><br /><br /><br /><br />\\nIt\\\'s ok to want to impress other people, but choose the right people.\\nThe opinion of people you respect is signal. Fame, which is the\\nopinion of a much larger group you might or might not respect, just\\nadds noise.<br /><br />The prestige of a type of work is at best a trailing indicator and\\nsometimes completely mistaken. If you do anything well enough,\\nyou\\\'ll make it prestigious. So the question to ask about a type of\\nwork is not how much prestige it has, but how well it could be done.<br /><br />Competition can be an effective motivator, but don\\\'t let it choose\\nthe problem for you; don\\\'t let yourself get drawn into chasing\\nsomething just because others are. In fact, don\\\'t let competitors\\nmake you do anything much more specific than work harder.<br /><br />Curiosity is the best guide. Your curiosity never lies, and it knows\\nmore than you do about what\\\'s worth paying attention to.<br /><br /><br /><br /><br /><br />\\nNotice how often that word has come up. If you asked an oracle the\\nsecret to doing great work and the oracle replied with a single\\nword, my bet would be on "curiosity."<br /><br />That doesn\\\'t translate directly to advice. It\\\'s not enough just to\\nbe curious, and you can\\\'t command curiosity anyway. But you can\\nnurture it and let it drive you.<br /><br />Curiosity is the key to all four steps in doing great work: it will\\nchoose the field for you, get you to the frontier, cause you to\\nnotice the gaps in it, and drive you to explore them. The whole\\nprocess is a kind of dance with curiosity.<br /><br /><br /><br /><br /><br />\\nBelieve it or not, I tried to make this essay as short as I could.\\nBut its length at least means it acts as a filter. If you made it\\nthis far, you must be interested in doing great work. And if so\\nyou\\\'re already further along than you might realize, because the\\nset of people willing to want to is small.<br /><br />The factors in doing great work are factors in the literal,\\nmathematical sense, and they are: ability, interest, effort, and\\nluck. Luck by definition you can\\\'t do anything about, so we can\\nignore that. And we can assume effort, if you do in fact want to\\ndo great work. So the problem boils down to ability and interest.\\nCan you find a kind of work where your ability and interest will\\ncombine to yield an explosion of new ideas?<br /><br />Here there are grounds for optimism. There are so many different\\nways to do great work, and even more that are still undiscovered.\\nOut of all those different types of work, the one you\\\'re most suited\\nfor is probably a pretty close match. Probably a comically close\\nmatch. It\\\'s just a question of finding it, and how far into it')]), ResponseOutputMessage(id='msg_3591ea71-8b35-4efd-a5ad-c1c250801971', content=[ResponseOutputText(annotations=[AnnotationFileCitation(file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='https://www.paulgraham.com/greatwork.html', index=361, type='file_citation'), AnnotationFileCitation(file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='https://www.paulgraham.com/greatwork.html', index=676, type='file_citation'), AnnotationFileCitation(file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='https://www.paulgraham.com/greatwork.html', index=948, type='file_citation'), AnnotationFileCitation(file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='https://www.paulgraham.com/greatwork.html', index=1259, type='file_citation'), AnnotationFileCitation(file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='https://www.paulgraham.com/greatwork.html', index=1520, type='file_citation'), AnnotationFileCitation(file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='https://www.paulgraham.com/greatwork.html', index=1747, type='file_citation')], text='To do great work, consider the following principles:\n\n1. **Follow Your Interests**: Engage in work that genuinely excites you. If you find an area intriguing, pursue it without being overly concerned about external pressures or norms. You should create things that you would want for yourself, as this often aligns with what others in your circle might want too.\n\n2. **Work Hard on Ambitious Projects**: Ambition is vital, but it should be tempered by genuine interest. Instead of detailed planning for the future, focus on exciting projects that keep your options open. This approach, known as "staying upwind," allows for adaptability and can lead to unforeseen achievements.\n\n3. **Choose Quality Colleagues**: Collaborating with talented colleagues can significantly affect your own work. Seek out individuals who offer surprising insights and whom you admire. The presence of good colleagues can elevate the quality of your work and inspire you.\n\n4. **Maintain High Morale**: Your attitude towards work and life affects your performance. Cultivating optimism and viewing yourself as lucky rather than victimized can boost your productivity. It’s essential to care for your physical health as well since it directly impacts your mental faculties and morale.\n\n5. **Be Consistent**: Great work often comes from cumulative effort. Daily progress, even in small amounts, can result in substantial achievements over time. Emphasize consistency and make the work engaging, as this reduces the perceived burden of hard labor.\n\n6. **Embrace Curiosity**: Curiosity is a driving force that can guide you in selecting fields of interest, pushing you to explore uncharted territories. Allow it to shape your work and continually seek knowledge and insights.\n\nBy focusing on these aspects, you can create an environment conducive to great work and personal fulfillment.', type='output_text', logprobs=None)], role='assistant', status='completed', type='message')], parallel_tool_calls=False, temperature=None, tool_choice=None, tools=None, top_p=None, background=None, conversation=None, max_output_tokens=None, max_tool_calls=None, previous_response_id=None, prompt=None, prompt_cache_key=None, reasoning=None, safety_identifier=None, service_tier=None, status='completed', text=ResponseTextConfig(format=ResponseFormatText(type='text'), verbosity=None), top_logprobs=None, truncation=None, usage=None, user=None)

In [34]: resp.output[1].content[0].text
Out[34]: 'To do great work, consider the following principles:\n\n1. **Follow Your Interests**: Engage in work that genuinely excites you. If you find an area intriguing, pursue it without being overly concerned about external pressures or norms. You should create things that you would want for yourself, as this often aligns with what others in your circle might want too.\n\n2. **Work Hard on Ambitious Projects**: Ambition is vital, but it should be tempered by genuine interest. Instead of detailed planning for the future, focus on exciting projects that keep your options open. This approach, known as "staying upwind," allows for adaptability and can lead to unforeseen achievements.\n\n3. **Choose Quality Colleagues**: Collaborating with talented colleagues can significantly affect your own work. Seek out individuals who offer surprising insights and whom you admire. The presence of good colleagues can elevate the quality of your work and inspire you.\n\n4. **Maintain High Morale**: Your attitude towards work and life affects your performance. Cultivating optimism and viewing yourself as lucky rather than victimized can boost your productivity. It’s essential to care for your physical health as well since it directly impacts your mental faculties and morale.\n\n5. **Be Consistent**: Great work often comes from cumulative effort. Daily progress, even in small amounts, can result in substantial achievements over time. Emphasize consistency and make the work engaging, as this reduces the perceived burden of hard labor.\n\n6. **Embrace Curiosity**: Curiosity is a driving force that can guide you in selecting fields of interest, pushing you to explore uncharted territories. Allow it to shape your work and continually seek knowledge and insights.\n\nBy focusing on these aspects, you can create an environment conducive to great work and personal fulfillment.'
```
</details>

The relevant output looks like this:

```python
>resp.output[1].content[0].annotations
[AnnotationFileCitation(file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='https://www.paulgraham.com/greatwork.html', index=361, type='file_citation'),
 AnnotationFileCitation(file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='https://www.paulgraham.com/greatwork.html', index=676, type='file_citation'),
 AnnotationFileCitation(file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='https://www.paulgraham.com/greatwork.html', index=948, type='file_citation'),
 AnnotationFileCitation(file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='https://www.paulgraham.com/greatwork.html', index=1259, type='file_citation'),
 AnnotationFileCitation(file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='https://www.paulgraham.com/greatwork.html', index=1520, type='file_citation'),
 AnnotationFileCitation(file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='https://www.paulgraham.com/greatwork.html', index=1747, type='file_citation')]```
And
```python
In [144]: print(resp.output[1].content[0].text)
To do great work, consider the following principles:

1. **Follow Your Interests**: Engage in work that genuinely excites you.
If you find an area intriguing, pursue it without being overly concerned
about external pressures or norms. You should create things that you
would want for yourself, as this often aligns with what others in your
circle might want too.

2. **Work Hard on Ambitious Projects**: Ambition is vital, but it should
be tempered by genuine interest. Instead of detailed planning for the
future, focus on exciting projects that keep your options open. This
approach, known as "staying upwind," allows for adaptability and can
lead to unforeseen achievements.

3. **Choose Quality Colleagues**: Collaborating with talented colleagues
can significantly affect your own work. Seek out individuals who offer
surprising insights and whom you admire. The presence of good colleagues
can elevate the quality of your work and inspire you.

4. **Maintain High Morale**: Your attitude towards work and life affects
your performance. Cultivating optimism and viewing yourself as lucky
rather than victimized can boost your productivity. It’s essential to
care for your physical health as well since it directly impacts your
mental faculties and morale.

5. **Be Consistent**: Great work often comes from cumulative effort.
Daily progress, even in small amounts, can result in substantial
achievements over time. Emphasize consistency and make the work
engaging, as this reduces the perceived burden of hard labor.

6. **Embrace Curiosity**: Curiosity is a driving force that can guide
you in selecting fields of interest, pushing you to explore uncharted
territories. Allow it to shape your work and continually seek knowledge
and insights.

By focusing on these aspects, you can create an environment conducive to
great work and personal fulfillment.
```

And the code below outputs only periods highlighting that the position/index behaves as expected—i.e., the annotation happens at the end of the sentence.

```python
print([resp.output[1].content[0].text[j.index] for j in
resp.output[1].content[0].annotations])
Out[41]: ['.', '.', '.', '.', '.', '.']
```

## Test Plan
Unit tests added.

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-10-07 14:00:56 -04:00
Charlie Doern
6389bf5ffb
fix: make telemetry optional for agents (#3705)
# What does this PR do?

there is a lot of code in the agents API using the telemetry API and its
helpers without checking if that API is even enabled.

This is the only API besides inference actively using telemetry code, so
after this telemetry can be optional for the entire stack


resolves #3665


## Test Plan

existing agent tests.

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-10-07 16:09:03 +02:00
Matthew Farrellee
e892a3f7f4
feat: add refresh_models support to inference adapters (default: false) (#3719)
# What does this PR do?

inference adapters can now configure `refresh_models: bool` to control
periodic model listing from their providers

BREAKING CHANGE: together inference adapter default changed. previously
always refreshed, now follows config.

addresses "models: refresh" on #3517

## Test Plan

ci w/ new tests
2025-10-07 15:19:56 +02:00
Charlie Doern
8b9af03a1b
fix: refresh log should be debug (#3720)
# What does this PR do?

when using a distro like starter where a bunch of providers are disabled
I should not see logs like:

```
         in the provider data header, e.g. x-llamastack-provider-data: {"groq_api_key": "<API_KEY>"}, or in the provider config.
WARNING  2025-10-07 08:38:52,117 llama_stack.core.routing_tables.models:36 core::routing_tables: Model refresh failed for provider sambanova: API key is not set. Please provide a valid
         API key in the provider data header, e.g. x-llamastack-provider-data: {"sambanova_api_key": "<API_KEY>"}, or in the provider config.
WARNING  2025-10-07 08:43:52,123 llama_stack.core.routing_tables.models:36 core::routing_tables: Model refresh failed for provider fireworks: Pass Fireworks API Key in the header
         X-LlamaStack-Provider-Data as { "fireworks_api_key": <your api key>}
WARNING  2025-10-07 08:43:52,126 llama_stack.core.routing_tables.models:36 core::routing_tables: Model refresh failed for provider together: Pass Together API Key in the header
         X-LlamaStack-Provider-Data as { "together_api_key": <your api key>}
WARNING  2025-10-07 08:43:52,129 llama_stack.core.routing_tables.models:36 core::routing_tables: Model refresh failed for provider openai: API key is not set. Please provide a valid API
         key in the provider data header, e.g. x-llamastack-provider-data: {"openai_api_key": "<API_KEY>"}, or in the provider config.
WARNING  2025-10-07 08:43:52,132 llama_stack.core.routing_tables.models:36 core::routing_tables: Model refresh failed for provider anthropic: API key is not set. Please provide a valid
         API key in the provider data header, e.g. x-llamastack-provider-data: {"anthropic_api_key": "<API_KEY>"}, or in the provider config.
WARNING  2025-10-07 08:43:52,136 llama_stack.core.routing_tables.models:36 core::routing_tables: Model refresh failed for provider gemini: API key is not set. Please provide a valid API
         key in the provider data header, e.g. x-llamastack-provider-data: {"gemini_api_key": "<API_KEY>"}, or in the provider config.
WARNING  2025-10-07 08:43:52,139 llama_stack.core.routing_tables.models:36 core::routing_tables: Model refresh failed for provider groq: API key is not set. Please provide a valid API key
         in the provider data header, e.g. x-llamastack-provider-data: {"groq_api_key": "<API_KEY>"}, or in the provider config.
WARNING  2025-10-07 08:43:52,142 llama_stack.core.routing_tables.models:36 core::routing_tables: Model refresh failed for provider sambanova: API key is not set. Please provide a valid
         API key in the provider data header, e.g. x-llamastack-provider-data: {"sambanova_api_key": "<API_KEY>"}, or in the provider config.
^CINFO     2025-10-07 08:46:11,996 llama_stack.core.utils.exec:75 core:
```

as WARNING. Switch to Debug.

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-10-07 09:04:07 -04:00
Sumanth Kamenani
1fcde5fc2f
fix: update pyproject.toml dependencies for vector processing (#3555)
What does this PR do?

Updates pyproject.toml dependencies to fix vector processing
compatibility issues.

closes: #3495 

  Test Plan

  Tested llama stack server with faiss vector database:

1. Built and ran server: llama stack build --distro starter --image-type
venv --image-name llamastack-faiss
3. Tested file upload: Successfully uploaded PDF via /v1/openai/v1/files
  4. Tested vector operations:
    - Created vector store with faiss backend
    - Added PDF to vector store
    - Performed semantic search queries
2025-10-07 15:01:36 +02:00
Justin
509ac4a659
feat: enable Runpod inference adapter (#3707)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 1s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 2s
Python Package Build Test / build (3.12) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.13) (push) Failing after 1s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 4s
Vector IO Integration Tests / test-matrix (push) Failing after 5s
Test External API and Providers / test-external (venv) (push) Failing after 3s
Unit Tests / unit-tests (3.13) (push) Failing after 3s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
API Conformance Tests / check-schema-compatibility (push) Successful in 11s
UI Tests / ui-tests (22) (push) Successful in 30s
Pre-commit / pre-commit (push) Successful in 1m24s
# What does this PR do?
Sorry to @mattf I thought I could close the other PR and reopen it.. But
I didn't have the option to reopen it now. I just didn't want it to keep
notifying maintainers if I would make other commits for testing.

Continuation of: https://github.com/llamastack/llama-stack/pull/3641

PR fixes Runpod Adapter
https://github.com/llamastack/llama-stack/issues/3517

## What I fixed from before:
Continuation of: https://github.com/llamastack/llama-stack/pull/3641

1. Made it all OpenAI
2. Fixed the class up since the OpenAIMixin had a couple changes with
the pydantic base model stuff.
3. Test to make sure that we could dynamically find models and use the
resulting identifier to make requests
```bash
curl -X GET \
  -H "Content-Type: application/json" \
  "http://localhost:8321/v1/models"
```

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->

```
# RunPod Provider Quick Start

## Prerequisites
- Python 3.10+
- Git
- RunPod API token

## Setup for Development

```bash
# 1. Clone and enter the repository
cd (into the repo)

# 2. Create and activate virtual environment
python3 -m venv .venv
source .venv/bin/activate

# 3. Remove any existing llama-stack installation
pip uninstall llama-stack llama-stack-client -y

# 4. Install llama-stack in development mode
pip install -e .

# 5. Build using local development code
(Found this through the Discord)
LLAMA_STACK_DIR=. llama stack build

# When prompted during build:
# - Name: runpod-dev
# - Image type: venv
# - Inference provider: remote::runpod
# - Safety provider: "llama-guard" 
# - Other providers: first defaults
```

## Configure the Stack

The RunPod adapter automatically discovers models from your endpoint via the `/v1/models` API.
No manual model configuration is required - just set your environment variables.

## Run the Server

### Important: Use the Build-Created Virtual Environment

```bash
# Exit the development venv if you're in it
deactivate

# Activate the build-created venv (NOT .venv)
cd (lama-stack folder github repo)
source llamastack-runpod-dev/bin/activate
```

### For Qwen3-32B-AWQ Public Endpoint (Recommended)

```bash
# Set environment variables
export RUNPOD_URL="https://api.runpod.ai/v2/qwen3-32b-awq/openai/v1"
export RUNPOD_API_TOKEN="your_runpod_api_key"

# Start server
llama stack run
~/.llama/distributions/llamastack-runpod-dev/llamastack-runpod-dev-run.yaml
```

## Quick Test

### 1. List Available Models (Dynamic Discovery)

First, check which models are available on your RunPod endpoint:

```bash
curl -X GET \
  -H "Content-Type: application/json" \
  "http://localhost:8321/v1/models"
```

**Example Response:**
```json
{
  "data": [
    {
      "identifier": "qwen3-32b-awq",
      "provider_resource_id": "Qwen/Qwen3-32B-AWQ",
      "provider_id": "runpod",
      "type": "model",
      "metadata": {},
      "model_type": "llm"
    }
  ]
}
```

**Note:** Use the `identifier` value from the response above in your requests below.

### 2. Chat Completion (Non-streaming)

Replace `qwen3-32b-awq` with your model identifier from step 1:

```bash
curl -X POST http://localhost:8321/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-32b-awq",
    "messages": [{"role": "user", "content": "Hello, count to 3"}],
    "stream": false
  }'
```

### 3. Chat Completion (Streaming)

```bash
curl -X POST http://localhost:8321/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-32b-awq",
    "messages": [{"role": "user", "content": "Count to 5"}],
    "stream": true
  }'
```

**Clean streaming output:**
```bash
curl -N -X POST http://localhost:8321/v1/chat/completions \
  -H "Content-Type: application/json" \
-d '{"model": "qwen3-32b-awq", "messages": [{"role": "user", "content":
"Count to 5"}], "stream": true}' \
  2>/dev/null | while read -r line; do
echo "$line" | grep "^data: " | sed 's/^data: //' | jq -r
'.choices[0].delta.content // empty' 2>/dev/null
  done
```

**Expected Output:**
```
1
2
3
4
5
```
2025-10-07 12:24:50 +02:00
ehhuang
50f9ca3541
chore: remove dead code (#3713)
# What does this PR do?


## Test Plan
2025-10-07 12:13:11 +02:00
slekkala1
bba9957edd
feat(api): Add vector store file batches api (#3642)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 2s
Python Package Build Test / build (3.13) (push) Failing after 0s
Python Package Build Test / build (3.12) (push) Failing after 2s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 5s
Vector IO Integration Tests / test-matrix (push) Failing after 4s
API Conformance Tests / check-schema-compatibility (push) Successful in 9s
Unit Tests / unit-tests (3.12) (push) Failing after 3s
Test External API and Providers / test-external (venv) (push) Failing after 5s
Unit Tests / unit-tests (3.13) (push) Failing after 3s
UI Tests / ui-tests (22) (push) Successful in 40s
Pre-commit / pre-commit (push) Successful in 1m28s
# What does this PR do?

Add Open AI Compatible vector store file batches api. This functionality
is needed to attach many files to a vector store as a batch.
https://github.com/llamastack/llama-stack/issues/3533

API Stubs have been merged
https://github.com/llamastack/llama-stack/pull/3615
Adds persistence for file batches as discussed in diff
https://github.com/llamastack/llama-stack/pull/3544
(Used claude code for generation and reviewed by me)


## Test Plan
1. Unit tests pass
2. Also verified the cc-vec integration with LLamaStackClient works with
the file batches api. https://github.com/raghotham/cc-vec
2. Integration tests pass
2025-10-06 16:58:22 -07:00
ehhuang
597d405e13
chore: fix closing error (#3709)
# What does this PR do?
Gets rid of this error message below (disclaimer: not sure why, but it
does).

ERROR 2025-10-06 12:04:22,837 asyncio:118 uncategorized: Task exception
was never retrieved
future: <Task finished name='Task-36' coro=<AsyncClient.aclose() done,
defined at

/Users/erichuang/projects/llama-stack-git2/.venv/lib/python3.12/site-packages/httpx/_client.py:1978>
exception=RuntimeError('unable to perform operation on <TCPTransport
closed=True reading=False 0x122dc7ad0>; the handler is closed')>
╭───────────────────────────────────────────────────────────────────
Traceback (most recent call last)
───────────────────────────────────────────────────────────────────╮
│
/Users/erichuang/projects/llama-stack-git2/.venv/lib/python3.12/site-packages/httpx/_client.py:1985
in aclose │
│ │
│ 1982 │ │ if self._state != ClientState.CLOSED: │
│ 1983 │ │ │ self._state = ClientState.CLOSED │
│ 1984 │ │ │ │
│ ❱ 1985 │ │ │ await self._transport.aclose() │
│ 1986 │ │ │ for proxy in self._mounts.values(): │
│ 1987 │ │ │ │ if proxy is not None: │
│ 1988 │ │ │ │ │ await proxy.aclose() │
│ │
│
/Users/erichuang/projects/llama-stack-git2/.venv/lib/python3.12/site-packages/httpx/_transports/default.py:406
in aclose │
│ │
│ 403 │ │ ) │
│ 404 │ │
│ 405 │ async def aclose(self) -> None: │
│ ❱ 406 │ │ await self._pool.aclose() │
│ 407 │
│ │
│
/Users/erichuang/projects/llama-stack-git2/.venv/lib/python3.12/site-packages/httpcore/_async/connection_pool.py:353
in aclose │
│ │
│ 350 │ │ with self._optional_thread_lock: │
│ 351 │ │ │ closing_connections = list(self._connections) │
│ 352 │ │ │ self._connections = [] │
│ ❱ 353 │ │ await self._close_connections(closing_connections) │
│ 354 │ │
│ 355 │ async def __aenter__(self) -> AsyncConnectionPool: │
│ 356 │ │ return self │
│ │
│
/Users/erichuang/projects/llama-stack-git2/.venv/lib/python3.12/site-packages/httpcore/_async/connection_pool.py:345
in _close_connections │
│ │
│ 342 │ │ # Close connections which have been removed from the pool. │
│ 343 │ │ with AsyncShieldCancellation(): │
│ 344 │ │ │ for connection in closing: │
│ ❱ 345 │ │ │ │ await connection.aclose() │
│ 346 │ │
│ 347 │ async def aclose(self) -> None: │
│ 348 │ │ # Explicitly close the connection pool. │
│ │
│
/Users/erichuang/projects/llama-stack-git2/.venv/lib/python3.12/site-packages/httpcore/_async/connection.py:173
in aclose │
│ │
│ 170 │ async def aclose(self) -> None: │
│ 171 │ │ if self._connection is not None: │
│ 172 │ │ │ async with Trace("close", logger, None, {}): │
│ ❱ 173 │ │ │ │ await self._connection.aclose() │
│ 174 │ │
│ 175 │ def is_available(self) -> bool: │
│ 176 │ │ if self._connection is None: │
│ │
│
/Users/erichuang/projects/llama-stack-git2/.venv/lib/python3.12/site-packages/httpcore/_async/http11.py:258
in aclose │
│ │
│ 255 │ │ # Note that this method unilaterally closes the connection,
and does │
│ 256 │ │ # not have any kind of locking in place around it. │
│ 257 │ │ self._state = HTTPConnectionState.CLOSED │
│ ❱ 258 │ │ await self._network_stream.aclose() │
│ 259 │ │
│ 260 │ # The AsyncConnectionInterface methods provide information about
the state of │
│ 261 │ # the connection, allowing for a connection pooling
implementation to │
│ │
│
/Users/erichuang/projects/llama-stack-git2/.venv/lib/python3.12/site-packages/httpcore/_backends/anyio.py:53
in aclose │
│ │
│ 50 │ │ │ │ await self._stream.send(item=buffer) │
│ 51 │ │
│ 52 │ async def aclose(self) -> None: │
│ ❱ 53 │ │ await self._stream.aclose() │
│ 54 │ │
│ 55 │ async def start_tls( │
│ 56 │ │ self, │
│ │
│
/Users/erichuang/projects/llama-stack-git2/.venv/lib/python3.12/site-packages/anyio/streams/tls.py:216
in aclose │
│ │
│ 213 │ │ │ │ await aclose_forcefully(self.transport_stream) │
│ 214 │ │ │ │ raise │
│ 215 │ │ │
│ ❱ 216 │ │ await self.transport_stream.aclose() │
│ 217 │ │
│ 218 │ async def receive(self, max_bytes: int = 65536) -> bytes: │
│ 219 │ │ data = await
self._call_sslobject_method(self._ssl_object.read, max_bytes) │
│ │
│
/Users/erichuang/projects/llama-stack-git2/.venv/lib/python3.12/site-packages/anyio/_backends/_asyncio.py:1310
in aclose │
│ │
│ 1307 │ │ if not self._transport.is_closing(): │
│ 1308 │ │ │ self._closed = True │
│ 1309 │ │ │ try: │
│ ❱ 1310 │ │ │ │ self._transport.write_eof() │
│ 1311 │ │ │ except OSError: │
│ 1312 │ │ │ │ pass │
│ 1313 │
│ │
│ in uvloop.loop.UVStream.write_eof:703 │
│ │
│ in uvloop.loop.UVHandle._ensure_alive:159 │

╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
RuntimeError: unable to perform operation on <TCPTransport closed=True
reading=False 0x122dc7ad0>; the handler is closed

## Test Plan
Run
uv run --with llama-stack llama stack build --distro=starter
--image-type=venv --run

No more error
2025-10-06 14:44:01 -07:00
ehhuang
696fefbf17
chore: logger category fix (#3706)
Some checks failed
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
Python Package Build Test / build (3.12) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Test Llama Stack Build / generate-matrix (push) Successful in 4s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 2s
Vector IO Integration Tests / test-matrix (push) Failing after 4s
Python Package Build Test / build (3.13) (push) Failing after 2s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 3s
Test Llama Stack Build / build-single-provider (push) Failing after 4s
API Conformance Tests / check-schema-compatibility (push) Successful in 10s
Test Llama Stack Build / build (push) Failing after 2s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
Unit Tests / unit-tests (3.13) (push) Failing after 5s
UI Tests / ui-tests (22) (push) Successful in 39s
Pre-commit / pre-commit (push) Successful in 1m22s
# What does this PR do?
WARNING 2025-10-06 12:01:45,137 root:266 uncategorized: Unknown logging
category: tokenizer_utils. Falling back to default 'root' level: 20

## Test Plan
2025-10-06 12:16:26 -07:00
Alexey Rybak
a8da6ba3a7
docs: API docstrings cleanup for better documentation rendering (#3661)
# What does this PR do?
* Cleans up API docstrings for better documentation rendering

<img width="2346" height="1126" alt="image"
src="https://github.com/user-attachments/assets/516b09a1-2d5b-4614-a3a9-13431fc21fc1"
/>

## Test Plan
* Manual testing

---------

Signed-off-by: Doug Edgar <dedgar@redhat.com>
Signed-off-by: Charlie Doern <cdoern@redhat.com>
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: ehhuang <ehhuang@users.noreply.github.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
Co-authored-by: Matthew Farrellee <matt@cs.wisc.edu>
Co-authored-by: Doug Edgar <dedgar@redhat.com>
Co-authored-by: Christian Zaccaria <73656840+ChristianZaccaria@users.noreply.github.com>
Co-authored-by: Anastas Stoyanovsky <contact@anastas.eu>
Co-authored-by: Charlie Doern <cdoern@redhat.com>
Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Young Han <110819238+seyeong-han@users.noreply.github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-06 10:46:33 -07:00
Matthew Farrellee
892ea759fa
chore: remove together inference adapter's custom check_model_availability (#3702)
# What does this PR do?

remove Together inference adapter's check_model_availability impl, rely
on standard impl instead


## Test Plan

ci
2025-10-06 13:28:36 -04:00
Matthew Farrellee
de9940c697
chore: disable openai_embeddings on inference=remote::llama-openai-compat (#3704)
# What does this PR do?

api.llama.com does not provide embedding models, this makes that clear


## Test Plan

ci
2025-10-06 13:27:40 -04:00
Matthew Farrellee
ae74b31ae3
chore: remove vLLM inference adapter's custom list_models (#3703)
# What does this PR do?

remove vLLM inference adapter's custom list_models impl, rely on
standard impl instead

## Test Plan

ci
2025-10-06 13:27:30 -04:00
Matthew Farrellee
d23ed26238
chore: turn OpenAIMixin into a pydantic.BaseModel (#3671)
# What does this PR do?

- implement get_api_key instead of relying on
LiteLLMOpenAIMixin.get_api_key
 - remove use of LiteLLMOpenAIMixin
 - add default initialize/shutdown methods to OpenAIMixin
 - remove __init__s to allow proper pydantic construction
- remove dead code from vllm adapter and associated / duplicate unit
tests
 - update vllm adapter to use openaimixin for model registration
 - remove ModelRegistryHelper from fireworks & together adapters
 - remove Inference from nvidia adapter
 - complete type hints on embedding_model_metadata
- allow extra fields on OpenAIMixin, for model_store, __provider_id__,
etc
 - new recordings for ollama
 - enhance the list models error handling
- update cerebras (remove cerebras-cloud-sdk) and anthropic (custom
model listing) inference adapters
 - parametrized test_inference_client_caching
- remove cerebras, databricks, fireworks, together from blanket mypy
exclude
 - removed unnecessary litellm deps

## Test Plan

ci
2025-10-06 11:33:19 -04:00
Matthew Farrellee
724dac498c
chore: give OpenAIMixin subcalsses a change to list models without leaking _model_cache details (#3682)
# What does this PR do?

close the _model_cache abstraction leak

## Test Plan

ci w/ new tests
2025-10-06 09:44:33 -04:00
Charlie Doern
f00bcd9561
feat: allow for multiple external provider specs (#3341)
# What does this PR do?

when using the providers.d method of installation users could hand craft
their AdapterSpec's to use overlapping code meaning one repo could
contain an inline and remote impl. Currently installing a provider via
module does not allow for that as each repo is only allowed to have one
`get_provider_spec` method with one Spec returned

add an optional way for `get_provider_spec` to return a list of
`ProviderSpec` where each can be either an inline or remote impl.

Note: the `adapter_type` in `get_provider_spec` MUST match the
`provider_type` in the build/run yaml for this to work.

resolves #3226

## Test Plan

once this merges we need to re-enable the external provider test and
account for this functionality. Work needs to be done in the external
provider repos to support this functionality.

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-10-06 15:26:38 +02:00
ehhuang
426cac078b
chore: use uvicorn to start llama stack server everywhere (#3625)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 0s
Test Llama Stack Build / generate-matrix (push) Successful in 3s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 4s
Vector IO Integration Tests / test-matrix (push) Failing after 3s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Test Llama Stack Build / build-single-provider (push) Failing after 3s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 6s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 3s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 3s
Python Package Build Test / build (3.12) (push) Failing after 3s
Python Package Build Test / build (3.13) (push) Failing after 2s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Unit Tests / unit-tests (3.13) (push) Failing after 3s
API Conformance Tests / check-schema-compatibility (push) Successful in 11s
Test Llama Stack Build / build (push) Failing after 3s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
UI Tests / ui-tests (22) (push) Successful in 44s
Pre-commit / pre-commit (push) Successful in 1m24s
# What does this PR do?
https://github.com/llamastack/llama-stack/pull/3462 allows using uvicorn
to start llama stack server which supports spawning multiple workers.

This PR enables us to launch >1 workers from `llama stack run` (will add
the parameter in a follow-up PR, keeping this PR on simplifying) by
removing the old way of launching stack server and consolidates
launching via uvicorn.run only.


## Test Plan
ran `llama stack run starter`
CI
2025-10-06 14:27:40 +02:00
dependabot[bot]
c0f0a03529
chore(ui-deps): bump react-dom and @types/react-dom in /llama_stack/ui (#3693)
Bumps
[react-dom](https://github.com/facebook/react/tree/HEAD/packages/react-dom)
and
[@types/react-dom](https://github.com/DefinitelyTyped/DefinitelyTyped/tree/HEAD/types/react-dom).
These dependencies needed to be updated together.
Updates `react-dom` from 19.1.1 to 19.2.0
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/facebook/react/releases">react-dom's
releases</a>.</em></p>
<blockquote>
<h2>19.2.0 (Oct 1, 2025)</h2>
<p>Below is a list of all new features, APIs, and bug fixes.</p>
<p>Read the <a href="https://react.dev/blog/2025/10/01/react-19-2">React
19.2 release post</a> for more information.</p>
<h2>New React Features</h2>
<ul>
<li><a
href="https://react.dev/reference/react/Activity"><code>&lt;Activity&gt;</code></a>:
A new API to hide and restore the UI and internal state of its
children.</li>
<li><a
href="https://react.dev/reference/react/useEffectEvent"><code>useEffectEvent</code></a>
is a React Hook that lets you extract non-reactive logic into an <a
href="https://react.dev/learn/separating-events-from-effects#declaring-an-effect-event">Effect
Event</a>.</li>
<li><a
href="https://react.dev/reference/react/cacheSignal"><code>cacheSignal</code></a>
(for RSCs) lets your know when the <code>cache()</code> lifetime is
over.</li>
<li><a
href="https://react.dev/reference/developer-tooling/react-performance-tracks">React
Performance tracks</a> appear on the Performance panel’s timeline in
your browser developer tools</li>
</ul>
<h2>New React DOM Features</h2>
<ul>
<li>Added resume APIs for partial pre-rendering with Web Streams:
<ul>
<li><a
href="https://react.dev/reference/react-dom/server/resume"><code>resume</code></a>:
to resume a prerender to a stream.</li>
<li><a
href="https://react.dev/reference/react-dom/static/resumeAndPrerender"><code>resumeAndPrerender</code></a>:
to resume a prerender to HTML.</li>
</ul>
</li>
<li>Added resume APIs for partial pre-rendering with Node Streams:
<ul>
<li><a
href="https://react.dev/reference/react-dom/server/resumeToPipeableStream"><code>resumeToPipeableStream</code></a>:
to resume a prerender to a stream.</li>
<li><a
href="https://react.dev/reference/react-dom/static/resumeAndPrerenderToNodeStream"><code>resumeAndPrerenderToNodeStream</code></a>:
to resume a prerender to HTML.</li>
</ul>
</li>
<li>Updated <a
href="https://react.dev/reference/react-dom/static/prerender"><code>prerender</code></a>
APIs to return a <code>postponed</code> state that can be passed to the
<code>resume</code> APIs.</li>
</ul>
<h2>Notable changes</h2>
<ul>
<li>React DOM now batches suspense boundary reveals, matching the
behavior of client side rendering. This change is especially noticeable
when animating the reveal of Suspense boundaries e.g. with the upcoming
<code>&lt;ViewTransition&gt;</code> Component. React will batch as much
reveals as possible before the first paint while trying to hit popular
first-contentful paint metrics.</li>
<li>Add Node Web Streams (<code>prerender</code>,
<code>renderToReadableStream</code>) to server-side-rendering APIs for
Node.js</li>
<li>Use underscore instead of <code>:</code> IDs generated by useId</li>
</ul>
<h2>All Changes</h2>
<h3>React</h3>
<ul>
<li><code>&lt;Activity /&gt;</code> was developed over many years,
starting before <code>ClassComponent.setState</code> (<a
href="https://github.com/acdlite"><code>@​acdlite</code></a> <a
href="https://github.com/sebmarkbage"><code>@​sebmarkbage</code></a> and
many others)</li>
<li>Stringify context as &quot;SomeContext&quot; instead of
&quot;SomeContext.Provider&quot; (<a
href="https://github.com/kassens"><code>@​kassens</code></a> <a
href="https://redirect.github.com/facebook/react/pull/33507">#33507</a>)</li>
<li>Include stack of cause of React instrumentation errors with
<code>%o</code> placeholder (<a
href="https://github.com/eps1lon"><code>@​eps1lon</code></a> <a
href="https://redirect.github.com/facebook/react/pull/34198">#34198</a>)</li>
<li>Fix infinite <code>useDeferredValue</code> loop in popstate event
(<a href="https://github.com/acdlite"><code>@​acdlite</code></a> <a
href="https://redirect.github.com/facebook/react/pull/32821">#32821</a>)</li>
<li>Fix a bug when an initial value was passed to
<code>useDeferredValue</code> (<a
href="https://github.com/acdlite"><code>@​acdlite</code></a> <a
href="https://redirect.github.com/facebook/react/pull/34376">#34376</a>)</li>
<li>Fix a crash when submitting forms with Client Actions (<a
href="https://github.com/sebmarkbage"><code>@​sebmarkbage</code></a> <a
href="https://redirect.github.com/facebook/react/pull/33055">#33055</a>)</li>
<li>Hide/unhide the content of dehydrated suspense boundaries if they
resuspend (<a
href="https://github.com/sebmarkbage"><code>@​sebmarkbage</code></a> <a
href="https://redirect.github.com/facebook/react/pull/32900">#32900</a>)</li>
<li>Avoid stack overflow on wide trees during Hot Reload (<a
href="https://github.com/sophiebits"><code>@​sophiebits</code></a> <a
href="https://redirect.github.com/facebook/react/pull/34145">#34145</a>)</li>
<li>Improve Owner and Component stacks in various places (<a
href="https://github.com/sebmarkbage"><code>@​sebmarkbage</code></a>, <a
href="https://github.com/eps1lon"><code>@​eps1lon</code></a>: <a
href="https://redirect.github.com/facebook/react/pull/33629">#33629</a>,
<a
href="https://redirect.github.com/facebook/react/pull/33724">#33724</a>,
<a
href="https://redirect.github.com/facebook/react/pull/32735">#32735</a>,
<a
href="https://redirect.github.com/facebook/react/pull/33723">#33723</a>)</li>
<li>Add <code>cacheSignal</code> (<a
href="https://github.com/sebmarkbage"><code>@​sebmarkbage</code></a> <a
href="https://redirect.github.com/facebook/react/pull/33557">#33557</a>)</li>
</ul>
<h3>React DOM</h3>
<ul>
<li>Block on Suspensey Fonts during reveal of server-side-rendered
content (<a
href="https://github.com/sebmarkbage"><code>@​sebmarkbage</code></a> <a
href="https://redirect.github.com/facebook/react/pull/33342">#33342</a>)</li>
<li>Use underscore instead of <code>:</code> for IDs generated by
<code>useId</code> (<a
href="https://github.com/sebmarkbage"><code>@​sebmarkbage</code></a>, <a
href="https://github.com/eps1lon"><code>@​eps1lon</code></a>: <a
href="https://redirect.github.com/facebook/react/pull/32001">#32001</a>,
<a
href="https://redirect.github.com/facebook/react/pull/33342">facebook/react#33342</a><a
href="https://redirect.github.com/facebook/react/pull/33099">#33099</a>,
<a
href="https://redirect.github.com/facebook/react/pull/33422">#33422</a>)</li>
<li>Stop warning when ARIA 1.3 attributes are used (<a
href="https://github.com/Abdul-Omira"><code>@​Abdul-Omira</code></a> <a
href="https://redirect.github.com/facebook/react/pull/34264">#34264</a>)</li>
<li>Allow <code>nonce</code> to be used on hoistable styles (<a
href="https://github.com/Andarist"><code>@​Andarist</code></a> <a
href="https://redirect.github.com/facebook/react/pull/32461">#32461</a>)</li>
<li>Warn for using a React owned node as a Container if it also has text
content (<a
href="https://github.com/sebmarkbage"><code>@​sebmarkbage</code></a> <a
href="https://redirect.github.com/facebook/react/pull/32774">#32774</a>)</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/facebook/react/blob/main/CHANGELOG.md">react-dom's
changelog</a>.</em></p>
<blockquote>
<h2>19.2.0 (October 1st, 2025)</h2>
<p>Below is a list of all new features, APIs, and bug fixes.</p>
<p>Read the <a href="https://react.dev/blog/2025/10/01/react-19-2">React
19.2 release post</a> for more information.</p>
<h3>New React Features</h3>
<ul>
<li><a
href="https://react.dev/reference/react/Activity"><code>&lt;Activity&gt;</code></a>:
A new API to hide and restore the UI and internal state of its
children.</li>
<li><a
href="https://react.dev/reference/react/useEffectEvent"><code>useEffectEvent</code></a>
is a React Hook that lets you extract non-reactive logic into an <a
href="https://react.dev/learn/separating-events-from-effects#declaring-an-effect-event">Effect
Event</a>.</li>
<li><a
href="https://react.dev/reference/react/cacheSignal"><code>cacheSignal</code></a>
(for RSCs) lets your know when the <code>cache()</code> lifetime is
over.</li>
<li><a
href="https://react.dev/reference/developer-tooling/react-performance-tracks">React
Performance tracks</a> appear on the Performance panel’s timeline in
your browser developer tools</li>
</ul>
<h3>New React DOM Features</h3>
<ul>
<li>Added resume APIs for partial pre-rendering with Web Streams:
<ul>
<li><a
href="https://react.dev/reference/react-dom/server/resume"><code>resume</code></a>:
to resume a prerender to a stream.</li>
<li><a
href="https://react.dev/reference/react-dom/static/resumeAndPrerender"><code>resumeAndPrerender</code></a>:
to resume a prerender to HTML.</li>
</ul>
</li>
<li>Added resume APIs for partial pre-rendering with Node Streams:
<ul>
<li><a
href="https://react.dev/reference/react-dom/server/resumeToPipeableStream"><code>resumeToPipeableStream</code></a>:
to resume a prerender to a stream.</li>
<li><a
href="https://react.dev/reference/react-dom/static/resumeAndPrerenderToNodeStream"><code>resumeAndPrerenderToNodeStream</code></a>:
to resume a prerender to HTML.</li>
</ul>
</li>
<li>Updated <a
href="https://react.dev/reference/react-dom/static/prerender"><code>prerender</code></a>
APIs to return a <code>postponed</code> state that can be passed to the
<code>resume</code> APIs.</li>
</ul>
<h3>Notable changes</h3>
<ul>
<li>React DOM now batches suspense boundary reveals, matching the
behavior of client side rendering. This change is especially noticeable
when animating the reveal of Suspense boundaries e.g. with the upcoming
<code>&lt;ViewTransition&gt;</code> Component. React will batch as much
reveals as possible before the first paint while trying to hit popular
first-contentful paint metrics.</li>
<li>Add Node Web Streams (<code>prerender</code>,
<code>renderToReadableStream</code>) to server-side-rendering APIs for
Node.js</li>
<li>Use underscore instead of <code>:</code> IDs generated by useId</li>
</ul>
<h3>All Changes</h3>
<h4>React</h4>
<ul>
<li><code>&lt;Activity /&gt;</code> was developed over many years,
starting before <code>ClassComponent.setState</code> (<a
href="https://github.com/acdlite"><code>@​acdlite</code></a> <a
href="https://github.com/sebmarkbage"><code>@​sebmarkbage</code></a> and
many others)</li>
<li>Stringify context as &quot;SomeContext&quot; instead of
&quot;SomeContext.Provider&quot; (<a
href="https://github.com/kassens"><code>@​kassens</code></a> <a
href="https://redirect.github.com/facebook/react/pull/33507">#33507</a>)</li>
<li>Include stack of cause of React instrumentation errors with
<code>%o</code> placeholder (<a
href="https://github.com/eps1lon"><code>@​eps1lon</code></a> <a
href="https://redirect.github.com/facebook/react/pull/34198">#34198</a>)</li>
<li>Fix infinite <code>useDeferredValue</code> loop in popstate event
(<a href="https://github.com/acdlite"><code>@​acdlite</code></a> <a
href="https://redirect.github.com/facebook/react/pull/32821">#32821</a>)</li>
<li>Fix a bug when an initial value was passed to
<code>useDeferredValue</code> (<a
href="https://github.com/acdlite"><code>@​acdlite</code></a> <a
href="https://redirect.github.com/facebook/react/pull/34376">#34376</a>)</li>
<li>Fix a crash when submitting forms with Client Actions (<a
href="https://github.com/sebmarkbage"><code>@​sebmarkbage</code></a> <a
href="https://redirect.github.com/facebook/react/pull/33055">#33055</a>)</li>
<li>Hide/unhide the content of dehydrated suspense boundaries if they
resuspend (<a
href="https://github.com/sebmarkbage"><code>@​sebmarkbage</code></a> <a
href="https://redirect.github.com/facebook/react/pull/32900">#32900</a>)</li>
<li>Avoid stack overflow on wide trees during Hot Reload (<a
href="https://github.com/sophiebits"><code>@​sophiebits</code></a> <a
href="https://redirect.github.com/facebook/react/pull/34145">#34145</a>)</li>
<li>Improve Owner and Component stacks in various places (<a
href="https://github.com/sebmarkbage"><code>@​sebmarkbage</code></a>, <a
href="https://github.com/eps1lon"><code>@​eps1lon</code></a>: <a
href="https://redirect.github.com/facebook/react/pull/33629">#33629</a>,
<a
href="https://redirect.github.com/facebook/react/pull/33724">#33724</a>,
<a
href="https://redirect.github.com/facebook/react/pull/32735">#32735</a>,
<a
href="https://redirect.github.com/facebook/react/pull/33723">#33723</a>)</li>
<li>Add <code>cacheSignal</code> (<a
href="https://github.com/sebmarkbage"><code>@​sebmarkbage</code></a> <a
href="https://redirect.github.com/facebook/react/pull/33557">#33557</a>)</li>
</ul>
<h4>React DOM</h4>
<ul>
<li>Block on Suspensey Fonts during reveal of server-side-rendered
content (<a
href="https://github.com/sebmarkbage"><code>@​sebmarkbage</code></a> <a
href="https://redirect.github.com/facebook/react/pull/33342">#33342</a>)</li>
<li>Use underscore instead of <code>:</code> for IDs generated by
<code>useId</code> (<a
href="https://github.com/sebmarkbage"><code>@​sebmarkbage</code></a>, <a
href="https://github.com/eps1lon"><code>@​eps1lon</code></a>: <a
href="https://redirect.github.com/facebook/react/pull/32001">#32001</a>,
<a
href="https://redirect.github.com/facebook/react/pull/33342">facebook/react#33342</a><a
href="https://redirect.github.com/facebook/react/pull/33099">#33099</a>,
<a
href="https://redirect.github.com/facebook/react/pull/33422">#33422</a>)</li>
<li>Stop warning when ARIA 1.3 attributes are used (<a
href="https://github.com/Abdul-Omira"><code>@​Abdul-Omira</code></a> <a
href="https://redirect.github.com/facebook/react/pull/34264">#34264</a>)</li>
<li>Allow <code>nonce</code> to be used on hoistable styles (<a
href="https://github.com/Andarist"><code>@​Andarist</code></a> <a
href="https://redirect.github.com/facebook/react/pull/32461">#32461</a>)</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="861811347b"><code>8618113</code></a>
Bump scheduler version (<a
href="https://github.com/facebook/react/tree/HEAD/packages/react-dom/issues/34671">#34671</a>)</li>
<li><a
href="1bd1f01f2a"><code>1bd1f01</code></a>
Ship partial-prerendering APIs to Canary (<a
href="https://github.com/facebook/react/tree/HEAD/packages/react-dom/issues/34633">#34633</a>)</li>
<li><a
href="2f0649a0b2"><code>2f0649a</code></a>
[Fizz] Remove <code>nonce</code> option from resume-and-prerender APIs
(<a
href="https://github.com/facebook/react/tree/HEAD/packages/react-dom/issues/34664">#34664</a>)</li>
<li><a
href="5667a41fe4"><code>5667a41</code></a>
Bump next prerelease version numbers (<a
href="https://github.com/facebook/react/tree/HEAD/packages/react-dom/issues/34639">#34639</a>)</li>
<li><a
href="e08f53b182"><code>e08f53b</code></a>
Match <code>react-dom/static</code> test entrypoints and published
entrypoints (<a
href="https://github.com/facebook/react/tree/HEAD/packages/react-dom/issues/34599">#34599</a>)</li>
<li><a
href="8bb7241f4c"><code>8bb7241</code></a>
Bump useEffectEvent to Canary (<a
href="https://github.com/facebook/react/tree/HEAD/packages/react-dom/issues/34610">#34610</a>)</li>
<li><a
href="83c88ad470"><code>83c88ad</code></a>
Handle fabric root level fragment with compareDocumentPosition (<a
href="https://github.com/facebook/react/tree/HEAD/packages/react-dom/issues/34533">#34533</a>)</li>
<li><a
href="68f00c901c"><code>68f00c9</code></a>
Release Activity in Canary (<a
href="https://github.com/facebook/react/tree/HEAD/packages/react-dom/issues/34374">#34374</a>)</li>
<li><a
href="3168e08f83"><code>3168e08</code></a>
[flags] enable opt-in for enableDefaultTransitionIndicator (<a
href="https://github.com/facebook/react/tree/HEAD/packages/react-dom/issues/34373">#34373</a>)</li>
<li><a
href="3434ff4f4b"><code>3434ff4</code></a>
Add scrollIntoView to fragment instances (<a
href="https://github.com/facebook/react/tree/HEAD/packages/react-dom/issues/32814">#32814</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/facebook/react/commits/v19.2.0/packages/react-dom">compare
view</a></li>
</ul>
</details>
<br />

Updates `@types/react-dom` from 19.1.9 to 19.2.0
<details>
<summary>Commits</summary>
<ul>
<li>See full diff in <a
href="https://github.com/DefinitelyTyped/DefinitelyTyped/commits/HEAD/types/react-dom">compare
view</a></li>
</ul>
</details>
<br />


Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-06 00:02:31 -04:00
dependabot[bot]
91c6a8a3a3
chore(ui-deps): bump next from 15.5.3 to 15.5.4 in /llama_stack/ui (#3694)
Bumps [next](https://github.com/vercel/next.js) from 15.5.3 to 15.5.4.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/vercel/next.js/releases">next's
releases</a>.</em></p>
<blockquote>
<h2>v15.5.4</h2>
<blockquote>
<p>[!NOTE]<br />
This release is backporting bug fixes. It does <strong>not</strong>
include all pending features/changes on canary.</p>
</blockquote>
<h3>Core Changes</h3>
<ul>
<li>fix: ensure onRequestError is invoked when otel enabled (<a
href="https://redirect.github.com/vercel/next.js/issues/83343">#83343</a>)</li>
<li>fix: devtools initial position should be from next config (<a
href="https://redirect.github.com/vercel/next.js/issues/83571">#83571</a>)</li>
<li>[devtool] fix overlay styles are missing (<a
href="https://redirect.github.com/vercel/next.js/issues/83721">#83721</a>)</li>
<li>Turbopack: don't match dynamic pattern for node_modules packages (<a
href="https://redirect.github.com/vercel/next.js/issues/83176">#83176</a>)</li>
<li>Turbopack: don't treat metadata routes as RSC (<a
href="https://redirect.github.com/vercel/next.js/issues/82911">#82911</a>)</li>
<li>[turbopack] Improve handling of symlink resolution errors in
track_glob and read_glob (<a
href="https://redirect.github.com/vercel/next.js/issues/83357">#83357</a>)</li>
<li>Turbopack: throw large static metadata error earlier (<a
href="https://redirect.github.com/vercel/next.js/issues/82939">#82939</a>)</li>
<li>fix: error overlay not closing when backdrop clicked (<a
href="https://redirect.github.com/vercel/next.js/issues/83981">#83981</a>)</li>
<li>Turbopack: flush Node.js worker IPC on error (<a
href="https://redirect.github.com/vercel/next.js/issues/84077">#84077</a>)</li>
</ul>
<h3>Misc Changes</h3>
<ul>
<li>[CNA] use linter preference (<a
href="https://redirect.github.com/vercel/next.js/issues/83194">#83194</a>)</li>
<li>CI: use KV for test timing data (<a
href="https://redirect.github.com/vercel/next.js/issues/83745">#83745</a>)</li>
<li>docs: september improvements and fixes (<a
href="https://redirect.github.com/vercel/next.js/issues/83997">#83997</a>)</li>
</ul>
<h3>Credits</h3>
<p>Huge thanks to <a
href="https://github.com/yiminghe"><code>@​yiminghe</code></a>, <a
href="https://github.com/huozhi"><code>@​huozhi</code></a>, <a
href="https://github.com/devjiwonchoi"><code>@​devjiwonchoi</code></a>,
<a href="https://github.com/mischnic"><code>@​mischnic</code></a>, <a
href="https://github.com/lukesandberg"><code>@​lukesandberg</code></a>,
<a href="https://github.com/ztanner"><code>@​ztanner</code></a>, <a
href="https://github.com/icyJoseph"><code>@​icyJoseph</code></a>, <a
href="https://github.com/leerob"><code>@​leerob</code></a>, <a
href="https://github.com/fufuShih"><code>@​fufuShih</code></a>, <a
href="https://github.com/dwrth"><code>@​dwrth</code></a>, <a
href="https://github.com/aymericzip"><code>@​aymericzip</code></a>, <a
href="https://github.com/obendev"><code>@​obendev</code></a>, <a
href="https://github.com/molebox"><code>@​molebox</code></a>, <a
href="https://github.com/OoMNoO"><code>@​OoMNoO</code></a>, <a
href="https://github.com/pontasan"><code>@​pontasan</code></a>, <a
href="https://github.com/styfle"><code>@​styfle</code></a>, <a
href="https://github.com/HondaYt"><code>@​HondaYt</code></a>, <a
href="https://github.com/ryuapp"><code>@​ryuapp</code></a>, <a
href="https://github.com/lpalmes"><code>@​lpalmes</code></a>, and <a
href="https://github.com/ijjk"><code>@​ijjk</code></a> for helping!</p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="40f1d7814d"><code>40f1d78</code></a>
v15.5.4</li>
<li><a
href="cb30f0a176"><code>cb30f0a</code></a>
[backport] docs: september improvements and fixes (<a
href="https://redirect.github.com/vercel/next.js/issues/83997">#83997</a>)</li>
<li><a
href="b6a32bb579"><code>b6a32bb</code></a>
[backport] [CNA] use linter preference (<a
href="https://redirect.github.com/vercel/next.js/issues/83194">#83194</a>)
(<a
href="https://redirect.github.com/vercel/next.js/issues/84087">#84087</a>)</li>
<li><a
href="26d61f1e9a"><code>26d61f1</code></a>
[backport] Turbopack: flush Node.js worker IPC on error (<a
href="https://redirect.github.com/vercel/next.js/issues/84079">#84079</a>)</li>
<li><a
href="e11e87a547"><code>e11e87a</code></a>
[backport] fix: error overlay not closing when backdrop clicked (<a
href="https://redirect.github.com/vercel/next.js/issues/83981">#83981</a>)
(<a
href="https://redirect.github.com/vercel/next.js/issues/83">#83</a>...</li>
<li><a
href="0a29888575"><code>0a29888</code></a>
[backport] fix: devtools initial position should be from next config (<a
href="https://redirect.github.com/vercel/next.js/issues/83571">#83571</a>)...</li>
<li><a
href="7a53950c13"><code>7a53950</code></a>
[backport] Turbopack: don't treat metadata routes as RSC (<a
href="https://redirect.github.com/vercel/next.js/issues/83804">#83804</a>)</li>
<li><a
href="050bdf1ae7"><code>050bdf1</code></a>
[backport] Turbopack: throw large static metadata error earlier (<a
href="https://redirect.github.com/vercel/next.js/issues/83816">#83816</a>)</li>
<li><a
href="1f6ea09f85"><code>1f6ea09</code></a>
[backport] Turbopack: Improve handling of symlink resolution errors (<a
href="https://redirect.github.com/vercel/next.js/issues/83805">#83805</a>)</li>
<li><a
href="c7d1855499"><code>c7d1855</code></a>
[backport] CI: use KV for test timing data (<a
href="https://redirect.github.com/vercel/next.js/issues/83860">#83860</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/vercel/next.js/compare/v15.5.3...v15.5.4">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=next&package-manager=npm_and_yarn&previous-version=15.5.3&new-version=15.5.4)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-10-06 00:01:38 -04:00
Matthew Farrellee
351c4b98e4
chore: inference=remote::llama-openai-compat does not support /v1/completion (#3683)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Vector IO Integration Tests / test-matrix (push) Failing after 4s
Test External API and Providers / test-external (venv) (push) Failing after 4s
API Conformance Tests / check-schema-compatibility (push) Successful in 8s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 17s
Python Package Build Test / build (3.13) (push) Failing after 16s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 19s
Python Package Build Test / build (3.12) (push) Failing after 18s
Unit Tests / unit-tests (3.13) (push) Failing after 16s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 20s
Unit Tests / unit-tests (3.12) (push) Failing after 18s
UI Tests / ui-tests (22) (push) Successful in 44s
Pre-commit / pre-commit (push) Successful in 1m22s
## What does this PR do?

skip completion tests for inference=remote::llama-openai-compat

## Test Plan

ci
2025-10-04 11:36:48 -07:00
Ashwin Bharambe
045a0c1d57
feat(tests): implement test isolation for inference recordings (#3681)
Uses test_id in request hashes and test-scoped subdirectories to prevent
cross-test contamination. Model list endpoints exclude test_id to enable
merging recordings from different servers.

Additionally, this PR adds a `record-if-missing` mode (which we will use
instead of `record` which records everything) which is very useful.

🤖 Co-authored with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude <noreply@anthropic.com>
2025-10-04 11:34:18 -07:00
Ashwin Bharambe
3f36bfaeaa
chore(tests): normalize recording IDs and timestamps to reduce git diff noise (#3676)
IDs are now deterministic hashes based on request content, and
timestamps are normalized to constants, eliminating spurious changes
when re-recording tests.

## Changes
- Updated `inference_recorder.py` to normalize IDs and timestamps during
recording
- Added `scripts/normalize_recordings.py` utility to re-normalize
existing recordings
- Created documentation in `tests/integration/recordings/README.md`
- Normalized 350 existing recording files
2025-10-03 17:26:11 -07:00