Commit graph

82 commits

Author SHA1 Message Date
Ken Dreyer
dc4665af17
feat!: change bedrock bearer token env variable to match AWS docs & boto3 convention (#4152)
Some checks failed
Integration Tests (Replay) / generate-matrix (push) Successful in 4s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 2s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 5s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 5s
Test Llama Stack Build / generate-matrix (push) Successful in 3s
API Conformance Tests / check-schema-compatibility (push) Successful in 10s
Python Package Build Test / build (3.12) (push) Failing after 6s
Python Package Build Test / build (3.13) (push) Failing after 6s
Test Llama Stack Build / build-single-provider (push) Successful in 50s
Vector IO Integration Tests / test-matrix (push) Failing after 56s
Test Llama Stack Build / build (push) Successful in 49s
UI Tests / ui-tests (22) (push) Successful in 1m1s
Test External API and Providers / test-external (venv) (push) Failing after 1m18s
Unit Tests / unit-tests (3.13) (push) Failing after 1m58s
Unit Tests / unit-tests (3.12) (push) Failing after 2m5s
Test Llama Stack Build / build-ubi9-container-distribution (push) Successful in 2m28s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 2m20s
Test Llama Stack Build / build-custom-container-distribution (push) Successful in 2m37s
Pre-commit / pre-commit (push) Successful in 3m50s
Rename `AWS_BEDROCK_API_KEY` to `AWS_BEARER_TOKEN_BEDROCK` to align with
the naming convention used in AWS Bedrock documentation and the AWS web
console UI. This reduces confusion when developers compare LLS docs with
AWS docs.

Closes #4147
2025-11-21 09:48:05 -05:00
Charlie Doern
d5cd0eea14
feat!: standardize base_url for inference (#4177)
# What does this PR do?

Completes #3732 by removing runtime URL transformations and requiring
users to provide full URLs in configuration. All providers now use
'base_url' consistently and respect the exact URL provided without
appending paths like /v1 or /openai/v1 at runtime.

BREAKING CHANGE: Users must update configs to include full URL paths
(e.g., http://localhost:11434/v1 instead of http://localhost:11434).

Closes #3732 

## Test Plan

Existing tests should pass even with the URL changes, due to default
URLs being altered.

Add unit test to enforce URL standardization across remote inference
providers (verifies all use 'base_url' field with HttpUrl | None type)

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-11-19 08:44:28 -08:00
Theofanis Petkos
5fe6098350
docs: Improvements on provider_codegen for type hints and multi-line yaml descriptions (#4033)
# What does this PR do?

This PR improves type hint cleanup in auto-generated provider
documentation by adding regex logic.

**Issues Fixed:**
- Type hints with missing closing brackets (e.g., `list[str` instead of
`list[str]`)
- Types showing as `<class 'bool'>`, `<class 'str'>` instead of `bool`,
`str`
- The multi-line YAML frontmatter in index documentation files wasn't
ideal, so we now add the proper `|` character.

**Changes:**
1. Replaced string replacement (`.replace`) with regex-based type
cleaning to preserve the trailing bracket in case of `list` and `dict`.
2. Adds the `|` character for multi-line YAML descriptions.
3. I have regenerated the docs. However, let me know if that's not
needed.

## Test Plan

1. Ran uv run python scripts/provider_codegen.py - successfully
regenerated all docs
2. We can see that the updated docs handle correctly type hint cleanup
and multi-line yaml descriptions have now the `|` character.

### Note to the reviewer(s)

This is my first contribution to your lovely repo! Initially I was going
thourgh docs (wanted to use `remote::gemini` as provider) and realized
the issue. I've read the
[CONTRIBUTING.md](https://github.com/llamastack/llama-stack/blob/main/CONTRIBUTING.md)
and decided to open the PR. Let me know if there's anything I did wrong
and I'll update my PR!

---------

Signed-off-by: thepetk <thepetk@gmail.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-11-17 12:35:28 -08:00
Omar Abdelwahab
fe91d331ef
fix: Remove authorization from provider data (#4161)
# What does this PR do?
- Remove backward compatibility for authorization in mcp_headers
- Enforce authorization must use dedicated parameter  
- Add validation error if Authorization found in provider_data headers
- Update test_mcp.py to use authorization parameter
- Update test_mcp_json_schema.py to use authorization parameter
- Update test_tools_with_schemas.py to use authorization parameter
- Update documentation to show the change in the authorization approach

Breaking Change:
- Authorization can no longer be passed via mcp_headers in provider_data
- Users must use the dedicated 'authorization' parameter instead
- Clear error message guides users to the new approach"

## Test Plan
CI

---------

Co-authored-by: Omar Abdelwahab <omara@fb.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-11-17 12:16:35 -08:00
Charlie Doern
840ad75fe9
feat: split API and provider specs into separate llama-stack-api pkg (#3895)
# What does this PR do?

Extract API definitions and provider specifications into a standalone
llama-stack-api package that can be published to PyPI independently of
the main llama-stack server.


see: https://github.com/llamastack/llama-stack/pull/2978 and
https://github.com/llamastack/llama-stack/pull/2978#issuecomment-3145115942

Motivation

External providers currently import from llama-stack, which overrides
the installed version and causes dependency conflicts. This separation
allows external providers to:

- Install only the type definitions they need without server
dependencies
- Avoid version conflicts with the installed llama-stack package
- Be versioned and released independently

This enables us to re-enable external provider module tests that were
previously blocked by these import conflicts.

Changes

- Created llama-stack-api package with minimal dependencies (pydantic,
jsonschema)
- Moved APIs, providers datatypes, strong_typing, and schema_utils
- Updated all imports from llama_stack.* to llama_stack_api.*
- Configured local editable install for development workflow
- Updated linting and type-checking configuration for both packages

Next Steps

- Publish llama-stack-api to PyPI
- Update external provider dependencies
- Re-enable external provider module tests


Pre-cursor PRs to this one:

- #4093 
- #3954 
- #4064 

These PRs moved key pieces _out_ of the Api pkg, limiting the scope of
change here.


relates to #3237 

## Test Plan

Package builds successfully and can be imported independently. All
pre-commit hooks pass with expected exclusions maintained.

---------

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-11-13 11:51:17 -08:00
Francisco Arceo
4442b24de7
chore: Fix docs so can be deployed (#4149)
# What does this PR do?
Building/Deploying docs is failing here:
5530320962 (step):8:49

Needs the playground file. Updated it to reflect current admin status.

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-11-13 09:15:32 -08:00
Akram Ben Aissi
9eb81439d2
docs: Add comprehensive Files API and Vector Store integration doc (#3279)
docs: Add comprehensive Files API and Vector Store integration
documentation

- Add Files API documentation with OpenAI-compatible endpoints
- Create comprehensive guide for OpenAI-compatible file operations
- Reorganize documentation structure: move file operations to files/
directory
- Add vector store provider documentation for Milvus, SQLite-vec, FAISS
- Clean up redundant files and improve navigation
- Update cross-references and eliminate documentation duplication
- Support for release 0.2.14 FileResponse and Vector Store API features

# What does this PR do?
<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->
2025-11-13 08:50:06 -05:00
Derek Higgins
356f37b1ba
docs: clarify model identification uses provider_model_id not model_id (#4128)
Updated documentation to accurately reflect current behavior where
models are identified as provider_id/provider_model_id in the system.

Changes:
o Clarify that model_id is for configuration purposes only o Explain
models are accessed as provider_id/provider_model_id o Remove outdated
aliasing example that suggested model_id could be used
  as a custom identifier

This corrects the documentation which previously suggested model_id
could be used to create friendly aliases, which is not how the code
actually works.

Signed-off-by: Derek Higgins <derekh@redhat.com>
2025-11-12 10:13:26 -08:00
ehhuang
71b328fc4b
chore(ui): add npm package and dockerfile (#4100)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 1s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Python Package Build Test / build (3.12) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Pre-commit / pre-commit (push) Failing after 2s
Integration Tests (Replay) / generate-matrix (push) Successful in 2s
Python Package Build Test / build (3.13) (push) Failing after 1s
Vector IO Integration Tests / test-matrix (push) Failing after 4s
Test External API and Providers / test-external (venv) (push) Failing after 4s
API Conformance Tests / check-schema-compatibility (push) Successful in 9s
Unit Tests / unit-tests (3.12) (push) Failing after 3s
Unit Tests / unit-tests (3.13) (push) Failing after 4s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
UI Tests / ui-tests (22) (push) Successful in 53s
# What does this PR do?
- sets up package.json for npm `llama-stack-ui` package (will update
llama-stack-ops)
- adds dockerfile for UI docker image

## Test Plan
npx:
npm build && npm pack
LLAMA_STACK_UI_PORT=8322 npx
/Users/erichuang/projects/ui/src/llama_stack_ui/llama-stack-ui-0.4.0-alpha.2.tgz

docker:
cd src/llama_stack_ui
docker build . -f Dockerfile  --tag test_ui --no-cache

❯ docker run -p 8322:8322 \
      -e LLAMA_STACK_UI_PORT=8322 \
      test_ui:latest
2025-11-11 10:40:31 -08:00
paulengineer
e5a55f3677
docs: use 'uv pip' to avoid pitfalls of using 'pip' in virtual environment (#4122)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 2s
Python Package Build Test / build (3.12) (push) Failing after 1s
Integration Tests (Replay) / generate-matrix (push) Successful in 3s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.13) (push) Failing after 2s
Pre-commit / pre-commit (push) Failing after 2s
Vector IO Integration Tests / test-matrix (push) Failing after 4s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 6s
API Conformance Tests / check-schema-compatibility (push) Successful in 9s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Unit Tests / unit-tests (3.12) (push) Failing after 3s
Unit Tests / unit-tests (3.13) (push) Failing after 5s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 25s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 2s
UI Tests / ui-tests (22) (push) Successful in 53s
# What does this PR do?
In the **Detailed Tutorial**, at **Step 3**, the **Install with venv**
option creates a new virtual environment `client`, activates it then
attempts to install the llama-stack-client using pip.
```
uv venv client --python 3.12
source client/bin/activate
pip install llama-stack-client    <- this is the problematic line
```
However, the pip command will likely fail because the `uv venv` command
doesn't, by default, include adding the pip command to the virtual
environment that is created. The pip command will error either because
pip doesn't exist at all, or, if the pip command does exist outside of
the virtual environment, return a different error message. The latter
may be unclear to the user why it is failing.

This PR changes 'pip' to 'uv pip', allowing the install action to
function in the virtual environment as intended, and without the need
for pip to be installed.

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->

## Test Plan
1. Use linux or WSL (virtual environments on Windows use `Scripts`
folder instead of `bin` [virtualenv
#993ba13](993ba1316a)
which doesn't align with the tutorial)
2. Clone the `llama-stack` repo
3. Run the following and verify success:
```
uv venv client --python 3.12
source client/bin/activate
```
5. Run the updated command:
```
uv pip install llama-stack-client
```
6. Observe the console output confirms that the virtual environment
`client` was used:

> Using Python 3.12.3 environment at: **client**
2025-11-11 07:49:03 -05:00
Dennis Kennetz
209a78b618
feat: add oci genai service as chat inference provider (#3876)
# What does this PR do?
Adds OCI GenAI PaaS models for openai chat completion endpoints.

## Test Plan
In an OCI tenancy with access to GenAI PaaS, perform the following
steps:

1. Ensure you have IAM policies in place to use service (check docs
included in this PR)
2. For local development, [setup OCI
cli](https://docs.oracle.com/en-us/iaas/Content/API/SDKDocs/cliinstall.htm)
and configure the CLI with your region, tenancy, and auth
[here](https://docs.oracle.com/en-us/iaas/Content/API/SDKDocs/cliconfigure.htm)
3. Once configured, go through llama-stack setup and run llama-stack
(uses config based auth) like:
```bash
OCI_AUTH_TYPE=config_file \
OCI_CLI_PROFILE=CHICAGO \
OCI_REGION=us-chicago-1 \
OCI_COMPARTMENT_OCID=ocid1.compartment.oc1..aaaaaaaa5...5a \
llama stack run oci
```
4. Hit the `models` endpoint to list models after server is running:
```bash
curl http://localhost:8321/v1/models | jq
...
{
      "identifier": "meta.llama-4-scout-17b-16e-instruct",
      "provider_resource_id": "ocid1.generativeaimodel.oc1.us-chicago-1.am...q",
      "provider_id": "oci",
      "type": "model",
      "metadata": {
        "display_name": "meta.llama-4-scout-17b-16e-instruct",
        "capabilities": [
          "CHAT"
        ],
        "oci_model_id": "ocid1.generativeaimodel.oc1.us-chicago-1.a...q"
      },
      "model_type": "llm"
},
   ...
```
5. Use the "display_name" field to use the model in a
`/chat/completions` request:
```bash
# Streaming result
curl -X POST http://localhost:8321/v1/chat/completions   -H "Content-Type: application/json"   -d '{
        "model": "meta.llama-4-scout-17b-16e-instruct",
       "stream": true,
       "temperature": 0.9,
      "messages": [
         {
           "role": "system",
           "content": "You are a funny comedian. You can be crass."
         },
          {
           "role": "user",
          "content": "Tell me a funny joke about programming."
         }
       ]
}'

# Non-streaming result
curl -X POST http://localhost:8321/v1/chat/completions   -H "Content-Type: application/json"   -d '{
        "model": "meta.llama-4-scout-17b-16e-instruct",
       "stream": false,
       "temperature": 0.9,
      "messages": [
         {
           "role": "system",
           "content": "You are a funny comedian. You can be crass."
         },
          {
           "role": "user",
          "content": "Tell me a funny joke about programming."
         }
       ]
}'
```
6. Try out other models from the `/models` endpoint.
2025-11-10 16:16:24 -05:00
Vaishnavi Hire
4341c4c2ac
docs: Add Llama Stack Operator docs (#3983)
# What does this PR do?
<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->
Add documentation for llama-stack-k8s-operator under kubernetes
deployment guide.

Signed-off-by: Vaishnavi Hire <vhire@redhat.com>
2025-11-10 15:29:15 +01:00
Aakanksha Duggal
b83184f7ef
feat(responses)!: Add web_search_2025_08_26 to the WebSearchToolTypes (#4103)
# What does this PR do?
Resolves #4102 

1. Added `web_search_2025_08_26` to the `WebSearchToolTypes` list and
the `OpenAIResponseInputToolWebSearch.type` Literal union
2. No changes needed to tool execution logic - all `web_search` types
map to the same underlying tool
3. Backward compatibility is maintained - existing `web_search`,
`web_search_preview`, and `web_search_preview_2025_03_11` types continue
to work
4. Added an integration test case using {"type":
"web_search_2025_08_26"} to verify it works correctly
5. Updated `docs/docs/providers/openai_responses_limitations.mdx` to
reflect that `web_search_2025_08_26` is now supported.
6. Removed incorrect references to `MOD1/MOD2/MOD3` (which don't exist
in the codebase)


<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->

---------

Signed-off-by: Aakanksha Duggal <aduggal@redhat.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-11-07 10:01:12 -08:00
Ashwin Bharambe
f49cb0b717
chore: Stack server no longer depends on llama-stack-client (#4094)
This dependency has been bothering folks for a long time (cc @leseb). We
really needed it due to "library client" which is primarily used for our
tests and is not a part of the Stack server. Anyone who needs to use the
library client can certainly install `llama-stack-client` in their
environment to make that work.

Updated the notebook references to install `llama-stack-client`
additionally when setting things up.
2025-11-07 09:54:09 -08:00
Sumanth Kamenani
e894e36eea
feat: add OpenAI-compatible Bedrock provider (#3748)
Some checks failed
Pre-commit / pre-commit (push) Failing after 2s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Integration Tests (Replay) / generate-matrix (push) Successful in 3s
Vector IO Integration Tests / test-matrix (push) Failing after 4s
Test Llama Stack Build / generate-matrix (push) Successful in 3s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Test Llama Stack Build / build-single-provider (push) Failing after 5s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 3s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 4s
Python Package Build Test / build (3.12) (push) Failing after 2s
Python Package Build Test / build (3.13) (push) Failing after 1s
Test llama stack list-deps / generate-matrix (push) Successful in 4s
Test llama stack list-deps / show-single-provider (push) Failing after 4s
API Conformance Tests / check-schema-compatibility (push) Successful in 11s
Test llama stack list-deps / list-deps-from-config (push) Failing after 4s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
Test Llama Stack Build / build (push) Failing after 3s
Unit Tests / unit-tests (3.13) (push) Failing after 4s
Test llama stack list-deps / list-deps (push) Failing after 4s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 9s
UI Tests / ui-tests (22) (push) Successful in 48s
Implements AWS Bedrock inference provider using OpenAI-compatible
endpoint for Llama models available through Bedrock.

Closes: #3410


## What does this PR do?

Adds AWS Bedrock as an inference provider using the OpenAI-compatible
endpoint. This lets us use Bedrock models (GPT-OSS, Llama) through the
standard llama-stack inference API.

The implementation uses LiteLLM's OpenAI client under the hood, so it
gets all the OpenAI compatibility features. The provider handles
per-request API key overrides via headers.

## Test Plan

**Tested the following scenarios:**
- Non-streaming completion - basic request/response flow
- Streaming completion - SSE streaming with chunked responses
- Multi-turn conversations - context retention across turns
- Tool calling - function calling with proper tool_calls format

# Bedrock OpenAI-Compatible Provider - Test Results


**Model:** `bedrock-inference/openai.gpt-oss-20b-1:0`


---

## Test 1: Model Listing

**Request:**
```http
GET /v1/models HTTP/1.1
```

**Response:**
```http
HTTP/1.1 200 OK
Content-Type: application/json

{
  "data": [
    {"identifier": "bedrock-inference/openai.gpt-oss-20b-1:0", ...},
    {"identifier": "bedrock-inference/openai.gpt-oss-40b-1:0", ...}
  ]
}
```

---

## Test 2: Non-Streaming Completion

**Request:**
```http
POST /v1/chat/completions HTTP/1.1
Content-Type: application/json

{
  "model": "bedrock-inference/openai.gpt-oss-20b-1:0",
  "messages": [{"role": "user", "content": "Say 'Hello from Bedrock' and nothing else"}],
  "stream": false
}
```

**Response:**
```http
HTTP/1.1 200 OK
Content-Type: application/json

{
  "choices": [{
    "finish_reason": "stop",
    "message": {"content": "...Hello from Bedrock"}
  }],
  "usage": {"prompt_tokens": 79, "completion_tokens": 50, "total_tokens": 129}
}
```

---

## Test 3: Streaming Completion

**Request:**
```http
POST /v1/chat/completions HTTP/1.1
Content-Type: application/json

{
  "model": "bedrock-inference/openai.gpt-oss-20b-1:0",
  "messages": [{"role": "user", "content": "Count from 1 to 5"}],
  "stream": true
}
```

**Response:**
```http
HTTP/1.1 200 OK
Content-Type: text/event-stream

[6 SSE chunks received]
Final content: "1, 2, 3, 4, 5"
```

---

## Test 4: Error Handling - Invalid Model

**Request:**
```http
POST /v1/chat/completions HTTP/1.1
Content-Type: application/json

{
  "model": "invalid-model-id",
  "messages": [{"role": "user", "content": "Hello"}],
  "stream": false
}
```

**Response:**
```http
HTTP/1.1 404 Not Found
Content-Type: application/json

{
  "detail": "Model 'invalid-model-id' not found. Use 'client.models.list()' to list available Models."
}
```

---

## Test 5: Multi-Turn Conversation

**Request 1:**
```http
POST /v1/chat/completions HTTP/1.1

{
  "messages": [{"role": "user", "content": "My name is Alice"}]
}
```

**Response 1:**
```http
HTTP/1.1 200 OK

{
  "choices": [{
    "message": {"content": "...Nice to meet you, Alice! How can I help you today?"}
  }]
}
```

**Request 2 (with history):**
```http
POST /v1/chat/completions HTTP/1.1

{
  "messages": [
    {"role": "user", "content": "My name is Alice"},
    {"role": "assistant", "content": "...Nice to meet you, Alice!..."},
    {"role": "user", "content": "What is my name?"}
  ]
}
```

**Response 2:**
```http
HTTP/1.1 200 OK

{
  "choices": [{
    "message": {"content": "...Your name is Alice."}
  }],
  "usage": {"prompt_tokens": 183, "completion_tokens": 42}
}
```

**Context retained across turns**

---

## Test 6: System Messages

**Request:**
```http
POST /v1/chat/completions HTTP/1.1

{
  "messages": [
    {"role": "system", "content": "You are Shakespeare. Respond only in Shakespearean English."},
    {"role": "user", "content": "Tell me about the weather"}
  ]
}
```

**Response:**
```http
HTTP/1.1 200 OK

{
  "choices": [{
    "message": {"content": "Lo! I heed thy request..."}
  }],
  "usage": {"completion_tokens": 813}
}
```


---

## Test 7: Tool Calling

**Request:**
```http
POST /v1/chat/completions HTTP/1.1

{
  "messages": [{"role": "user", "content": "What's the weather in San Francisco?"}],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "parameters": {"type": "object", "properties": {"location": {"type": "string"}}}
    }
  }]
}
```

**Response:**
```http
HTTP/1.1 200 OK

{
  "choices": [{
    "finish_reason": "tool_calls",
    "message": {
      "tool_calls": [{
        "function": {"name": "get_weather", "arguments": "{\"location\":\"San Francisco\"}"}
      }]
    }
  }]
}
```

---

## Test 8: Sampling Parameters

**Request:**
```http
POST /v1/chat/completions HTTP/1.1

{
  "messages": [{"role": "user", "content": "Say hello"}],
  "temperature": 0.7,
  "top_p": 0.9
}
```

**Response:**
```http
HTTP/1.1 200 OK

{
  "choices": [{
    "message": {"content": "...Hello! 👋 How can I help you today?"}
  }]
}
```

---

## Test 9: Authentication Error Handling

### Subtest A: Invalid API Key

**Request:**
```http
POST /v1/chat/completions HTTP/1.1
x-llamastack-provider-data: {"aws_bedrock_api_key": "invalid-fake-key-12345"}

{"model": "bedrock-inference/openai.gpt-oss-20b-1:0", ...}
```

**Response:**
```http
HTTP/1.1 400 Bad Request

{
  "detail": "Invalid value: Authentication failed: Error code: 401 - {'error': {'message': 'Invalid API Key format: Must start with pre-defined prefix', ...}}"
}
```

---

### Subtest B: Empty API Key (Fallback to Config)

**Request:**
```http
POST /v1/chat/completions HTTP/1.1
x-llamastack-provider-data: {"aws_bedrock_api_key": ""}

{"model": "bedrock-inference/openai.gpt-oss-20b-1:0", ...}
```

**Response:**
```http
HTTP/1.1 200 OK

{
  "choices": [{
    "message": {"content": "...Hello! How can I assist you today?"}
  }]
}
```

 **Fell back to config key**

---

### Subtest C: Malformed Token

**Request:**
```http
POST /v1/chat/completions HTTP/1.1
x-llamastack-provider-data: {"aws_bedrock_api_key": "not-a-valid-bedrock-token-format"}

{"model": "bedrock-inference/openai.gpt-oss-20b-1:0", ...}
```

**Response:**
```http
HTTP/1.1 400 Bad Request

{
  "detail": "Invalid value: Authentication failed: Error code: 401 - {'error': {'message': 'Invalid API Key format: Must start with pre-defined prefix', ...}}"
}
```
2025-11-06 17:18:18 -08:00
Ashwin Bharambe
a2c4c12384
chore(ui): remove the Streamlit UI (#4097) 2025-11-06 15:51:57 -08:00
Ashwin Bharambe
bef1b044bd
refactor(passthrough): use AsyncOpenAI instead of AsyncLlamaStackClient (#4085)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 1s
Integration Tests (Replay) / generate-matrix (push) Successful in 3s
Test Llama Stack Build / generate-matrix (push) Successful in 3s
Python Package Build Test / build (3.12) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Pre-commit / pre-commit (push) Failing after 4s
Python Package Build Test / build (3.13) (push) Failing after 1s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 2s
Vector IO Integration Tests / test-matrix (push) Failing after 6s
Test Llama Stack Build / build-single-provider (push) Failing after 4s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 5s
Test External API and Providers / test-external (venv) (push) Failing after 5s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
Unit Tests / unit-tests (3.13) (push) Failing after 4s
API Conformance Tests / check-schema-compatibility (push) Successful in 12s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
Test Llama Stack Build / build (push) Failing after 4s
UI Tests / ui-tests (22) (push) Successful in 48s
We'd like to remove the dependence of `llama-stack` on
`llama-stack-client`. This is a necessary step.

A few small cleanups
- Enables `embeddings` now also
- Remove ModelRegistryHelper dependency (unused)
- Consolidate to auth_credential field via RemoteInferenceProviderConfig
- Implement list_models() to fetch from downstream /v1/models

## Test Plan

Tested using this script
https://gist.github.com/ashwinb/6356463d10f989c0682ab3bff8589581

Output:
```
Listing models from downstream server...
Available models: ['passthrough/ollama/nomic-embed-text:latest', 'passthrough/ollama/all-minilm:l6-v2', 'passthrough/ollama/llama3.2-vision:11b', 'passthrough/ollama/llama3.2-vision:latest', 'passthrough/ollama/llama-guard3:1b', 'passthrough/o
llama/llama3.2:1b', 'passthrough/ollama/all-minilm:latest', 'passthrough/ollama/llama3.2:3b', 'passthrough/ollama/llama3.2:3b-instruct-fp16', 'passthrough/bedrock/meta.llama3-1-8b-instruct-v1:0', 'passthrough/bedrock/meta.llama3-1-70b-instruct
-v1:0', 'passthrough/bedrock/meta.llama3-1-405b-instruct-v1:0', 'passthrough/sentence-transformers/nomic-ai/nomic-embed-text-v1.5']

Using LLM model: passthrough/ollama/llama3.2-vision:11b

Making inference request...

Response: 4.

--- Testing streaming ---
Streamed response: ChatCompletionChunk(id='chatcmpl-64', choices=[Choice(delta=ChoiceDelta(content='1', reasoning_content=None, refusal=None, role='assistant', tool_calls=None), finish_reason='', index=0, logprobs=None)], created=1762381674, m
odel='passthrough/ollama/llama3.2-vision:11b', object='chat.completion.chunk', usage=None)
...
5ChatCompletionChunk(id='chatcmpl-64', choices=[Choice(delta=ChoiceDelta(content='', reasoning_content=None, refusal=None, role='assistant', tool_calls=None), finish_reason='stop', index=0, logprobs=None)], created=1762381674, model='passthrou
gh/ollama/llama3.2-vision:11b', object='chat.completion.chunk', usage=None)
```
2025-11-05 18:15:11 -08:00
Roy Belio
c672a5d792
feat: ability to use postgres as store for starter distro (#4076)
## What does this PR do?

The starter distribution now comes with all the required packages to
support persistent stores—like the agent store, metadata, and
inference—using PostgreSQL. Users can enable PostgreSQL support by
setting the `ENABLE_POSTGRES_STORE=1` environment variable.

This PR consolidates the functionality from the removed `postgres-demo`
distribution into the starter distribution, reducing maintenance
overhead.

**Closes: #2619**  
**Supersedes: #2851** (rebased and updated)

## Changes Made

1. **Added PostgreSQL support to starter distribution**
   - New `run-with-postgres-store.yaml` configuration
- Automatic config switching via `ENABLE_POSTGRES_STORE` environment
variable
   - Removed separate `postgres-demo` distribution

2. **Updated to new build system**
   - Integrated postgres switching logic into Containerfile entrypoint
   - Uses new `storage_backends` and `storage_stores` API
   - Properly configured both PostgreSQL KV store and SQL store

3. **Updated dependencies**
   - Added `psycopg2-binary` and `asyncpg` to starter distribution
   - All postgres-related dependencies automatically included

## How to Use

### With Docker (PostgreSQL):
```bash
docker run \
  -e ENABLE_POSTGRES_STORE=1 \
  -e POSTGRES_HOST=your_postgres_host \
  -e POSTGRES_PORT=5432 \
  -e POSTGRES_DB=llamastack \
  -e POSTGRES_USER=llamastack \
  -e POSTGRES_PASSWORD=llamastack \
  -e OPENAI_API_KEY=your_key \
  llamastack/distribution-starter
```

### PostgreSQL environment variables:
- `POSTGRES_HOST`: Postgres host (default: `localhost`)
- `POSTGRES_PORT`: Postgres port (default: `5432`)
- `POSTGRES_DB`: Postgres database name (default: `llamastack`)
- `POSTGRES_USER`: Postgres username (default: `llamastack`)
- `POSTGRES_PASSWORD`: Postgres password (default: `llamastack`)

## Test Plan

All pre-commit hooks pass (mypy, ruff, distro-codegen)  
`llama stack list-deps starter` confirms psycopg2-binary is included  
Storage configuration correctly uses PostgreSQL backends  
Container builds successfully with postgres support  

## Credits

Original work by @leseb in #2851. Rebased and updated by @r-bit-rry to
work with latest main.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Sébastien Han @leseb

---------

Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Sébastien Han <seb@redhat.com>
2025-11-05 15:37:06 -08:00
ehhuang
95b0493fae
chore: move src/llama_stack/ui to src/llama_stack_ui (#4068)
# What does this PR do?
This better separates UI from backend code, which was a point of
confusion often for our beloved AI friends.


## Test Plan
CI
2025-11-04 15:21:49 -08:00
Ashwin Bharambe
cb40da210f
fix: update tests for OpenAI-style models endpoint (#4053)
The llama-stack-client now uses /`v1/openai/v1/models` which returns
OpenAI-compatible model objects with 'id' and 'custom_metadata' fields
instead of the Resource-style 'identifier' field. Updated api_recorder
to handle the new endpoint and modified tests to access model metadata
appropriately. Deleted stale model recordings for re-recording.

**NOTE: CI will be red on this one since it is dependent on
https://github.com/llamastack/llama-stack-client-python/pull/291/files
landing. I verified locally that it is green.**
2025-11-03 17:30:08 -08:00
Sébastien Han
4a5ef65286
chore!: remove SDG API (#4035)
# What does this PR do?

This API hasn't received any traction and close to zero interest from
the community. Let's revisit in the future if things change.

Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-11-03 16:12:06 -08:00
Jiayi Ni
fa7699d2c3
feat: Add rerank API for NVIDIA Inference Provider (#3329)
# What does this PR do?
Add rerank API for NVIDIA Inference Provider.

<!-- If resolving an issue, uncomment and update the line below -->
Closes #3278 

## Test Plan
Unit test:
```
pytest tests/unit/providers/nvidia/test_rerank_inference.py
```

Integration test: 
```
pytest -s -v tests/integration/inference/test_rerank.py   --stack-config="inference=nvidia"   --rerank-model=nvidia/nvidia/nv-rerankqa-mistral-4b-v3   --env NVIDIA_API_KEY=""   --env NVIDIA_BASE_URL="https://integrate.api.nvidia.com"
```
2025-10-30 21:42:09 -07:00
ehhuang
1f9d48cd54
feat: openai files provider (#3946)
# What does this PR do?
- Adds OpenAI files provider 
- Note that file content retrieval is pretty limited by `purpose`
https://community.openai.com/t/file-uploads-error-why-can-t-i-download-files-with-purpose-user-data/1357013?utm_source=chatgpt.com

## Test Plan
Modify run yaml to use openai files provider:
```
  files:
  - provider_id: openai
    provider_type: remote::openai
    config:
      api_key: ${env.OPENAI_API_KEY:=}
      metadata_store:
        backend: sql_default
        table_name: openai_files_metadata

# Then run files tests
❯ uv run --no-sync ./scripts/integration-tests.sh --stack-config server:ci-tests --inference-mode replay --setup ollama --suite base --pattern test_files
```
2025-10-28 16:25:03 -07:00
raghotham
feabcdd67b
docs: add documentation on how to use custom run yaml in docker (#3949)
as title

test plan:

```yaml
# custom-ollama-run.yaml
version: 2
image_name: starter
external_providers_dir: /.llama/providers.d
apis:
- inference
- vector_io
- files
- safety
- tool_runtime
- agents

providers:
  inference:
  # Single Ollama provider for all models
  - provider_id: ollama
    provider_type: remote::ollama
    config:
      url: ${env.OLLAMA_URL:=http://localhost:11434}

  vector_io:
  - provider_id: faiss
    provider_type: inline::faiss
    config:
      persistence:
        namespace: vector_io::faiss
        backend: kv_default

  files:
  - provider_id: meta-reference-files
    provider_type: inline::localfs
    config:
      storage_dir: /.llama/files
      metadata_store:
        table_name: files_metadata
        backend: sql_default

  safety:
  - provider_id: llama-guard
    provider_type: inline::llama-guard
    config:
      excluded_categories: []

  tool_runtime:
  - provider_id: rag-runtime
    provider_type: inline::rag-runtime

  agents:
  - provider_id: meta-reference
    provider_type: inline::meta-reference
    config:
      persistence:
        agent_state:
          namespace: agents
          backend: kv_default
        responses:
          table_name: responses
          backend: sql_default
          max_write_queue_size: 10000
          num_writers: 4

storage:
  backends:
    kv_default:
      type: kv_sqlite
      db_path: /.llama/kvstore.db
    sql_default:
      type: sql_sqlite
      db_path: /.llama/sql_store.db
  stores:
    metadata:
      namespace: registry
      backend: kv_default
    inference:
      table_name: inference_store
      backend: sql_default
      max_write_queue_size: 10000
      num_writers: 4
    conversations:
      table_name: openai_conversations
      backend: sql_default

registered_resources:
  models:
  # All models use the same 'ollama' provider
  - model_id: llama3.2-vision:latest
    provider_id: ollama
    provider_model_id: llama3.2-vision:latest
    model_type: llm
  - model_id: llama3.2:3b
    provider_id: ollama
    provider_model_id: llama3.2:3b
    model_type: llm
  # Embedding models
  - model_id: nomic-embed-text-v2-moe
    provider_id: ollama
    provider_model_id: toshk0/nomic-embed-text-v2-moe:Q6_K
    model_type: embedding
    metadata:
      embedding_dimension: 768
  shields: []
  vector_dbs: []
  datasets: []
  scoring_fns: []
  benchmarks: []
  tool_groups: []

server:
  port: 8321

telemetry:
  enabled: true

vector_stores:
  default_provider_id: faiss
  default_embedding_model:
    provider_id: ollama
    model_id: toshk0/nomic-embed-text-v2-moe:Q6_K
```

```bash
docker run
     -it
     --pull always
     -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT
     -v ~/.llama:/root/.llama
     -v $CUSTOM_RUN_CONFIG:/app/custom-run.yaml
     -e RUN_CONFIG_PATH=/app/custom-run.yaml
     -e OLLAMA_URL=http://host.docker.internal:11434/
     llamastack/distribution-starter:0.3.0
     --port $LLAMA_STACK_PORT
```
2025-10-28 16:05:44 -07:00
ehhuang
b7dd3f5c56
chore!: BREAKING CHANGE: vector_db_id -> vector_store_id (#3923)
# What does this PR do?


## Test Plan
CI
vector_io tests will fail until next client sync

passed with
https://github.com/llamastack/llama-stack-client-python/pull/286 checked
out locally
2025-10-27 14:26:06 -07:00
IAN MILLER
98a5047f9d
feat(prompts): attach prompts to storage stores in run configs (#3893)
# What does this PR do?
<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->
This PR is responsible for attaching prompts to storage stores in run
configs. It allows to specify prompts as stores in different
distributions. The need of this functionality was initiated in #3514

> Note, #3514 is divided on three separate PRs. Current PR is the first
of three.

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->
Manual testing and updated CI unit tests

Prerequisites:

1. `uv run --with llama-stack llama stack list-deps starter | xargs -L1
uv pip install`

2. `llama stack run starter `

```
INFO     2025-10-23 15:36:17,387 llama_stack.cli.stack.run:100 cli: Using run configuration:                            
         /Users/ianmiller/llama-stack/llama_stack/distributions/starter/run.yaml                                        
INFO     2025-10-23 15:36:17,423 llama_stack.cli.stack.run:157 cli: HTTPS enabled with certificates:                    
           Key: None                                                                                                    
           Cert: None                                                                                                   
INFO     2025-10-23 15:36:17,424 llama_stack.cli.stack.run:159 cli: Listening on ['::', '0.0.0.0']:8321                 
INFO     2025-10-23 15:36:17,749 llama_stack.core.server.server:521 core::server: Run configuration:                    
INFO     2025-10-23 15:36:17,756 llama_stack.core.server.server:524 core::server: apis:                                 
         - agents                                                                                                       
         - batches                                                                                                      
         - datasetio                                                                                                    
         - eval                                                                                                         
         - files                                                                                                        
         - inference                                                                                                    
         - post_training                                                                                                
         - safety                                                                                                       
         - scoring                                                                                                      
         - tool_runtime                                                                                                 
         - vector_io                                                                                                    
         image_name: starter                                                                                            
         providers:                                                                                                     
           agents:                                                                                                      
           - config:                                                                                                    
               persistence:                                                                                             
                 agent_state:                                                                                           
                   backend: kv_default                                                                                  
                   namespace: agents                                                                                    
                 responses:                                                                                             
                   backend: sql_default                                                                                 
                   max_write_queue_size: 10000                                                                          
                   num_writers: 4                                                                                       
                   table_name: responses                                                                                
             provider_id: meta-reference                                                                                
             provider_type: inline::meta-reference                                                                      
           batches:                                                                                                     
           - config:                                                                                                    
               kvstore:                                                                                                 
                 backend: kv_default                                                                                    
                 namespace: batches                                                                                     
             provider_id: reference                                                                                     
             provider_type: inline::reference                                                                           
           datasetio:                                                                                                   
           - config:                                                                                                    
               kvstore:                                                                                                 
                 backend: kv_default                                                                                    
                 namespace: datasetio::huggingface                                                                      
             provider_id: huggingface                                                                                   
             provider_type: remote::huggingface                                                                         
           - config:                                                                                                    
               kvstore:                                                                                                 
                 backend: kv_default                                                                                    
                 namespace: datasetio::localfs                                                                          
             provider_id: localfs                                                                                       
             provider_type: inline::localfs                                                                             
           eval:                                                                                                        
           - config:                                                                                                    
               kvstore:                                                                                                 
                 backend: kv_default                                                                                    
                 namespace: eval                                                                                        
             provider_id: meta-reference                                                                                
             provider_type: inline::meta-reference                                                                      
           files:                                                                                                       
           - config:                                                                                                    
               metadata_store:                                                                                          
                 backend: sql_default                                                                                   
                 table_name: files_metadata                                                                             
               storage_dir: /Users/ianmiller/.llama/distributions/starter/files                                         
             provider_id: meta-reference-files                                                                          
             provider_type: inline::localfs                                                                             
           inference:                                                                                                   
           - config:                                                                                                    
               api_key: '********'                                                                                      
               url: https://api.fireworks.ai/inference/v1                                                               
             provider_id: fireworks                                                                                     
             provider_type: remote::fireworks                                                                           
           - config:                                                                                                    
               api_key: '********'                                                                                      
               url: https://api.together.xyz/v1                                                                         
             provider_id: together                                                                                      
             provider_type: remote::together                                                                            
           - config: {}                                                                                                 
             provider_id: bedrock                                                                                       
             provider_type: remote::bedrock                                                                             
           - config:                                                                                                    
               api_key: '********'                                                                                      
               base_url: https://api.openai.com/v1                                                                      
             provider_id: openai                                                                                        
             provider_type: remote::openai                                                                              
           - config:                                                                                                    
               api_key: '********'                                                                                      
             provider_id: anthropic                                                                                     
             provider_type: remote::anthropic                                                                           
           - config:                                                                                                    
               api_key: '********'                                                                                      
             provider_id: gemini                                                                                        
             provider_type: remote::gemini                                                                              
           - config:                                                                                                    
               api_key: '********'                                                                                      
               url: https://api.groq.com                                                                                
             provider_id: groq                                                                                          
             provider_type: remote::groq                                                                                
           - config:                                                                                                    
               api_key: '********'                                                                                      
               url: https://api.sambanova.ai/v1                                                                         
             provider_id: sambanova                                                                                     
             provider_type: remote::sambanova                                                                           
           - config: {}                                                                                                 
             provider_id: sentence-transformers                                                                         
             provider_type: inline::sentence-transformers                                                               
           post_training:                                                                                               
           - config:                                                                                                    
               checkpoint_format: meta                                                                                  
             provider_id: torchtune-cpu                                                                                 
             provider_type: inline::torchtune-cpu                                                                       
           safety:                                                                                                      
           - config:                                                                                                    
               excluded_categories: []                                                                                  
             provider_id: llama-guard                                                                                   
             provider_type: inline::llama-guard                                                                         
           - config: {}                                                                                                 
             provider_id: code-scanner                                                                                  
             provider_type: inline::code-scanner                                                                        
           scoring:                                                                                                     
           - config: {}                                                                                                 
             provider_id: basic                                                                                         
             provider_type: inline::basic                                                                               
           - config: {}                                                                                                 
             provider_id: llm-as-judge                                                                                  
             provider_type: inline::llm-as-judge                                                                        
           - config:                                                                                                    
               openai_api_key: '********'                                                                               
             provider_id: braintrust                                                                                    
             provider_type: inline::braintrust                                                                          
           tool_runtime:                                                                                                
           - config:                                                                                                    
               api_key: '********'                                                                                      
               max_results: 3                                                                                           
             provider_id: brave-search                                                                                  
             provider_type: remote::brave-search                                                                        
           - config:                                                                                                    
               api_key: '********'                                                                                      
               max_results: 3                                                                                           
             provider_id: tavily-search                                                                                 
             provider_type: remote::tavily-search                                                                       
           - config: {}                                                                                                 
             provider_id: rag-runtime                                                                                   
             provider_type: inline::rag-runtime                                                                         
           - config: {}                                                                                                 
             provider_id: model-context-protocol                                                                        
             provider_type: remote::model-context-protocol                                                              
           vector_io:                                                                                                   
           - config:                                                                                                    
               persistence:                                                                                             
                 backend: kv_default                                                                                    
                 namespace: vector_io::faiss                                                                            
             provider_id: faiss                                                                                         
             provider_type: inline::faiss                                                                               
           - config:                                                                                                    
               db_path: /Users/ianmiller/.llama/distributions/starter/sqlite_vec.db                                     
               persistence:                                                                                             
                 backend: kv_default                                                                                    
                 namespace: vector_io::sqlite_vec                                                                       
             provider_id: sqlite-vec                                                                                    
             provider_type: inline::sqlite-vec                                                                          
         registered_resources:                                                                                          
           benchmarks: []                                                                                               
           datasets: []                                                                                                 
           models: []                                                                                                   
           scoring_fns: []                                                                                              
           shields: []                                                                                                  
           tool_groups:                                                                                                 
           - provider_id: tavily-search                                                                                 
             toolgroup_id: builtin::websearch                                                                           
           - provider_id: rag-runtime                                                                                   
             toolgroup_id: builtin::rag                                                                                 
           vector_stores: []                                                                                            
         server:                                                                                                        
           port: 8321                                                                                                   
         storage:                                                                                                       
           backends:                                                                                                    
             kv_default:                                                                                                
               db_path: /Users/ianmiller/.llama/distributions/starter/kvstore.db                                        
               type: kv_sqlite                                                                                          
             sql_default:                                                                                               
               db_path: /Users/ianmiller/.llama/distributions/starter/sql_store.db                                      
               type: sql_sqlite                                                                                         
           stores:                                                                                                      
             conversations:                                                                                             
               backend: sql_default                                                                                     
               table_name: openai_conversations                                                                         
             inference:                                                                                                 
               backend: sql_default                                                                                     
               max_write_queue_size: 10000                                                                              
               num_writers: 4                                                                                           
               table_name: inference_store                                                                              
             metadata:                                                                                                  
               backend: kv_default                                                                                      
               namespace: registry                                                                                      
             prompts:                                                                                                   
               backend: kv_default                                                                                      
               namespace: prompts                                                                                       
         telemetry:                                                                                                     
           enabled: true                                                                                                
         vector_stores:                                                                                                 
           default_embedding_model:                                                                                     
             model_id: nomic-ai/nomic-embed-text-v1.5                                                                   
             provider_id: sentence-transformers                                                                         
           default_provider_id: faiss                                                                                   
         version: 2                                                                                                     
                                                                                                                        
INFO     2025-10-23 15:36:20,032 llama_stack.providers.utils.inference.inference_store:74 inference: Write queue        
         disabled for SQLite to avoid concurrency issues                                                                
WARNING  2025-10-23 15:36:20,422 llama_stack.providers.inline.telemetry.meta_reference.telemetry:84 telemetry:          
         OTEL_EXPORTER_OTLP_ENDPOINT is not set, skipping telemetry                                                     
INFO     2025-10-23 15:36:22,379 llama_stack.providers.utils.inference.openai_mixin:436 providers::utils:               
         OpenAIInferenceAdapter.list_provider_model_ids() returned 105 models                                           
INFO     2025-10-23 15:36:22,703 uvicorn.error:84 uncategorized: Started server process [17328]                         
INFO     2025-10-23 15:36:22,704 uvicorn.error:48 uncategorized: Waiting for application startup.                       
INFO     2025-10-23 15:36:22,706 llama_stack.core.server.server:179 core::server: Starting up Llama Stack server        
         (version: 0.3.0)                                                                                               
INFO     2025-10-23 15:36:22,707 llama_stack.core.stack:470 core: starting registry refresh task                        
INFO     2025-10-23 15:36:22,708 uvicorn.error:62 uncategorized: Application startup complete.                          
INFO     2025-10-23 15:36:22,708 uvicorn.error:216 uncategorized: Uvicorn running on http://['::', '0.0.0.0']:8321      
         (Press CTRL+C to quit)   
```
As you can see, prompts are attached to stores in config

Testing:

1. Create prompt:

```
curl -X POST http://localhost:8321/v1/prompts \                 
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Hello {{name}}! You are working at {{company}}. Your role is {{role}} at {{company}}. Remember, {{name}}, to be {{tone}}.",
    "variables": ["name", "company", "role", "tone"]
  }'
```

`{"prompt":"Hello {{name}}! You are working at {{company}}. Your role is
{{role}} at {{company}}. Remember, {{name}}, to be
{{tone}}.","version":1,"prompt_id":"pmpt_a90e09e67acfe23776f2778c603eb6c17e139dab5f6e163f","variables":["name","company","role","tone"],"is_default":false}%
`

2. Get prompt:

`curl -X GET
http://localhost:8321/v1/prompts/pmpt_a90e09e67acfe23776f2778c603eb6c17e139dab5f6e163f`

`{"prompt":"Hello {{name}}! You are working at {{company}}. Your role is
{{role}} at {{company}}. Remember, {{name}}, to be
{{tone}}.","version":1,"prompt_id":"pmpt_a90e09e67acfe23776f2778c603eb6c17e139dab5f6e163f","variables":["name","company","role","tone"],"is_default":false}%
`

3. Query sqlite KV storage to check created prompt:

```
sqlite> .mode column
sqlite> .headers on
sqlite> SELECT * FROM kvstore WHERE key LIKE 'prompts:v1:%';
key                                                           value                                                         expiration
------------------------------------------------------------  ------------------------------------------------------------  ----------
prompts:v1:pmpt_a90e09e67acfe23776f2778c603eb6c17e139dab5f6e  {"prompt_id": "pmpt_a90e09e67acfe23776f2778c603eb6c17e139dab            
163f:1                                                        5f6e163f", "prompt": "Hello {{name}}! You are working at {{c            
                                                              ompany}}. Your role is {{role}} at {{company}}. Remember, {{            
                                                              name}}, to be {{tone}}.", "version": 1, "variables": ["name"            
                                                              , "company", "role", "tone"], "is_default": false}                      

prompts:v1:pmpt_a90e09e67acfe23776f2778c603eb6c17e139dab5f6e  1                                                                       
163f:default                                                                                                                          
sqlite> 
```
2025-10-27 11:12:12 -07:00
ehhuang
509676641a
chore: update run configs (#3902)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 2s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.12) (push) Failing after 0s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
Python Package Build Test / build (3.13) (push) Failing after 1s
Test External API and Providers / test-external (venv) (push) Failing after 4s
API Conformance Tests / check-schema-compatibility (push) Successful in 12s
UI Tests / ui-tests (22) (push) Successful in 39s
Pre-commit / pre-commit (push) Successful in 1m34s
Vector IO Integration Tests / test-matrix (push) Failing after 5s
Unit Tests / unit-tests (3.12) (push) Failing after 3s
Unit Tests / unit-tests (3.13) (push) Failing after 4s
# What does this PR do?
telemetry was deprecated


## Test Plan
2025-10-24 15:03:06 -07:00
ehhuang
2a1a813308
chore: update docs for telemetry api removal (#3900)
# What does this PR do?
Telemetry is no longer an API/provider.

## Test Plan
2025-10-24 13:57:28 -07:00
Francisco Arceo
4566eebe05
feat: Add static file import system for docs (#3882)
# What does this PR do?

Add static file import system for docs

- Use `remark-code-import` plugin to embed code at build time
- Support importing Python code with syntax highlighting using
`raw-loader` + `ReactMarkdown`

One caveat is that currently when embedding markdown with code used the
syntax highlighting isn't behaving but I'll investigate that in a follow
up.

## Test Plan

Python Example:
<img width="1372" height="995" alt="Screenshot 2025-10-23 at 9 22 18 PM"
src="https://github.com/user-attachments/assets/656d2c78-4d9b-45a4-bd5e-3f8490352b85"
/>

Markdown example:
<img width="1496" height="1070" alt="Screenshot 2025-10-23 at 9 22
38 PM"
src="https://github.com/user-attachments/assets/6c0a07ec-ff7c-45aa-b05f-8c46acd4445c"
/>

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-10-24 14:01:33 -04:00
Ashwin Bharambe
658fb2c777 refactor(k8s): update run configs to v2 storage and registered_resources structure
Some checks failed
Python Package Build Test / build (3.13) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 1s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Test Llama Stack Build / generate-matrix (push) Successful in 3s
Test Llama Stack Build / build-single-provider (push) Failing after 3s
Python Package Build Test / build (3.12) (push) Failing after 3s
Vector IO Integration Tests / test-matrix (push) Failing after 5s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 4s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 4s
Test Llama Stack Build / build (push) Failing after 3s
Unit Tests / unit-tests (3.13) (push) Failing after 3s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Unit Tests / unit-tests (3.12) (push) Failing after 5s
API Conformance Tests / check-schema-compatibility (push) Successful in 12s
UI Tests / ui-tests (22) (push) Successful in 42s
Pre-commit / pre-commit (push) Successful in 1m30s
Migrates k8s run configs to match the updated run configs

- Replace storage.references with storage.stores
- Wrap resources under registered_resources section
- Update provider configs to use persistence with namespace/backend
- Add telemetry and vector_stores top-level sections
- Simplify agent/files metadata store configuration
2025-10-22 15:33:50 -07:00
Jiayi Ni
bb1ebb3c6b
feat: Add rerank models and rerank API change (#3831)
# What does this PR do?
<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->
- Extend the model type to include rerank models.
- Implement `rerank()` method in inference router.
- Add `rerank_model_list` to `OpenAIMixin` to enable providers to
register and identify rerank models
- Update documentation.

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->
```
pytest tests/unit/providers/utils/inference/test_openai_mixin.py
```
2025-10-22 12:02:28 -07:00
Francisco Arceo
53c20f6113
feat: Adding Demo script (#3870)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 2s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Vector IO Integration Tests / test-matrix (push) Failing after 5s
Test External API and Providers / test-external (venv) (push) Failing after 5s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
Unit Tests / unit-tests (3.13) (push) Failing after 6s
Python Package Build Test / build (3.13) (push) Failing after 10s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 16s
Python Package Build Test / build (3.12) (push) Failing after 15s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 15s
API Conformance Tests / check-schema-compatibility (push) Successful in 24s
UI Tests / ui-tests (22) (push) Successful in 50s
Pre-commit / pre-commit (push) Successful in 1m26s
# What does this PR do?
Updated quickstart `demo_script.py` to use OpenAI APIs, which is simply:

```python
import io, requests
from openai import OpenAI

url="https://www.paulgraham.com/greatwork.html"
client = OpenAI(base_url="http://localhost:8321/v1/", api_key="none")

vs = client.vector_stores.create()
response = requests.get(url)
pseudo_file = io.BytesIO(str(response.content).encode('utf-8'))
uploaded_file = client.files.create(file=(url, pseudo_file, "text/html"), purpose="assistants")
client.vector_stores.files.create(vector_store_id=vs.id, file_id=uploaded_file.id)

resp = client.responses.create(
    model="openai/gpt-4o",
    input="How do you do great work? Use the existing knowledge_search tool.",
    tools=[{"type": "file_search", "vector_store_ids": [vs.id]}],
    include=["file_search_call.results"],
)

print(resp)
```



<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-10-21 21:31:21 -04:00
Alexey Rybak
4c718523fa
docs: fix the building distro file (#3880)
# What does this PR do?
* Fixes the doc server build (which expects a blank line after imports)

## Test Plan
* `cd docs && npm run build`
2025-10-21 14:26:35 -07:00
Ashwin Bharambe
bd3c473208
revert: "chore(cleanup)!: remove tool_runtime.rag_tool" (#3877)
Reverts llamastack/llama-stack#3871

This PR broke RAG (even from Responses -- there _is_ a dependency)
2025-10-21 11:22:06 -07:00
Ashwin Bharambe
0e96279bee
chore(cleanup)!: remove tool_runtime.rag_tool (#3871)
Kill the `builtin::rag` tool group completely since it is no longer
targeted. We use the Responses implementation for knowledge_search which
uses the `openai_vector_stores` pathway.

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-10-20 22:26:21 -07:00
Ashwin Bharambe
122de785c4
chore(cleanup)!: kill vector_db references as far as possible (#3864)
There should not be "vector db" anywhere.
2025-10-20 20:06:16 -07:00
Francisco Arceo
48581bf651
chore: Updating how default embedding model is set in stack (#3818)
# What does this PR do?

Refactor setting default vector store provider and embedding model to
use an optional `vector_stores` config in the `StackRunConfig` and clean
up code to do so (had to add back in some pieces of VectorDB). Also
added remote Qdrant and Weaviate to starter distro (based on other PR
where inference providers were added for UX).

New config is simply (default for Starter distro):

```yaml
vector_stores:
  default_provider_id: faiss
  default_embedding_model:
    provider_id: sentence-transformers
    model_id: nomic-ai/nomic-embed-text-v1.5
```

## Test Plan
CI and Unit tests.

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-10-20 14:22:45 -07:00
Ashwin Bharambe
2c43285e22
feat(stores)!: use backend storage references instead of configs (#3697)
**This PR changes configurations in a backward incompatible way.**

Run configs today repeat full SQLite/Postgres snippets everywhere a
store is needed, which means duplicated credentials, extra connection
pools, and lots of drift between files. This PR introduces named storage
backends so the stack and providers can share a single catalog and
reference those backends by name.

## Key Changes

- Add `storage.backends` to `StackRunConfig`, register each KV/SQL
backend once at startup, and validate that references point to the right
family.
- Move server stores under `storage.stores` with lightweight references
(backend + namespace/table) instead of full configs.
- Update every provider/config/doc to use the new reference style;
docs/codegen now surface the simplified YAML.

## Migration

Before:
```yaml
metadata_store:
  type: sqlite
  db_path: ~/.llama/distributions/foo/registry.db
inference_store:
  type: postgres
  host: ${env.POSTGRES_HOST}
  port: ${env.POSTGRES_PORT}
  db: ${env.POSTGRES_DB}
  user: ${env.POSTGRES_USER}
  password: ${env.POSTGRES_PASSWORD}
conversations_store:
  type: postgres
  host: ${env.POSTGRES_HOST}
  port: ${env.POSTGRES_PORT}
  db: ${env.POSTGRES_DB}
  user: ${env.POSTGRES_USER}
  password: ${env.POSTGRES_PASSWORD}
```

After:
```yaml
storage:
  backends:
    kv_default:
      type: kv_sqlite
      db_path: ~/.llama/distributions/foo/kvstore.db
    sql_default:
      type: sql_postgres
      host: ${env.POSTGRES_HOST}
      port: ${env.POSTGRES_PORT}
      db: ${env.POSTGRES_DB}
      user: ${env.POSTGRES_USER}
      password: ${env.POSTGRES_PASSWORD}
  stores:
    metadata:
      backend: kv_default
      namespace: registry
    inference:
      backend: sql_default
      table_name: inference_store
      max_write_queue_size: 10000
      num_writers: 4
    conversations:
      backend: sql_default
      table_name: openai_conversations
```

Provider configs follow the same pattern—for example, a Chroma vector
adapter switches from:

```yaml
providers:
  vector_io:
  - provider_id: chromadb
    provider_type: remote::chromadb
    config:
      url: ${env.CHROMADB_URL}
      kvstore:
        type: sqlite
        db_path: ~/.llama/distributions/foo/chroma.db
```

to:

```yaml
providers:
  vector_io:
  - provider_id: chromadb
    provider_type: remote::chromadb
    config:
      url: ${env.CHROMADB_URL}
      persistence:
        backend: kv_default
        namespace: vector_io::chroma_remote
```

Once the backends are declared, everything else just points at them, so
rotating credentials or swapping to Postgres happens in one place and
the stack reuses a single connection pool.
2025-10-20 13:20:09 -07:00
ehhuang
359df3a37c
chore: update doc (#3857)
# What does this PR do?
follows https://github.com/llamastack/llama-stack/pull/3839

## Test Plan
2025-10-20 10:33:21 -07:00
ehhuang
21772de5d3
chore: use dockerfile for building containers (#3839)
# What does this PR do?

relates to #2878 

We introduce a Containerfile which is used to replaced the `llama stack
build` command (removal in a separate PR).

```
llama stack build --distro starter --image-type venv --run
```
is replaced by
```
llama stack list-deps starter | xargs -L1 uv pip install
llama stack run starter
```


- See the updated workflow files for e2e workflow.

## Test Plan
CI
```
❯ docker build . -f docker/Dockerfile --build-arg DISTRO_NAME=starter --build-arg INSTALL_MODE=editable --tag test_starter
❯ docker run -p 8321:8321 test_starter
❯ curl http://localhost:8321/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {
        "role": "user",
        "content": "Hello!"
      }
    ]
  }'
```





---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with
[ReviewStack](https://reviewstack.dev/llamastack/llama-stack/pull/3839).
* #3855
* __->__ #3839
2025-10-20 10:23:01 -07:00
Charlie Doern
573e783ff0
docs: fix sidebar of Detailed Tutorial (#3856)
# What does this PR do?

the sidebar currently has an extra `ii. Run the Script` because its
incorrectly put into the doc as an H3 not an H4 (like the other ones)


<img width="239" height="218" alt="Screenshot 2025-10-20 at 1 04 54 PM"
src="https://github.com/user-attachments/assets/eb8cb26e-7ea9-4b61-9101-d64965b39647"
/>

Fix this which will update the sidebar

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-10-20 13:10:50 -04:00
Charlie Doern
b11bcfde11
refactor(build): rework CLI commands and build process (1/2) (#2974)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
Test Llama Stack Build / generate-matrix (push) Successful in 22s
Test llama stack list-deps / show-single-provider (push) Failing after 53s
Test Llama Stack Build / build-single-provider (push) Failing after 3s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.12) (push) Failing after 18s
Python Package Build Test / build (3.13) (push) Failing after 24s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 26s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 27s
Unit Tests / unit-tests (3.12) (push) Failing after 26s
Vector IO Integration Tests / test-matrix (push) Failing after 44s
API Conformance Tests / check-schema-compatibility (push) Successful in 52s
Test llama stack list-deps / generate-matrix (push) Successful in 52s
Test Llama Stack Build / build (push) Failing after 29s
Test External API and Providers / test-external (venv) (push) Failing after 53s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1m2s
Unit Tests / unit-tests (3.13) (push) Failing after 1m30s
Test llama stack list-deps / list-deps-from-config (push) Failing after 1m59s
Test llama stack list-deps / list-deps (push) Failing after 1m10s
UI Tests / ui-tests (22) (push) Successful in 2m26s
Pre-commit / pre-commit (push) Successful in 3m8s
# What does this PR do?

This PR does a few things outlined in #2878 namely:
1. adds `llama stack list-deps` a command which simply takes the build
logic and instead of executing one of the `build_...` scripts, it
displays all of the providers' dependencies using the `module` and `uv`.
2. deprecated `llama stack build` in favor of `llama stack list-deps`
3. updates all tests to use `list-deps` alongside `build`.

PR 2/2 will migrate `llama stack run`'s default behavior to be `llama
stack build --run` and use the new `list-deps` command under the hood
before running the server.

examples of `llama stack list-deps starter`

```
llama stack list-deps starter --format json
{
  "name": "starter",
  "description": "Quick start template for running Llama Stack with several popular providers. This distribution is intended for CPU-only environments.",
  "apis": [
    {
      "api": "inference",
      "provider": "remote::cerebras"
    },
    {
      "api": "inference",
      "provider": "remote::ollama"
    },
    {
      "api": "inference",
      "provider": "remote::vllm"
    },
    {
      "api": "inference",
      "provider": "remote::tgi"
    },
    {
      "api": "inference",
      "provider": "remote::fireworks"
    },
    {
      "api": "inference",
      "provider": "remote::together"
    },
    {
      "api": "inference",
      "provider": "remote::bedrock"
    },
    {
      "api": "inference",
      "provider": "remote::nvidia"
    },
    {
      "api": "inference",
      "provider": "remote::openai"
    },
    {
      "api": "inference",
      "provider": "remote::anthropic"
    },
    {
      "api": "inference",
      "provider": "remote::gemini"
    },
    {
      "api": "inference",
      "provider": "remote::vertexai"
    },
    {
      "api": "inference",
      "provider": "remote::groq"
    },
    {
      "api": "inference",
      "provider": "remote::sambanova"
    },
    {
      "api": "inference",
      "provider": "remote::azure"
    },
    {
      "api": "inference",
      "provider": "inline::sentence-transformers"
    },
    {
      "api": "vector_io",
      "provider": "inline::faiss"
    },
    {
      "api": "vector_io",
      "provider": "inline::sqlite-vec"
    },
    {
      "api": "vector_io",
      "provider": "inline::milvus"
    },
    {
      "api": "vector_io",
      "provider": "remote::chromadb"
    },
    {
      "api": "vector_io",
      "provider": "remote::pgvector"
    },
    {
      "api": "files",
      "provider": "inline::localfs"
    },
    {
      "api": "safety",
      "provider": "inline::llama-guard"
    },
    {
      "api": "safety",
      "provider": "inline::code-scanner"
    },
    {
      "api": "agents",
      "provider": "inline::meta-reference"
    },
    {
      "api": "telemetry",
      "provider": "inline::meta-reference"
    },
    {
      "api": "post_training",
      "provider": "inline::torchtune-cpu"
    },
    {
      "api": "eval",
      "provider": "inline::meta-reference"
    },
    {
      "api": "datasetio",
      "provider": "remote::huggingface"
    },
    {
      "api": "datasetio",
      "provider": "inline::localfs"
    },
    {
      "api": "scoring",
      "provider": "inline::basic"
    },
    {
      "api": "scoring",
      "provider": "inline::llm-as-judge"
    },
    {
      "api": "scoring",
      "provider": "inline::braintrust"
    },
    {
      "api": "tool_runtime",
      "provider": "remote::brave-search"
    },
    {
      "api": "tool_runtime",
      "provider": "remote::tavily-search"
    },
    {
      "api": "tool_runtime",
      "provider": "inline::rag-runtime"
    },
    {
      "api": "tool_runtime",
      "provider": "remote::model-context-protocol"
    },
    {
      "api": "batches",
      "provider": "inline::reference"
    }
  ],
  "pip_dependencies": [
    "pandas",
    "opentelemetry-exporter-otlp-proto-http",
    "matplotlib",
    "opentelemetry-sdk",
    "sentence-transformers",
    "datasets",
    "pymilvus[milvus-lite]>=2.4.10",
    "codeshield",
    "scipy",
    "torchvision",
    "tree_sitter",
    "h11>=0.16.0",
    "aiohttp",
    "pymongo",
    "tqdm",
    "pythainlp",
    "pillow",
    "torch",
    "emoji",
    "grpcio>=1.67.1,<1.71.0",
    "fireworks-ai",
    "langdetect",
    "psycopg2-binary",
    "asyncpg",
    "redis",
    "together",
    "torchao>=0.12.0",
    "openai",
    "sentencepiece",
    "aiosqlite",
    "google-cloud-aiplatform",
    "faiss-cpu",
    "numpy",
    "sqlite-vec",
    "nltk",
    "scikit-learn",
    "mcp>=1.8.1",
    "transformers",
    "boto3",
    "huggingface_hub",
    "ollama",
    "autoevals",
    "sqlalchemy[asyncio]",
    "torchtune>=0.5.0",
    "chromadb-client",
    "pypdf",
    "requests",
    "anthropic",
    "chardet",
    "aiosqlite",
    "fastapi",
    "fire",
    "httpx",
    "uvicorn",
    "opentelemetry-sdk",
    "opentelemetry-exporter-otlp-proto-http"
  ]
}
```

<img width="1500" height="420" alt="Screenshot 2025-10-16 at 5 53 03 PM"
src="https://github.com/user-attachments/assets/765929fb-93e2-44d7-9c3d-8918b70fc721"
/>

---------

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-10-17 19:52:14 -07:00
Alexey Rybak
224c99560c
docs: update docstrings for better formatting (#3838)
# What does this PR do?
Updates docstrings for Conversations and Eval APIs to render better in
the docs nav sidebar.

Before: 
<img width="363" height="233" alt="Screenshot 2025-10-17 at 9 52 17 AM"
src="https://github.com/user-attachments/assets/3a77f9e3-3b03-43ae-8584-a21d1f44d54d"
/>

After:
<img width="410" height="206" alt="Screenshot 2025-10-17 at 9 52 11 AM"
src="https://github.com/user-attachments/assets/fa5d428d-2bde-4453-84fd-9aceebe712e8"
/>


## Test Plan
* Manual testing
2025-10-17 10:41:50 -07:00
Alexey Rybak
c9f0bebcb7
chore: update API leveling docs with deprecation flag (#3837)
# What does this PR do?
Adds information on the `deprecated=True` flags to the documentation for
extra clarity.

## Test Plan
* Manual testing
2025-10-17 10:17:58 -07:00
Charlie Doern
f22aaef42f
chore!: remove telemetry API usage (#3815)
# What does this PR do?

remove telemetry as a providable API from the codebase. This includes
removing it from generated distributions but also the provider registry,
the router, etc

since `setup_logger` is tied pretty strictly to `Api.telemetry` being in
impls we still need an "instantiated provider" in our implementations.
However it should not be auto-routed or provided. So in
validate_and_prepare_providers (called from resolve_impls) I made it so
that if run_config.telemetry.enabled, we set up the meta-reference
"provider" internally to be used so that log_event will work when
called.

This is the neatest way I think we can remove telemetry from the
provider configs but also not need to rip apart the whole "telemetry is
a provider" logic just yet, but we can do it internally later without
disrupting users.

so telemetry is removed from the registry such that if a user puts
`telemetry:` as an API in their build/run config it will err out, but
can still be used by us internally as we go through this transition.


relates to #3806

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-10-16 10:39:32 -07:00
Bill Murdock
c19eb9854d
docs: Document known limitations of Responses (#3776)
# What does this PR do?

Adds a subpage of the OpenAI compatibility page in the documentation.
This subpage documents known limitations of the Responses API.

<!-- If resolving an issue, uncomment and update the line below -->

Closes #3575

---------

Signed-off-by: Bill Murdock <bmurdock@redhat.com>
2025-10-16 10:26:23 -07:00
ehhuang
6ba9db3929
chore!: BREAKING CHANGE: remove sqlite from telemetry config (#3808)
# What does this PR do?
- Removed sqlite sink from telemetry config.
- Removed related code
- Updated doc related to telemetry

## Test Plan
CI
2025-10-15 14:24:45 -07:00
Francisco Arceo
ef4bc70bbe
feat: Enable setting a default embedding model in the stack (#3803)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.12) (push) Failing after 1s
Python Package Build Test / build (3.13) (push) Failing after 1s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
Vector IO Integration Tests / test-matrix (push) Failing after 4s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Unit Tests / unit-tests (3.13) (push) Failing after 5s
API Conformance Tests / check-schema-compatibility (push) Successful in 11s
UI Tests / ui-tests (22) (push) Successful in 40s
Pre-commit / pre-commit (push) Successful in 1m28s
# What does this PR do?

Enables automatic embedding model detection for vector stores and by
using a `default_configured` boolean that can be defined in the
`run.yaml`.

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->

## Test Plan
- Unit tests
- Integration tests
- Simple example below:

Spin up the stack:
```bash
uv run llama stack build --distro starter --image-type venv --run
```
Then test with OpenAI's client:
```python
from openai import OpenAI
client = OpenAI(base_url="http://localhost:8321/v1/", api_key="none")
vs = client.vector_stores.create()
```
Previously you needed:

```python
vs = client.vector_stores.create(
    extra_body={
        "embedding_model": "sentence-transformers/all-MiniLM-L6-v2",
        "embedding_dimension": 384,
    }
)
```

The `extra_body` is now unnecessary.

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-10-14 18:25:13 -07:00
IAN MILLER
007efa6eb5
refactor: replace default all-MiniLM-L6-v2 embedding model by nomic-embed-text-v1.5 in Llama Stack (#3183)
# What does this PR do?
<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->
The purpose of this PR is to replace the Llama Stack's default embedding
model by nomic-embed-text-v1.5.

These are the key reasons why Llama Stack community decided to switch
from all-MiniLM-L6-v2 to nomic-embed-text-v1.5:
1. The training data for
[all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2#training-data)
includes a lot of data sets with various licensing terms, so it is
tricky to know when/whether it is appropriate to use this model for
commercial applications.
2. The model is not particularly competitive on major benchmarks. For
example, if you look at the [MTEB
Leaderboard](https://huggingface.co/spaces/mteb/leaderboard) and click
on Miscellaneous/BEIR to see English information retrieval accuracy, you
see that the top of the leaderboard is dominated by enormous models but
also that there are many, many models of relatively modest size whith
much higher Retrieval scores. If you want to look closely at the data, I
recommend clicking "Download Table" because it is easier to browse that
way.

More discussion info can be founded
[here](https://github.com/llamastack/llama-stack/issues/2418)

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->
Closes #2418 

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->
1. Run `./scripts/unit-tests.sh`
2. Integration tests via CI wokrflow

---------

Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com>
Co-authored-by: Sébastien Han <seb@redhat.com>
2025-10-14 10:44:20 -04:00
Matthew Farrellee
0066d986c5
feat: use SecretStr for inference provider auth credentials (#3724)
# What does this PR do?

use SecretStr for OpenAIMixin providers

- RemoteInferenceProviderConfig now has auth_credential: SecretStr
- the default alias is api_key (most common name)
- some providers override to use api_token (RunPod, vLLM, Databricks)
- some providers exclude it (Ollama, TGI, Vertex AI)

addresses #3517 

## Test Plan

ci w/ new tests
2025-10-10 07:32:50 -07:00