Commit graph

3537 commits

Eric Huang
00b1fefc67 merge commit for archive created by Sapling 2025-11-07 11:28:55 -08:00
Eric Huang
51714b4160 file content fix
# What does this PR do?


## Test Plan
2025-11-07 11:28:44 -08:00
ehhuang
bd2238126b
Merge 0d65bbc9e4 into sapling-pr-archive-ehhuang 2025-11-07 11:27:38 -08:00
Eric Huang
0d65bbc9e4 file content fix
# What does this PR do?


## Test Plan
2025-11-07 11:27:22 -08:00
ehhuang
02dbde7eec
Merge 8e457f1cec into sapling-pr-archive-ehhuang 2025-11-06 22:37:47 -08:00
Eric Huang
8e457f1cec chore(ui): add npm package and dockerfile
# What does this PR do?


## Test Plan
2025-11-06 22:37:27 -08:00
Eric Huang
856a2156df merge commit for archive created by Sapling 2025-11-06 22:37:03 -08:00
Eric Huang
72a476f920 chore(ui): add npm package and dockerfile
# What does this PR do?


## Test Plan
2025-11-06 22:36:57 -08:00
Eric Huang
d5a3847e76 merge commit for archive created by Sapling 2025-11-06 22:36:09 -08:00
Eric Huang
90ec92ea5d chore(ui): add npm package and dockerfile
# What does this PR do?


## Test Plan
2025-11-06 22:36:01 -08:00
ehhuang
95029bcb88
Merge afab6569f0 into sapling-pr-archive-ehhuang 2025-11-06 22:28:58 -08:00
Eric Huang
afab6569f0 chore(ui): add npm package and dockerfile
# What does this PR do?


## Test Plan
2025-11-06 22:28:52 -08:00
Sumanth Kamenani
e894e36eea
feat: add OpenAI-compatible Bedrock provider (#3748)
Implements an AWS Bedrock inference provider using the OpenAI-compatible
endpoint for Llama models available through Bedrock.

Closes: #3410


## What does this PR do?

Adds AWS Bedrock as an inference provider using the OpenAI-compatible
endpoint. This lets us use Bedrock models (GPT-OSS, Llama) through the
standard llama-stack inference API.

The implementation uses LiteLLM's OpenAI client under the hood, so it
gets all the OpenAI compatibility features. The provider handles
per-request API key overrides via headers.
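
For illustration, a minimal sketch of a per-request key override from a client; the header name and the `aws_bedrock_api_key` field appear in the tests below, while the base URL and dummy key are assumptions:

```python
import json

from openai import OpenAI

# Pass a per-request Bedrock API key via the provider-data header.
client = OpenAI(
    base_url="http://localhost:8321/v1",  # assumed local llama-stack server
    api_key="not-needed",
    default_headers={
        "x-llamastack-provider-data": json.dumps({"aws_bedrock_api_key": "your-key"})
    },
)

resp = client.chat.completions.create(
    model="bedrock-inference/openai.gpt-oss-20b-1:0",
    messages=[{"role": "user", "content": "Say hello"}],
)
print(resp.choices[0].message.content)
```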

## Test Plan

**Tested the following scenarios:**
- Non-streaming completion - basic request/response flow
- Streaming completion - SSE streaming with chunked responses
- Multi-turn conversations - context retention across turns
- Tool calling - function calling with proper tool_calls format

# Bedrock OpenAI-Compatible Provider - Test Results


**Model:** `bedrock-inference/openai.gpt-oss-20b-1:0`


---

## Test 1: Model Listing

**Request:**
```http
GET /v1/models HTTP/1.1
```

**Response:**
```http
HTTP/1.1 200 OK
Content-Type: application/json

{
  "data": [
    {"identifier": "bedrock-inference/openai.gpt-oss-20b-1:0", ...},
    {"identifier": "bedrock-inference/openai.gpt-oss-40b-1:0", ...}
  ]
}
```

---

## Test 2: Non-Streaming Completion

**Request:**
```http
POST /v1/chat/completions HTTP/1.1
Content-Type: application/json

{
  "model": "bedrock-inference/openai.gpt-oss-20b-1:0",
  "messages": [{"role": "user", "content": "Say 'Hello from Bedrock' and nothing else"}],
  "stream": false
}
```

**Response:**
```http
HTTP/1.1 200 OK
Content-Type: application/json

{
  "choices": [{
    "finish_reason": "stop",
    "message": {"content": "...Hello from Bedrock"}
  }],
  "usage": {"prompt_tokens": 79, "completion_tokens": 50, "total_tokens": 129}
}
```

---

## Test 3: Streaming Completion

**Request:**
```http
POST /v1/chat/completions HTTP/1.1
Content-Type: application/json

{
  "model": "bedrock-inference/openai.gpt-oss-20b-1:0",
  "messages": [{"role": "user", "content": "Count from 1 to 5"}],
  "stream": true
}
```

**Response:**
```http
HTTP/1.1 200 OK
Content-Type: text/event-stream

[6 SSE chunks received]
Final content: "1, 2, 3, 4, 5"
```

---

## Test 4: Error Handling - Invalid Model

**Request:**
```http
POST /v1/chat/completions HTTP/1.1
Content-Type: application/json

{
  "model": "invalid-model-id",
  "messages": [{"role": "user", "content": "Hello"}],
  "stream": false
}
```

**Response:**
```http
HTTP/1.1 404 Not Found
Content-Type: application/json

{
  "detail": "Model 'invalid-model-id' not found. Use 'client.models.list()' to list available Models."
}
```

---

## Test 5: Multi-Turn Conversation

**Request 1:**
```http
POST /v1/chat/completions HTTP/1.1

{
  "messages": [{"role": "user", "content": "My name is Alice"}]
}
```

**Response 1:**
```http
HTTP/1.1 200 OK

{
  "choices": [{
    "message": {"content": "...Nice to meet you, Alice! How can I help you today?"}
  }]
}
```

**Request 2 (with history):**
```http
POST /v1/chat/completions HTTP/1.1

{
  "messages": [
    {"role": "user", "content": "My name is Alice"},
    {"role": "assistant", "content": "...Nice to meet you, Alice!..."},
    {"role": "user", "content": "What is my name?"}
  ]
}
```

**Response 2:**
```http
HTTP/1.1 200 OK

{
  "choices": [{
    "message": {"content": "...Your name is Alice."}
  }],
  "usage": {"prompt_tokens": 183, "completion_tokens": 42}
}
```

**Context retained across turns**

---

## Test 6: System Messages

**Request:**
```http
POST /v1/chat/completions HTTP/1.1

{
  "messages": [
    {"role": "system", "content": "You are Shakespeare. Respond only in Shakespearean English."},
    {"role": "user", "content": "Tell me about the weather"}
  ]
}
```

**Response:**
```http
HTTP/1.1 200 OK

{
  "choices": [{
    "message": {"content": "Lo! I heed thy request..."}
  }],
  "usage": {"completion_tokens": 813}
}
```


---

## Test 7: Tool Calling

**Request:**
```http
POST /v1/chat/completions HTTP/1.1

{
  "messages": [{"role": "user", "content": "What's the weather in San Francisco?"}],
  "tools": [{
    "type": "function",
    "function": {
      "name": "get_weather",
      "parameters": {"type": "object", "properties": {"location": {"type": "string"}}}
    }
  }]
}
```

**Response:**
```http
HTTP/1.1 200 OK

{
  "choices": [{
    "finish_reason": "tool_calls",
    "message": {
      "tool_calls": [{
        "function": {"name": "get_weather", "arguments": "{\"location\":\"San Francisco\"}"}
      }]
    }
  }]
}
```

---

## Test 8: Sampling Parameters

**Request:**
```http
POST /v1/chat/completions HTTP/1.1

{
  "messages": [{"role": "user", "content": "Say hello"}],
  "temperature": 0.7,
  "top_p": 0.9
}
```

**Response:**
```http
HTTP/1.1 200 OK

{
  "choices": [{
    "message": {"content": "...Hello! 👋 How can I help you today?"}
  }]
}
```

---

## Test 9: Authentication Error Handling

### Subtest A: Invalid API Key

**Request:**
```http
POST /v1/chat/completions HTTP/1.1
x-llamastack-provider-data: {"aws_bedrock_api_key": "invalid-fake-key-12345"}

{"model": "bedrock-inference/openai.gpt-oss-20b-1:0", ...}
```

**Response:**
```http
HTTP/1.1 400 Bad Request

{
  "detail": "Invalid value: Authentication failed: Error code: 401 - {'error': {'message': 'Invalid API Key format: Must start with pre-defined prefix', ...}}"
}
```

---

### Subtest B: Empty API Key (Fallback to Config)

**Request:**
```http
POST /v1/chat/completions HTTP/1.1
x-llamastack-provider-data: {"aws_bedrock_api_key": ""}

{"model": "bedrock-inference/openai.gpt-oss-20b-1:0", ...}
```

**Response:**
```http
HTTP/1.1 200 OK

{
  "choices": [{
    "message": {"content": "...Hello! How can I assist you today?"}
  }]
}
```

**Fell back to config key**

---

### Subtest C: Malformed Token

**Request:**
```http
POST /v1/chat/completions HTTP/1.1
x-llamastack-provider-data: {"aws_bedrock_api_key": "not-a-valid-bedrock-token-format"}

{"model": "bedrock-inference/openai.gpt-oss-20b-1:0", ...}
```

**Response:**
```http
HTTP/1.1 400 Bad Request

{
  "detail": "Invalid value: Authentication failed: Error code: 401 - {'error': {'message': 'Invalid API Key format: Must start with pre-defined prefix', ...}}"
}
```
2025-11-06 17:18:18 -08:00
Ashwin Bharambe
a2c4c12384
chore(ui): remove the Streamlit UI (#4097) 2025-11-06 15:51:57 -08:00
Sébastien Han
939a2db58f
chore: update stainless config (#4096)
# What does this PR do?

Removed in https://github.com/llamastack/llama-stack/pull/4067

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-11-06 15:58:13 -05:00
Charlie Doern
9df073450f
feat: remove core.telemetry as a dependency of llama_stack.apis (#4064)
# What does this PR do?

Removes a circular dependency by moving tracing from the API protocol
definitions to the router implementation layer.

This gets us closer to having a self-contained API package with no other
cross-cutting dependencies on other parts of the llama-stack codebase. To
the best of our ability, `llama_stack.apis` should contain only type and
protocol definitions.

Changes:
- Create `apis/common/tracing.py` with a marker decorator (zero core dependencies)
- Add the _new_ `@telemetry_traceable` marker decorator to 11 protocol classes
- Apply actual tracing in `core/resolver.py`, in `instantiate_provider`, based on the protocol marker
- Move `MetricResponseMixin` from core to apis (it's an API response type)
- The APIs package is now self-contained with zero core dependencies

The tracing functionality remains identical: the actual `trace_protocol` from
core is applied to router implementations at runtime when telemetry is
enabled and the protocol has the `__marked_for_tracing__` marker.
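
A minimal sketch of the marker pattern described above; only `telemetry_traceable` and `__marked_for_tracing__` come from this PR, the rest is hypothetical:

```python
from typing import Any


def telemetry_traceable(cls: type) -> type:
    """Marker decorator: tags a protocol class, with zero core dependencies."""
    cls.__marked_for_tracing__ = True
    return cls


def trace_protocol(impl: Any) -> Any:
    """Stand-in for the real tracing wrapper that lives in core."""
    return impl


def maybe_apply_tracing(impl: Any, protocol: type, telemetry_enabled: bool) -> Any:
    """Resolver-side check: wrap only if telemetry is on and the marker is set."""
    if telemetry_enabled and getattr(protocol, "__marked_for_tracing__", False):
        return trace_protocol(impl)
    return impl
```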

## Test Plan

Manual integration test confirms identical behavior to the main branch:

```bash
llama stack list-deps --format uv starter | sh
export OLLAMA_URL=http://localhost:11434
llama stack run starter

curl -X POST http://localhost:8321/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "ollama/gpt-oss:20b",
       "messages": [{"role": "user", "content": "Say hello"}],
       "max_tokens": 10}'
```

Verified identical between main and this branch:
- trace_id present in response
- metrics array with prompt_tokens, completion_tokens, total_tokens
- Server logs show trace_protocol applied to all routers

Existing telemetry integration tests (tests/integration/telemetry/) validate
trace context propagation and span attributes.


relates to #3895

---------

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-11-06 10:58:30 -08:00
Derek Higgins
dc9497a3b2
ci: Temporarily disable Telemetry during tests (#4090)
Closes: #4089

Signed-off-by: Derek Higgins <derekh@redhat.com>
2025-11-06 17:53:02 +01:00
Derek Higgins
03d23db910
ci: vllm ci job update (#4088)
Add missing recording for vllm in library mode
Add Docker env (missed during rebase)

Signed-off-by: Derek Higgins <derekh@redhat.com>
2025-11-06 16:59:55 +01:00
Derek Higgins
c62a09ab76
ci: Add vLLM support to integration testing infrastructure (with qwen) (#3545)
- Introduces vLLM provider support to the record/replay testing framework
- Enables both recording and replay of vLLM API interactions alongside
existing Ollama support

The changes enable testing of vLLM functionality. vLLM tests focus on
inference capabilities, while Ollama continues to exercise the full API
surface, including vision features.

--
This is an alternative to #3128; using qwen3 instead of Llama 3.2 1B
appears to be more capable at structured output and tool calls.

---------

Signed-off-by: Derek Higgins <derekh@redhat.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2025-11-06 10:36:40 +01:00
Ashwin Bharambe
bef1b044bd
refactor(passthrough): use AsyncOpenAI instead of AsyncLlamaStackClient (#4085)
We'd like to remove the dependence of `llama-stack` on
`llama-stack-client`. This is a necessary step.

A few small cleanups:
- Also enables `embeddings`
- Remove the unused ModelRegistryHelper dependency
- Consolidate to the auth_credential field via RemoteInferenceProviderConfig
- Implement list_models() to fetch from the downstream /v1/models (see the sketch below)
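
A minimal sketch of what listing downstream models with `AsyncOpenAI` can look like (the base URL and dummy API key are assumptions, not the provider's actual code):

```python
import asyncio

from openai import AsyncOpenAI


async def main() -> None:
    # Point AsyncOpenAI at a downstream llama-stack server's OpenAI-compatible API.
    client = AsyncOpenAI(base_url="http://localhost:8321/v1", api_key="not-needed")
    models = await client.models.list()
    print([m.id for m in models.data])


asyncio.run(main())
```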

## Test Plan

Tested using this script
https://gist.github.com/ashwinb/6356463d10f989c0682ab3bff8589581

Output:
```
Listing models from downstream server...
Available models: ['passthrough/ollama/nomic-embed-text:latest', 'passthrough/ollama/all-minilm:l6-v2', 'passthrough/ollama/llama3.2-vision:11b', 'passthrough/ollama/llama3.2-vision:latest', 'passthrough/ollama/llama-guard3:1b', 'passthrough/ollama/llama3.2:1b', 'passthrough/ollama/all-minilm:latest', 'passthrough/ollama/llama3.2:3b', 'passthrough/ollama/llama3.2:3b-instruct-fp16', 'passthrough/bedrock/meta.llama3-1-8b-instruct-v1:0', 'passthrough/bedrock/meta.llama3-1-70b-instruct-v1:0', 'passthrough/bedrock/meta.llama3-1-405b-instruct-v1:0', 'passthrough/sentence-transformers/nomic-ai/nomic-embed-text-v1.5']

Using LLM model: passthrough/ollama/llama3.2-vision:11b

Making inference request...

Response: 4.

--- Testing streaming ---
Streamed response: ChatCompletionChunk(id='chatcmpl-64', choices=[Choice(delta=ChoiceDelta(content='1', reasoning_content=None, refusal=None, role='assistant', tool_calls=None), finish_reason='', index=0, logprobs=None)], created=1762381674, model='passthrough/ollama/llama3.2-vision:11b', object='chat.completion.chunk', usage=None)
...
5ChatCompletionChunk(id='chatcmpl-64', choices=[Choice(delta=ChoiceDelta(content='', reasoning_content=None, refusal=None, role='assistant', tool_calls=None), finish_reason='stop', index=0, logprobs=None)], created=1762381674, model='passthrough/ollama/llama3.2-vision:11b', object='chat.completion.chunk', usage=None)
```
2025-11-05 18:15:11 -08:00
ehhuang
b335419faa
fix: actualize chunking strategy in vector store create API (#4086)
# What does this PR do?

- When create vector store is called without a chunking strategy, we
actualize the strategy used so that the value persisted reflects it instead
of strategy='None'.
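
For illustration, a hypothetical before/after of the persisted value; the shapes follow the OpenAI vector-store API and the static defaults shown are assumptions, not the server's actual code:

```python
# Hypothetical illustration: a create request without a chunking_strategy is
# actualized to the concrete strategy that was used, instead of persisting None.
request = {"name": "docs"}  # no chunking_strategy supplied

persisted_before = None  # effectively strategy='None'

persisted_after = {  # actualized strategy (default values are assumed)
    "type": "static",
    "static": {"max_chunk_size_tokens": 800, "chunk_overlap_tokens": 400},
}
```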

## Test Plan
updated tests
2025-11-05 15:47:54 -08:00
Roy Belio
c672a5d792
feat: ability to use postgres as store for starter distro (#4076)
## What does this PR do?

The starter distribution now comes with all the required packages to
support persistent stores—like the agent store, metadata, and
inference—using PostgreSQL. Users can enable PostgreSQL support by
setting the `ENABLE_POSTGRES_STORE=1` environment variable.

This PR consolidates the functionality from the removed `postgres-demo`
distribution into the starter distribution, reducing maintenance
overhead.

**Closes: #2619**  
**Supersedes: #2851** (rebased and updated)

## Changes Made

1. **Added PostgreSQL support to starter distribution**
   - New `run-with-postgres-store.yaml` configuration
- Automatic config switching via `ENABLE_POSTGRES_STORE` environment
variable
   - Removed separate `postgres-demo` distribution

2. **Updated to new build system**
   - Integrated postgres switching logic into Containerfile entrypoint
   - Uses new `storage_backends` and `storage_stores` API
   - Properly configured both PostgreSQL KV store and SQL store

3. **Updated dependencies**
   - Added `psycopg2-binary` and `asyncpg` to starter distribution
   - All postgres-related dependencies automatically included

## How to Use

### With Docker (PostgreSQL):
```bash
docker run \
  -e ENABLE_POSTGRES_STORE=1 \
  -e POSTGRES_HOST=your_postgres_host \
  -e POSTGRES_PORT=5432 \
  -e POSTGRES_DB=llamastack \
  -e POSTGRES_USER=llamastack \
  -e POSTGRES_PASSWORD=llamastack \
  -e OPENAI_API_KEY=your_key \
  llamastack/distribution-starter
```

### PostgreSQL environment variables:
- `POSTGRES_HOST`: Postgres host (default: `localhost`)
- `POSTGRES_PORT`: Postgres port (default: `5432`)
- `POSTGRES_DB`: Postgres database name (default: `llamastack`)
- `POSTGRES_USER`: Postgres username (default: `llamastack`)
- `POSTGRES_PASSWORD`: Postgres password (default: `llamastack`)

## Test Plan

- All pre-commit hooks pass (mypy, ruff, distro-codegen)
- `llama stack list-deps starter` confirms psycopg2-binary is included
- Storage configuration correctly uses PostgreSQL backends
- Container builds successfully with postgres support

## Credits

Original work by @leseb in #2851. Rebased and updated by @r-bit-rry to
work with latest main.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Sébastien Han @leseb

---------

Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Sébastien Han <seb@redhat.com>
2025-11-05 15:37:06 -08:00
ehhuang
a0471291f2
Merge acac63de5d into sapling-pr-archive-ehhuang 2025-11-05 15:08:13 -08:00
Eric Huang
acac63de5d fix: actualize chunking strategy in vector store create API
# What does this PR do?


## Test Plan
2025-11-05 15:08:09 -08:00
ehhuang
e9fd38df27
Merge e57c813a5e into sapling-pr-archive-ehhuang 2025-11-05 15:02:13 -08:00
Eric Huang
e57c813a5e fix: actualize chunking strategy in vector store create API
# What does this PR do?


## Test Plan
2025-11-05 15:02:02 -08:00
ehhuang
9d5c34af27
fix!: BREAKING CHANGE: vector_store: search API response fix (#4080)
# What does this PR do?
- search_query in the vector store search API should be a list,
according to https://github.com/openai/openai-openapi
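
An illustrative response shape after the change (field values are hypothetical; the list-typed `search_query` is the point):

```python
# Hypothetical response shape: search_query is a list per the OpenAI spec.
response = {
    "object": "vector_store.search_results.page",
    "search_query": ["what is llama stack?"],  # previously a bare string
    "data": [],
}
```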


## Test Plan
modified tests


---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with
[ReviewStack](https://reviewstack.dev/llamastack/llama-stack/pull/4080).
* #4086
* __->__ #4080
2025-11-05 15:01:48 -08:00
ehhuang
913d9717b6
Merge 6fb329d613 into sapling-pr-archive-ehhuang 2025-11-05 14:11:07 -08:00
Eric Huang
6fb329d613 fix: actualize chunking strategy in vector store create API
# What does this PR do?


## Test Plan
2025-11-05 14:10:55 -08:00
Eric Huang
2c44ffead3 fix: vector store APIs 2025-11-05 14:10:55 -08:00
ehhuang
84a84ee85c
fix: last_id when listing files in vector store (#4079)
# What does this PR do?
The last_id should be the id of the last item in the returned list, not of
the unfiltered list.
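
A minimal sketch of the fix with hypothetical names; last_id comes from the page actually returned, not from the unfiltered list:

```python
def paginate(filtered_items: list[dict], limit: int) -> dict:
    # Page over the filtered list; last_id must come from this page.
    page = filtered_items[:limit]
    return {
        "data": page,
        "has_more": len(filtered_items) > limit,
        "last_id": page[-1]["id"] if page else None,
    }


print(paginate([{"id": "file-1"}, {"id": "file-2"}, {"id": "file-3"}], limit=2))
# last_id is "file-2", the last item of the returned page
```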

## Test Plan
fixed test
2025-11-05 14:10:10 -08:00
ehhuang
6f6dca4b84
Merge ed9297be80 into sapling-pr-archive-ehhuang 2025-11-05 14:09:58 -08:00
Eric Huang
ed9297be80 fix: actualize chunking strategy in vector store create API
# What does this PR do?


## Test Plan
2025-11-05 14:09:47 -08:00
Eric Huang
042f987201 merge commit for archive created by Sapling 2025-11-05 13:56:53 -08:00
Eric Huang
353ac24258 fix: actualize chunking strategy in vector store create API
# What does this PR do?


## Test Plan
2025-11-05 13:56:47 -08:00
ehhuang
cbd02767b3
Merge 6097d93294 into sapling-pr-archive-ehhuang 2025-11-05 13:53:53 -08:00
ehhuang
c7ff931a4a
Merge 28cdf18083 into sapling-pr-archive-ehhuang 2025-11-05 13:53:20 -08:00
Eric Huang
6097d93294 fix: actualize chunking strategy in vector store create API
# What does this PR do?


## Test Plan
2025-11-05 13:52:55 -08:00
Eric Huang
28cdf18083 fix: vector store APIs 2025-11-05 13:52:44 -08:00
ehhuang
005e7ebd75
Merge de04d4e71a into sapling-pr-archive-ehhuang 2025-11-05 13:45:40 -08:00
Eric Huang
de04d4e71a fix: last_id when listing files in vector store
# What does this PR do?


## Test Plan
2025-11-05 13:45:25 -08:00
Ashwin Bharambe
d9cf5cd480
fix(ci): use --no-cache instead of --no-cache-dir (#4081)
This is necessary to make sure GPU dockers can be built on CI without
running out of space.
2025-11-05 12:14:02 -08:00
ehhuang
09ced9f299
Merge b403b602e4 into sapling-pr-archive-ehhuang 2025-11-05 11:35:05 -08:00
Eric Huang
b403b602e4 fix: vector_store: misc fixes
# What does this PR do?


## Test Plan
2025-11-05 11:34:59 -08:00
Charlie Doern
c899b50723
fix: print help for list-deps if no args (#4078)
# What does this PR do?

list-deps takes positional args OR flags like --providers. The issue with
this is that these args need to be optional, since by nature one or the
other can be specified.

This PR adds a check to list-deps: `if not args.providers and not
args.config`. If this is true, help is printed and we exit.
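
A minimal sketch of the guard (hypothetical argparse wiring, not the actual CLI code):

```python
import argparse
import sys

parser = argparse.ArgumentParser(prog="llama stack list-deps")
parser.add_argument("config", nargs="?", default=None, help="config file or distro name")
parser.add_argument("--providers", default=None, help="api1=provider1,api2=provider2")
args = parser.parse_args()

# Print help and exit when neither a config/distro nor --providers is given.
if not args.providers and not args.config:
    parser.print_help()
    sys.exit(2)
```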

resolves #4075

## Test Plan
before:

```
╰─ llama stack list-deps
Traceback (most recent call last):
  File "/Users/charliedoern/projects/Documents/llama-stack/venv/bin/llama", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/Users/charliedoern/projects/Documents/llama-stack/src/llama_stack/cli/llama.py", line 52, in main
    parser.run(args)
  File "/Users/charliedoern/projects/Documents/llama-stack/src/llama_stack/cli/llama.py", line 43, in run
    args.func(args)
  File "/Users/charliedoern/projects/Documents/llama-stack/src/llama_stack/cli/stack/list_deps.py", line 51, in _run_stack_list_deps_command
    return run_stack_list_deps_command(args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/charliedoern/projects/Documents/llama-stack/src/llama_stack/cli/stack/_list_deps.py", line 135, in run_stack_list_deps_command
    normal_deps, special_deps, external_provider_dependencies = get_provider_dependencies(build_config)
                                                                                          ^^^^^^^^^^^^
UnboundLocalError: cannot access local variable 'build_config' where it is not associated with a value

```

after:

```
╰─ llama stack list-deps
usage: llama stack list-deps [-h] [--providers PROVIDERS] [--format {uv,deps-only}] [config | distro]

list the dependencies for a llama stack distribution

positional arguments:
  config | distro       Path to config file to use or name of known distro (llama stack list for a list). (default: None)

options:
  -h, --help            show this help message and exit
  --providers PROVIDERS
                        sync dependencies for a list of providers and only those providers. This list is formatted like: api1=provider1,api2=provider2. Where there can be multiple
                        providers per API. (default: None)
  --format {uv,deps-only}
                        Output format: 'uv' shows shell commands, 'deps-only' shows just the list of dependencies without `uv` (default) (default: deps-only)
```

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-11-05 11:34:08 -08:00
Eric Huang
85b280e1f0 merge commit for archive created by Sapling 2025-11-05 11:30:16 -08:00
Eric Huang
737f83b3dd fix: vector_store: misc fixes
# What does this PR do?


## Test Plan
2025-11-05 11:30:09 -08:00
ehhuang
3b22c1ac45
Merge f97093d125 into sapling-pr-archive-ehhuang 2025-11-05 11:28:13 -08:00
Eric Huang
f97093d125 fix: vector_store: misc fixes
# What does this PR do?


## Test Plan
2025-11-05 11:28:07 -08:00
Eric Huang
2334cb1230 merge commit for archive created by Sapling 2025-11-05 11:27:25 -08:00