Merge branch 'main' into content-extension

Francisco Arceo 2025-08-28 12:58:13 -06:00 committed by GitHub
commit 4c1f187c71
42 changed files with 2089 additions and 389 deletions

View file

@ -33,7 +33,7 @@ The list of open-benchmarks we currently support:
- [MMMU](https://arxiv.org/abs/2311.16502) (A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI): Benchmark designed to evaluate multimodal models.
You can follow this [contributing guide](https://llama-stack.readthedocs.io/en/latest/references/evals_reference/index.html#open-benchmark-contributing-guide) to add more open-benchmarks to Llama Stack
You can follow this [contributing guide](../references/evals_reference/index.md#open-benchmark-contributing-guide) to add more open-benchmarks to Llama Stack
#### Run evaluation on open-benchmarks via CLI

View file

@ -35,3 +35,6 @@ device: cpu
```
[Find more detailed information here!](huggingface.md)

View file

@ -22,3 +22,4 @@ checkpoint_format: meta
```
[Find more detailed information here!](torchtune.md)

View file

@ -88,7 +88,7 @@ Interactive pages for users to play with and explore Llama Stack API capabilitie
- **API Resources**: Inspect Llama Stack API resources
- This page allows you to inspect Llama Stack API resources (`models`, `datasets`, `memory_banks`, `benchmarks`, `shields`).
- Under the hood, it uses Llama Stack's `/<resources>/list` API to get information about each resource.
- Please visit [Core Concepts](https://llama-stack.readthedocs.io/en/latest/concepts/index.html) for more details about the resources.
- Please visit [Core Concepts](../../concepts/index.md) for more details about the resources.
### Starting the Llama Stack Playground

View file

@ -3,7 +3,7 @@
Llama Stack (LLS) provides two different APIs for building AI applications with tool calling capabilities: the **Agents API** and the **OpenAI Responses API**. While both enable AI systems to use tools, and maintain full conversation history, they serve different use cases and have distinct characteristics.
```{note}
For simple and basic inferencing, you may want to use the [Chat Completions API](https://llama-stack.readthedocs.io/en/latest/providers/index.html#chat-completions) directly, before progressing to Agents or Responses API.
**Note:** For simple and basic inferencing, you may want to use the [Chat Completions API](../providers/openai.md#chat-completions) directly, before progressing to Agents or Responses API.
```
## Overview
@ -173,7 +173,7 @@ Both APIs demonstrate distinct strengths that make them valuable on their own fo
## For More Information
- **LLS Agents API**: For detailed information on creating and managing agents, see the [Agents documentation](https://llama-stack.readthedocs.io/en/latest/building_applications/agent.html)
- **LLS Agents API**: For detailed information on creating and managing agents, see the [Agents documentation](agent.md)
- **OpenAI Responses API**: For information on using the OpenAI-compatible responses API, see the [OpenAI API documentation](https://platform.openai.com/docs/api-reference/responses)
- **Chat Completions API**: For the default backend API used by Agents, see the [Chat Completions providers documentation](https://llama-stack.readthedocs.io/en/latest/providers/index.html#chat-completions)
- **Agent Execution Loop**: For understanding how agents process turns and steps in their execution, see the [Agent Execution Loop documentation](https://llama-stack.readthedocs.io/en/latest/building_applications/agent_execution_loop.html)
- **Chat Completions API**: For the default backend API used by Agents, see the [Chat Completions providers documentation](../providers/openai.md#chat-completions)
- **Agent Execution Loop**: For understanding how agents process turns and steps in their execution, see the [Agent Execution Loop documentation](agent_execution_loop.md)

View file

@ -6,4 +6,4 @@ While there is a lot of flexibility to mix-and-match providers, often users will
**Locally Hosted Distro**: You may want to run Llama Stack on your own hardware. Typically though, you still need to use Inference via an external service. You can use providers like HuggingFace TGI, Fireworks, Together, etc. for this purpose. Or you may have access to GPUs and can run a [vLLM](https://github.com/vllm-project/vllm) or [NVIDIA NIM](https://build.nvidia.com/nim?filters=nimType%3Anim_type_run_anywhere&q=llama) instance. If you "just" have a regular desktop machine, you can use [Ollama](https://ollama.com/) for inference. To provide convenient quick access to these options, we provide a number of such pre-configured locally-hosted Distros.
**On-device Distro**: To run Llama Stack directly on an edge device (mobile phone or a tablet), we provide Distros for [iOS](https://llama-stack.readthedocs.io/en/latest/distributions/ondevice_distro/ios_sdk.html) and [Android](https://llama-stack.readthedocs.io/en/latest/distributions/ondevice_distro/android_sdk.html)
**On-device Distro**: To run Llama Stack directly on an edge device (mobile phone or a tablet), we provide Distros for [iOS](../distributions/ondevice_distro/ios_sdk.md) and [Android](../distributions/ondevice_distro/android_sdk.md)

View file

@ -14,6 +14,13 @@ Here are some example PRs to help you get started:
- [Nvidia Inference Implementation](https://github.com/meta-llama/llama-stack/pull/355)
- [Model context protocol Tool Runtime](https://github.com/meta-llama/llama-stack/pull/665)
## Guidelines for creating Internal or External Providers
|**Type** |Internal (in-tree) |External (out-of-tree) |
|---------|-------------------|-----------------------|
|**Description** |A provider that lives directly in the Llama Stack codebase.|A provider that lives outside the Llama Stack core codebase but is still accessible and usable by Llama Stack.|
|**Benefits** |Can be used with minimal additional configuration or installation.|Contributors do not have to add code directly to Llama Stack to make providers available. Provider-specific code stays separate from the core Llama Stack code.|
## Inference Provider Patterns
When implementing Inference providers for OpenAI-compatible APIs, Llama Stack provides several mixin classes to simplify development and ensure consistent behavior across providers.

View file

@ -27,7 +27,7 @@ Then, you can access the APIs like `models` and `inference` on the client and ca
response = client.models.list()
```
If you've created a [custom distribution](https://llama-stack.readthedocs.io/en/latest/distributions/building_distro.html), you can also use the run.yaml configuration file directly:
If you've created a [custom distribution](building_distro.md), you can also use the run.yaml configuration file directly:
```python
client = LlamaStackAsLibraryClient(config_path)

View file

@ -22,17 +22,17 @@ else
fi
if [ -z "${GITHUB_CLIENT_ID:-}" ]; then
echo "ERROR: GITHUB_CLIENT_ID not set. You need it for Github login to work. Refer to https://llama-stack.readthedocs.io/en/latest/deploying/index.html#kubernetes-deployment-guide"
echo "ERROR: GITHUB_CLIENT_ID not set. You need it for Github login to work. See the Kubernetes Deployment Guide in the Llama Stack documentation."
exit 1
fi
if [ -z "${GITHUB_CLIENT_SECRET:-}" ]; then
echo "ERROR: GITHUB_CLIENT_SECRET not set. You need it for Github login to work. Refer to https://llama-stack.readthedocs.io/en/latest/deploying/index.html#kubernetes-deployment-guide"
echo "ERROR: GITHUB_CLIENT_SECRET not set. You need it for Github login to work. See the Kubernetes Deployment Guide in the Llama Stack documentation."
exit 1
fi
if [ -z "${LLAMA_STACK_UI_URL:-}" ]; then
echo "ERROR: LLAMA_STACK_UI_URL not set. Should be set to the external URL of the UI (excluding port). You need it for Github login to work. Refer to https://llama-stack.readthedocs.io/en/latest/deploying/index.html#kubernetes-deployment-guide"
echo "ERROR: LLAMA_STACK_UI_URL not set. Should be set to the external URL of the UI (excluding port). You need it for Github login to work. See the Kubernetes Deployment Guide in the Llama Stack documentation."
exit 1
fi

View file

@ -66,7 +66,7 @@ llama stack run starter --port 5050
Ensure the Llama Stack server version is the same as the Kotlin SDK Library for maximum compatibility.
Other inference providers: [Table](https://llama-stack.readthedocs.io/en/latest/index.html#supported-llama-stack-implementations)
Other inference providers: [Table](../../index.md#supported-llama-stack-implementations)
How to set remote localhost in Demo App: [Settings](https://github.com/meta-llama/llama-stack-client-kotlin/tree/latest-release/examples/android_app#settings)

View file

@ -2,7 +2,7 @@
orphan: true
---
<!-- This file was auto-generated by distro_codegen.py, please edit source -->
# Meta Reference Distribution
# Meta Reference GPU Distribution
```{toctree}
:maxdepth: 2
@ -41,7 +41,7 @@ The following environment variables can be configured:
## Prerequisite: Downloading Models
Please use `llama model list --downloaded` to check that you have llama model checkpoints downloaded in `~/.llama` before proceeding. See [installation guide](https://llama-stack.readthedocs.io/en/latest/references/llama_cli_reference/download_models.html) here to download the models. Run `llama model list` to see the available models to download, and `llama model download` to download the checkpoints.
Please use `llama model list --downloaded` to check that you have llama model checkpoints downloaded in `~/.llama` before proceeding. See [installation guide](../../references/llama_cli_reference/download_models.md) here to download the models. Run `llama model list` to see the available models to download, and `llama model download` to download the checkpoints.
```
$ llama model list --downloaded

View file

@ -9,7 +9,6 @@ This section contains documentation for all available providers for the **post_t
```{toctree}
:maxdepth: 1
inline_huggingface-cpu
inline_huggingface-gpu
inline_torchtune-cpu
inline_torchtune-gpu

View file

@ -202,7 +202,7 @@ pprint(response)
Llama Stack offers a library of scoring functions and the `/scoring` API, allowing you to run evaluations on your pre-annotated AI application datasets.
In this example, we will work with an example RAG dataset you have built previously, label it with an annotation, and use LLM-As-Judge with a custom judge prompt for scoring. Please check out our [Llama Stack Playground](https://llama-stack.readthedocs.io/en/latest/playground/index.html) for an interactive interface to upload datasets and run scorings.
In this example, we will work with an example RAG dataset you have built previously, label it with an annotation, and use LLM-As-Judge with a custom judge prompt for scoring. Please check out our [Llama Stack Playground](../../building_applications/playground/index.md) for an interactive interface to upload datasets and run scorings.
```python
judge_model_id = "meta-llama/Llama-3.1-405B-Instruct-FP8"

View file

@ -80,7 +80,7 @@ def get_provider_dependencies(
normal_deps = []
special_deps = []
for package in deps:
if "--no-deps" in package or "--index-url" in package:
if any(f in package for f in ["--no-deps", "--index-url", "--extra-index-url"]):
special_deps.append(package)
else:
normal_deps.append(package)
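
The partitioning in this hunk can be sketched in isolation as follows. This is a minimal illustration, not the project's function: `split_dependencies`, `SPECIAL_FLAGS`, and the sample requirement strings are hypothetical names chosen for the example.

```python
# Flags that mark a requirement string as needing special handling.
SPECIAL_FLAGS = ("--no-deps", "--index-url", "--extra-index-url")


def split_dependencies(deps: list[str]) -> tuple[list[str], list[str]]:
    """Partition requirement strings into (normal, special) lists."""
    normal_deps: list[str] = []
    special_deps: list[str] = []
    for package in deps:
        # A substring check per flag; any hit sends the entry to special_deps.
        if any(flag in package for flag in SPECIAL_FLAGS):
            special_deps.append(package)
        else:
            normal_deps.append(package)
    return normal_deps, special_deps


normal, special = split_dependencies(
    [
        "numpy",
        "sentence-transformers --no-deps",
        "torch --extra-index-url https://download.pytorch.org/whl/cpu",
    ]
)
print(normal)
print(special)
```

Note that `"--index-url"` is not a substring of `"--extra-index-url"` (only a single dash precedes `index` there), so the extra flag has to be listed explicitly for such entries to be treated as special.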

View file

@ -225,7 +225,10 @@ def replace_env_vars(config: Any, path: str = "") -> Any:
try:
result = re.sub(pattern, get_env_var, config)
return _convert_string_to_proper_type(result)
# Only apply type conversion if substitution actually happened
if result != config:
return _convert_string_to_proper_type(result)
return result
except EnvVarError as e:
raise EnvVarError(e.var_name, e.path) from None
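
The effect of converting types only after a real substitution can be sketched as below. This is a simplified illustration under stated assumptions: `ENV_PATTERN`, `convert_string`, and `substitute` are hypothetical stand-ins, the pattern covers only the `${env.NAME}` and `${env.NAME:=default}` forms, and a missing variable without a default becomes an empty string here instead of raising an error.

```python
import re

# Hypothetical, simplified pattern: ${env.NAME} or ${env.NAME:=default}.
ENV_PATTERN = re.compile(r"\$\{env\.([A-Z0-9_]+)(?::=([^}]*))?\}")


def convert_string(value: str):
    """Hypothetical converter: booleans and integers, otherwise the string."""
    lowered = value.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    if value.isdigit():
        return int(value)
    return value


def substitute(config: str, env: dict[str, str]):
    def lookup(match: re.Match) -> str:
        name, default = match.group(1), match.group(2)
        return env.get(name, default if default is not None else "")

    result = ENV_PATTERN.sub(lookup, config)
    # Convert only when a substitution changed the string, so literal
    # values written directly in the config keep their string type.
    if result != config:
        return convert_string(result)
    return result


print(repr(substitute("${env.PORT:=8321}", {})))
print(repr(substitute("8321", {})))
```

With this guard, a value that came from an environment variable or its default is still coerced, while a literal string that merely looks numeric passes through unchanged.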

View file

@ -34,7 +34,7 @@ distribution_spec:
telemetry:
- provider_type: inline::meta-reference
post_training:
- provider_type: inline::huggingface-cpu
- provider_type: inline::torchtune-cpu
eval:
- provider_type: inline::meta-reference
datasetio:

View file

@ -156,13 +156,10 @@ providers:
sqlite_db_path: ${env.SQLITE_STORE_DIR:=~/.llama/distributions/ci-tests}/trace_store.db
otel_exporter_otlp_endpoint: ${env.OTEL_EXPORTER_OTLP_ENDPOINT:=}
post_training:
- provider_id: huggingface-cpu
provider_type: inline::huggingface-cpu
- provider_id: torchtune-cpu
provider_type: inline::torchtune-cpu
config:
checkpoint_format: huggingface
distributed_backend: null
device: cpu
dpo_output_dir: ~/.llama/distributions/ci-tests/dpo_output
checkpoint_format: meta
eval:
- provider_id: meta-reference
provider_type: inline::meta-reference

View file

@ -1,7 +1,7 @@
---
orphan: true
---
# Meta Reference Distribution
# Meta Reference GPU Distribution
```{toctree}
:maxdepth: 2
@ -29,7 +29,7 @@ The following environment variables can be configured:
## Prerequisite: Downloading Models
Please use `llama model list --downloaded` to check that you have llama model checkpoints downloaded in `~/.llama` before proceeding. See [installation guide](https://llama-stack.readthedocs.io/en/latest/references/llama_cli_reference/download_models.html) here to download the models. Run `llama model list` to see the available models to download, and `llama model download` to download the checkpoints.
Please use `llama model list --downloaded` to check that you have llama model checkpoints downloaded in `~/.llama` before proceeding. See [installation guide](../../references/llama_cli_reference/download_models.md) here to download the models. Run `llama model list` to see the available models to download, and `llama model download` to download the checkpoints.
```
$ llama model list --downloaded

View file

@ -35,7 +35,7 @@ distribution_spec:
telemetry:
- provider_type: inline::meta-reference
post_training:
- provider_type: inline::torchtune-gpu
- provider_type: inline::huggingface-gpu
eval:
- provider_type: inline::meta-reference
datasetio:

View file

@ -156,10 +156,13 @@ providers:
sqlite_db_path: ${env.SQLITE_STORE_DIR:=~/.llama/distributions/starter-gpu}/trace_store.db
otel_exporter_otlp_endpoint: ${env.OTEL_EXPORTER_OTLP_ENDPOINT:=}
post_training:
- provider_id: torchtune-gpu
provider_type: inline::torchtune-gpu
- provider_id: huggingface-gpu
provider_type: inline::huggingface-gpu
config:
checkpoint_format: meta
checkpoint_format: huggingface
distributed_backend: null
device: cpu
dpo_output_dir: ~/.llama/distributions/starter-gpu/dpo_output
eval:
- provider_id: meta-reference
provider_type: inline::meta-reference

View file

@ -17,6 +17,6 @@ def get_distribution_template() -> DistributionTemplate:
template.description = "Quick start template for running Llama Stack with several popular providers. This distribution is intended for GPU-enabled environments."
template.providers["post_training"] = [
BuildProvider(provider_type="inline::torchtune-gpu"),
BuildProvider(provider_type="inline::huggingface-gpu"),
]
return template

View file

@ -35,7 +35,7 @@ distribution_spec:
telemetry:
- provider_type: inline::meta-reference
post_training:
- provider_type: inline::huggingface-cpu
- provider_type: inline::torchtune-cpu
eval:
- provider_type: inline::meta-reference
datasetio:

View file

@ -156,13 +156,10 @@ providers:
sqlite_db_path: ${env.SQLITE_STORE_DIR:=~/.llama/distributions/starter}/trace_store.db
otel_exporter_otlp_endpoint: ${env.OTEL_EXPORTER_OTLP_ENDPOINT:=}
post_training:
- provider_id: huggingface-cpu
provider_type: inline::huggingface-cpu
- provider_id: torchtune-cpu
provider_type: inline::torchtune-cpu
config:
checkpoint_format: huggingface
distributed_backend: null
device: cpu
dpo_output_dir: ~/.llama/distributions/starter/dpo_output
checkpoint_format: meta
eval:
- provider_id: meta-reference
provider_type: inline::meta-reference

View file

@ -120,7 +120,7 @@ def get_distribution_template() -> DistributionTemplate:
],
"agents": [BuildProvider(provider_type="inline::meta-reference")],
"telemetry": [BuildProvider(provider_type="inline::meta-reference")],
"post_training": [BuildProvider(provider_type="inline::huggingface-cpu")],
"post_training": [BuildProvider(provider_type="inline::torchtune-cpu")],
"eval": [BuildProvider(provider_type="inline::meta-reference")],
"datasetio": [
BuildProvider(provider_type="remote::huggingface"),

View file

@ -40,8 +40,9 @@ def available_providers() -> list[ProviderSpec]:
InlineProviderSpec(
api=Api.inference,
provider_type="inline::sentence-transformers",
# CrossEncoder depends on torchao.quantization
pip_packages=[
"torch torchvision --index-url https://download.pytorch.org/whl/cpu",
"torch torchvision torchao>=0.12.0 --extra-index-url https://download.pytorch.org/whl/cpu",
"sentence-transformers --no-deps",
],
module="llama_stack.providers.inline.inference.sentence_transformers",

View file

@ -13,7 +13,7 @@ from llama_stack.providers.datatypes import AdapterSpec, Api, InlineProviderSpec
# The CPU version is used for distributions that don't have GPU support -- they result in smaller container images.
torchtune_def = dict(
api=Api.post_training,
pip_packages=["torchtune==0.5.0", "torchao==0.8.0", "numpy"],
pip_packages=["numpy"],
module="llama_stack.providers.inline.post_training.torchtune",
config_class="llama_stack.providers.inline.post_training.torchtune.TorchtunePostTrainingConfig",
api_dependencies=[
@ -23,56 +23,39 @@ torchtune_def = dict(
description="TorchTune-based post-training provider for fine-tuning and optimizing models using Meta's TorchTune framework.",
)
huggingface_def = dict(
api=Api.post_training,
pip_packages=["trl", "transformers", "peft", "datasets"],
module="llama_stack.providers.inline.post_training.huggingface",
config_class="llama_stack.providers.inline.post_training.huggingface.HuggingFacePostTrainingConfig",
api_dependencies=[
Api.datasetio,
Api.datasets,
],
description="HuggingFace-based post-training provider for fine-tuning models using the HuggingFace ecosystem.",
)
def available_providers() -> list[ProviderSpec]:
return [
InlineProviderSpec(
**{
**{ # type: ignore
**torchtune_def,
"provider_type": "inline::torchtune-cpu",
"pip_packages": (
cast(list[str], torchtune_def["pip_packages"])
+ ["torch torchtune==0.5.0 torchao==0.8.0 --index-url https://download.pytorch.org/whl/cpu"]
+ ["torch torchtune>=0.5.0 torchao>=0.12.0 --extra-index-url https://download.pytorch.org/whl/cpu"]
),
},
),
InlineProviderSpec(
**{
**huggingface_def,
"provider_type": "inline::huggingface-cpu",
"pip_packages": (
cast(list[str], huggingface_def["pip_packages"])
+ ["torch --index-url https://download.pytorch.org/whl/cpu"]
),
},
),
InlineProviderSpec(
**{
**{ # type: ignore
**torchtune_def,
"provider_type": "inline::torchtune-gpu",
"pip_packages": (
cast(list[str], torchtune_def["pip_packages"]) + ["torch torchtune==0.5.0 torchao==0.8.0"]
cast(list[str], torchtune_def["pip_packages"]) + ["torch torchtune>=0.5.0 torchao>=0.12.0"]
),
},
),
InlineProviderSpec(
**{
**huggingface_def,
"provider_type": "inline::huggingface-gpu",
"pip_packages": (cast(list[str], huggingface_def["pip_packages"]) + ["torch"]),
},
api=Api.post_training,
provider_type="inline::huggingface-gpu",
pip_packages=["trl", "transformers", "peft", "datasets", "torch"],
module="llama_stack.providers.inline.post_training.huggingface",
config_class="llama_stack.providers.inline.post_training.huggingface.HuggingFacePostTrainingConfig",
api_dependencies=[
Api.datasetio,
Api.datasets,
],
description="HuggingFace-based post-training provider for fine-tuning models using the HuggingFace ecosystem.",
),
remote_provider_spec(
api=Api.post_training,
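
The pattern used for the torchtune variants (a shared definition dict unpacked into each spec, with `provider_type` and `pip_packages` overridden per variant) can be sketched on its own. `ProviderSpecSketch`, `base_def`, `make_variant`, and the module path are hypothetical and exist only for this example; they are not the real Llama Stack classes.

```python
from dataclasses import dataclass, field


@dataclass
class ProviderSpecSketch:
    """Hypothetical stand-in for a provider spec; not the real class."""

    provider_type: str
    module: str
    pip_packages: list[str] = field(default_factory=list)


# Shared fields for every variant (illustrative values).
base_def = dict(module="example.providers.post_training", pip_packages=["numpy"])

CPU_INDEX = "https://download.pytorch.org/whl/cpu"


def make_variant(provider_type: str, extra: list[str]) -> ProviderSpecSketch:
    return ProviderSpecSketch(
        **{
            **base_def,  # start from the shared definition
            "provider_type": provider_type,  # later keys override earlier ones
            # Build a new list so the shared base list is never mutated.
            "pip_packages": list(base_def["pip_packages"]) + extra,
        }
    )


cpu = make_variant("inline::example-cpu", [f"torch --extra-index-url {CPU_INDEX}"])
gpu = make_variant("inline::example-gpu", ["torch"])
print(cpu.pip_packages)
print(gpu.pip_packages)
```

Because later keys win in a dict literal, each variant can replace `pip_packages` wholesale while inheriting every other shared field.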

View file

@ -9,7 +9,6 @@ from __future__ import annotations # for forward references
import hashlib
import json
import os
import sqlite3
from collections.abc import Generator
from contextlib import contextmanager
from enum import StrEnum
@ -125,28 +124,13 @@ class ResponseStorage:
def __init__(self, test_dir: Path):
self.test_dir = test_dir
self.responses_dir = self.test_dir / "responses"
self.db_path = self.test_dir / "index.sqlite"
self._ensure_directories()
self._init_database()
def _ensure_directories(self):
self.test_dir.mkdir(parents=True, exist_ok=True)
self.responses_dir.mkdir(exist_ok=True)
def _init_database(self):
with sqlite3.connect(self.db_path) as conn:
conn.execute("""
CREATE TABLE IF NOT EXISTS recordings (
request_hash TEXT PRIMARY KEY,
response_file TEXT,
endpoint TEXT,
model TEXT,
timestamp TEXT,
is_streaming BOOLEAN
)
""")
def store_recording(self, request_hash: str, request: dict[str, Any], response: dict[str, Any]):
"""Store a request/response pair."""
# Generate unique response filename
@ -169,34 +153,9 @@ class ResponseStorage:
f.write("\n")
f.flush()
# Update SQLite index
with sqlite3.connect(self.db_path) as conn:
conn.execute(
"""
INSERT OR REPLACE INTO recordings
(request_hash, response_file, endpoint, model, timestamp, is_streaming)
VALUES (?, ?, ?, ?, datetime('now'), ?)
""",
(
request_hash,
response_file,
request.get("endpoint", ""),
request.get("model", ""),
response.get("is_streaming", False),
),
)
def find_recording(self, request_hash: str) -> dict[str, Any] | None:
"""Find a recorded response by request hash."""
with sqlite3.connect(self.db_path) as conn:
result = conn.execute(
"SELECT response_file FROM recordings WHERE request_hash = ?", (request_hash,)
).fetchone()
if not result:
return None
response_file = result[0]
response_file = f"{request_hash[:12]}.json"
response_path = self.responses_dir / response_file
if not response_path.exists():

View file

@ -0,0 +1,610 @@
import { describe, test, expect } from "@jest/globals";
// Extract the exact processChunk function implementation for testing
function createProcessChunk() {
return (chunk: unknown): { text: string | null; isToolCall: boolean } => {
const chunkObj = chunk as Record<string, unknown>;
// Helper function to check if content contains function call JSON
const containsToolCall = (content: string): boolean => {
return (
content.includes('"type": "function"') ||
content.includes('"name": "knowledge_search"') ||
content.includes('"parameters":') ||
!!content.match(/\{"type":\s*"function".*?\}/)
);
};
// Check if this chunk contains a tool call (function call)
let isToolCall = false;
// Check direct chunk content if it's a string
if (typeof chunk === "string") {
isToolCall = containsToolCall(chunk);
}
// Check delta structures
if (
chunkObj?.delta &&
typeof chunkObj.delta === "object" &&
chunkObj.delta !== null
) {
const delta = chunkObj.delta as Record<string, unknown>;
if ("tool_calls" in delta) {
isToolCall = true;
}
if (typeof delta.text === "string") {
if (containsToolCall(delta.text)) {
isToolCall = true;
}
}
}
// Check event structures
if (
chunkObj?.event &&
typeof chunkObj.event === "object" &&
chunkObj.event !== null
) {
const event = chunkObj.event as Record<string, unknown>;
// Check event payload
if (
event?.payload &&
typeof event.payload === "object" &&
event.payload !== null
) {
const payload = event.payload as Record<string, unknown>;
if (typeof payload.content === "string") {
if (containsToolCall(payload.content)) {
isToolCall = true;
}
}
// Check payload delta
if (
payload?.delta &&
typeof payload.delta === "object" &&
payload.delta !== null
) {
const delta = payload.delta as Record<string, unknown>;
if (typeof delta.text === "string") {
if (containsToolCall(delta.text)) {
isToolCall = true;
}
}
}
}
// Check event delta
if (
event?.delta &&
typeof event.delta === "object" &&
event.delta !== null
) {
const delta = event.delta as Record<string, unknown>;
if (typeof delta.text === "string") {
if (containsToolCall(delta.text)) {
isToolCall = true;
}
}
if (typeof delta.content === "string") {
if (containsToolCall(delta.content)) {
isToolCall = true;
}
}
}
}
// if it's a tool call, skip it (don't display in chat)
if (isToolCall) {
return { text: null, isToolCall: true };
}
// Extract text content from various chunk formats
let text: string | null = null;
// Helper function to extract clean text content, filtering out function calls
const extractCleanText = (content: string): string | null => {
if (containsToolCall(content)) {
try {
// Try to parse and extract non-function call parts
const jsonMatch = content.match(
/\{"type":\s*"function"[^}]*\}[^}]*\}/
);
if (jsonMatch) {
const jsonPart = jsonMatch[0];
const parsedJson = JSON.parse(jsonPart);
// If it's a function call, extract text after JSON
if (parsedJson.type === "function") {
const textAfterJson = content
.substring(content.indexOf(jsonPart) + jsonPart.length)
.trim();
return textAfterJson || null;
}
}
// If we can't parse it properly, skip the whole thing
return null;
} catch {
return null;
}
}
return content;
};
// Try direct delta text
if (
chunkObj?.delta &&
typeof chunkObj.delta === "object" &&
chunkObj.delta !== null
) {
const delta = chunkObj.delta as Record<string, unknown>;
if (typeof delta.text === "string") {
text = extractCleanText(delta.text);
}
}
// Try event structures
if (
!text &&
chunkObj?.event &&
typeof chunkObj.event === "object" &&
chunkObj.event !== null
) {
const event = chunkObj.event as Record<string, unknown>;
// Try event payload content
if (
event?.payload &&
typeof event.payload === "object" &&
event.payload !== null
) {
const payload = event.payload as Record<string, unknown>;
// Try direct payload content
if (typeof payload.content === "string") {
text = extractCleanText(payload.content);
}
// Try turn_complete event structure: payload.turn.output_message.content
if (
!text &&
payload?.turn &&
typeof payload.turn === "object" &&
payload.turn !== null
) {
const turn = payload.turn as Record<string, unknown>;
if (
turn?.output_message &&
typeof turn.output_message === "object" &&
turn.output_message !== null
) {
const outputMessage = turn.output_message as Record<
string,
unknown
>;
if (typeof outputMessage.content === "string") {
text = extractCleanText(outputMessage.content);
}
}
// Fallback to model_response in steps if no output_message
if (
!text &&
turn?.steps &&
Array.isArray(turn.steps) &&
turn.steps.length > 0
) {
for (const step of turn.steps) {
if (step && typeof step === "object" && step !== null) {
const stepObj = step as Record<string, unknown>;
if (
stepObj?.model_response &&
typeof stepObj.model_response === "object" &&
stepObj.model_response !== null
) {
const modelResponse = stepObj.model_response as Record<
string,
unknown
>;
if (typeof modelResponse.content === "string") {
text = extractCleanText(modelResponse.content);
break;
}
}
}
}
}
}
// Try payload delta
if (
!text &&
payload?.delta &&
typeof payload.delta === "object" &&
payload.delta !== null
) {
const delta = payload.delta as Record<string, unknown>;
if (typeof delta.text === "string") {
text = extractCleanText(delta.text);
}
}
}
// Try event delta
if (
!text &&
event?.delta &&
typeof event.delta === "object" &&
event.delta !== null
) {
const delta = event.delta as Record<string, unknown>;
if (typeof delta.text === "string") {
text = extractCleanText(delta.text);
}
if (!text && typeof delta.content === "string") {
text = extractCleanText(delta.content);
}
}
}
// Try choices structure (ChatML format)
if (
!text &&
chunkObj?.choices &&
Array.isArray(chunkObj.choices) &&
chunkObj.choices.length > 0
) {
const choice = chunkObj.choices[0] as Record<string, unknown>;
if (
choice?.delta &&
typeof choice.delta === "object" &&
choice.delta !== null
) {
const delta = choice.delta as Record<string, unknown>;
if (typeof delta.content === "string") {
text = extractCleanText(delta.content);
}
}
}
// Try direct string content
if (!text && typeof chunk === "string") {
text = extractCleanText(chunk);
}
return { text, isToolCall: false };
};
}
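
The string heuristic inside `containsToolCall` above can be restated compactly in Python for illustration. This is a sketch of the same four checks, not a replacement for the TypeScript under test; `contains_tool_call` and `_FUNCTION_JSON` are names introduced here.

```python
import re

# Mirrors the regular expression used in the TypeScript helper.
_FUNCTION_JSON = re.compile(r'\{"type":\s*"function".*?\}')


def contains_tool_call(content: str) -> bool:
    """Return True if the text looks like it embeds a function-call JSON."""
    return (
        '"type": "function"' in content
        or '"name": "knowledge_search"' in content
        or '"parameters":' in content
        or _FUNCTION_JSON.search(content) is not None
    )


print(contains_tool_call('{"type": "function", "name": "knowledge_search"}'))
print(contains_tool_call("Hello, this is a normal response."))
print(contains_tool_call('The "parameters": field is required.'))
```

The third call shows how broad the heuristic is: any text containing `"parameters":` is flagged, which matches the behavior exercised by the mixed-content test, where the whole chunk is skipped once a function call is detected.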
describe("Chunk Processor", () => {
const processChunk = createProcessChunk();
describe("Real Event Structures", () => {
test("handles turn_complete event with cancellation policy response", () => {
const chunk = {
event: {
payload: {
event_type: "turn_complete",
turn: {
turn_id: "50a2d6b7-49ed-4d1e-b1c2-6d68b3f726db",
session_id: "e7f62b8e-518c-4450-82df-e65fe49f27a3",
input_messages: [
{
role: "user",
content: "nice, what's the cancellation policy?",
context: null,
},
],
steps: [
{
turn_id: "50a2d6b7-49ed-4d1e-b1c2-6d68b3f726db",
step_id: "54074310-af42-414c-9ffe-fba5b2ead0ad",
started_at: "2025-08-27T18:15:25.870703Z",
completed_at: "2025-08-27T18:15:51.288993Z",
step_type: "inference",
model_response: {
role: "assistant",
content:
"According to the search results, the cancellation policy for Red Hat Summit is as follows:\n\n* Cancellations must be received by 5 PM EDT on April 18, 2025 for a 50% refund of the registration fee.\n* No refunds will be given for cancellations received after 5 PM EDT on April 18, 2025.\n* Cancellation of travel reservations and hotel reservations are the responsibility of the registrant.",
stop_reason: "end_of_turn",
tool_calls: [],
},
},
],
output_message: {
role: "assistant",
content:
"According to the search results, the cancellation policy for Red Hat Summit is as follows:\n\n* Cancellations must be received by 5 PM EDT on April 18, 2025 for a 50% refund of the registration fee.\n* No refunds will be given for cancellations received after 5 PM EDT on April 18, 2025.\n* Cancellation of travel reservations and hotel reservations are the responsibility of the registrant.",
stop_reason: "end_of_turn",
tool_calls: [],
},
output_attachments: [],
started_at: "2025-08-27T18:15:25.868548Z",
completed_at: "2025-08-27T18:15:51.289262Z",
},
},
},
};
const result = processChunk(chunk);
expect(result.isToolCall).toBe(false);
expect(result.text).toContain(
"According to the search results, the cancellation policy for Red Hat Summit is as follows:"
);
expect(result.text).toContain("5 PM EDT on April 18, 2025");
});
test("handles turn_complete event with address response", () => {
const chunk = {
event: {
payload: {
event_type: "turn_complete",
turn: {
turn_id: "2f4a1520-8ecc-4cb7-bb7b-886939e042b0",
session_id: "e7f62b8e-518c-4450-82df-e65fe49f27a3",
input_messages: [
{
role: "user",
content: "what's francisco's address",
context: null,
},
],
steps: [
{
turn_id: "2f4a1520-8ecc-4cb7-bb7b-886939e042b0",
step_id: "c13dd277-1acb-4419-8fbf-d5e2f45392ea",
started_at: "2025-08-27T18:14:52.558761Z",
completed_at: "2025-08-27T18:15:11.306032Z",
step_type: "inference",
model_response: {
role: "assistant",
content:
"Francisco Arceo's address is:\n\nRed Hat\nUnited States\n17 Primrose Ln \nBasking Ridge New Jersey 07920",
stop_reason: "end_of_turn",
tool_calls: [],
},
},
],
output_message: {
role: "assistant",
content:
"Francisco Arceo's address is:\n\nRed Hat\nUnited States\n17 Primrose Ln \nBasking Ridge New Jersey 07920",
stop_reason: "end_of_turn",
tool_calls: [],
},
output_attachments: [],
started_at: "2025-08-27T18:14:52.553707Z",
completed_at: "2025-08-27T18:15:11.306729Z",
},
},
},
};
const result = processChunk(chunk);
expect(result.isToolCall).toBe(false);
expect(result.text).toContain("Francisco Arceo's address is:");
expect(result.text).toContain("17 Primrose Ln");
expect(result.text).toContain("Basking Ridge New Jersey 07920");
});
test("handles turn_complete event with ticket cost response", () => {
const chunk = {
event: {
payload: {
event_type: "turn_complete",
turn: {
turn_id: "7ef244a3-efee-42ca-a9c8-942865251002",
session_id: "e7f62b8e-518c-4450-82df-e65fe49f27a3",
input_messages: [
{
role: "user",
content: "what was the ticket cost for summit?",
context: null,
},
],
steps: [
{
turn_id: "7ef244a3-efee-42ca-a9c8-942865251002",
step_id: "7651dda0-315a-472d-b1c1-3c2725f55bc5",
started_at: "2025-08-27T18:14:21.710611Z",
completed_at: "2025-08-27T18:14:39.706452Z",
step_type: "inference",
model_response: {
role: "assistant",
content:
"The ticket cost for the Red Hat Summit was $999.00 for a conference pass.",
stop_reason: "end_of_turn",
tool_calls: [],
},
},
],
output_message: {
role: "assistant",
content:
"The ticket cost for the Red Hat Summit was $999.00 for a conference pass.",
stop_reason: "end_of_turn",
tool_calls: [],
},
output_attachments: [],
started_at: "2025-08-27T18:14:21.705289Z",
completed_at: "2025-08-27T18:14:39.706752Z",
},
},
},
};
const result = processChunk(chunk);
expect(result.isToolCall).toBe(false);
expect(result.text).toBe(
"The ticket cost for the Red Hat Summit was $999.00 for a conference pass."
);
});
});
describe("Function Call Detection", () => {
test("detects function calls in direct string chunks", () => {
const chunk =
'{"type": "function", "name": "knowledge_search", "parameters": {"query": "test"}}';
const result = processChunk(chunk);
expect(result.isToolCall).toBe(true);
expect(result.text).toBe(null);
});
test("detects function calls in event payload content", () => {
const chunk = {
event: {
payload: {
content:
'{"type": "function", "name": "knowledge_search", "parameters": {"query": "test"}}',
},
},
};
const result = processChunk(chunk);
expect(result.isToolCall).toBe(true);
expect(result.text).toBe(null);
});
test("detects tool_calls in delta structure", () => {
const chunk = {
delta: {
tool_calls: [{ function: { name: "knowledge_search" } }],
},
};
const result = processChunk(chunk);
expect(result.isToolCall).toBe(true);
expect(result.text).toBe(null);
});
test("detects function call in mixed content but skips it", () => {
const chunk =
'{"type": "function", "name": "knowledge_search", "parameters": {"query": "test"}} Based on the search results, here is your answer.';
const result = processChunk(chunk);
// This is detected as a tool call and skipped entirely - the implementation prioritizes safety
expect(result.isToolCall).toBe(true);
expect(result.text).toBe(null);
});
});
describe("Text Extraction", () => {
test("extracts text from direct string chunks", () => {
const chunk = "Hello, this is a normal response.";
const result = processChunk(chunk);
expect(result.isToolCall).toBe(false);
expect(result.text).toBe("Hello, this is a normal response.");
});
test("extracts text from delta structure", () => {
const chunk = {
delta: {
text: "Hello, this is a normal response.",
},
};
const result = processChunk(chunk);
expect(result.isToolCall).toBe(false);
expect(result.text).toBe("Hello, this is a normal response.");
});
test("extracts text from choices structure", () => {
const chunk = {
choices: [
{
delta: {
content: "Hello, this is a normal response.",
},
},
],
};
const result = processChunk(chunk);
expect(result.isToolCall).toBe(false);
expect(result.text).toBe("Hello, this is a normal response.");
});
test("prioritizes output_message over model_response in turn structure", () => {
const chunk = {
event: {
payload: {
turn: {
steps: [
{
model_response: {
content: "Model response content.",
},
},
],
output_message: {
content: "Final output message content.",
},
},
},
},
};
const result = processChunk(chunk);
expect(result.isToolCall).toBe(false);
expect(result.text).toBe("Final output message content.");
});
test("falls back to model_response when no output_message", () => {
const chunk = {
event: {
payload: {
turn: {
steps: [
{
model_response: {
content: "This is from the model response.",
},
},
],
},
},
},
};
const result = processChunk(chunk);
expect(result.isToolCall).toBe(false);
expect(result.text).toBe("This is from the model response.");
});
});
describe("Edge Cases", () => {
test("handles empty chunks", () => {
const result = processChunk("");
expect(result.isToolCall).toBe(false);
expect(result.text).toBe("");
});
test("handles null chunks", () => {
const result = processChunk(null);
expect(result.isToolCall).toBe(false);
expect(result.text).toBe(null);
});
test("handles undefined chunks", () => {
const result = processChunk(undefined);
expect(result.isToolCall).toBe(false);
expect(result.text).toBe(null);
});
test("handles chunks with no text content", () => {
const chunk = {
event: {
metadata: {
timestamp: "2024-01-01",
},
},
};
const result = processChunk(chunk);
expect(result.isToolCall).toBe(false);
expect(result.text).toBe(null);
});
test("handles malformed JSON in function calls gracefully", () => {
const chunk =
'{"type": "function", "name": "knowledge_search"} incomplete json';
const result = processChunk(chunk);
expect(result.isToolCall).toBe(true);
expect(result.text).toBe(null);
});
});
});

View file

@ -31,6 +31,9 @@ const mockClient = {
toolgroups: {
list: jest.fn(),
},
vectorDBs: {
list: jest.fn(),
},
};
jest.mock("@/hooks/use-auth-client", () => ({
@ -164,7 +167,7 @@ describe("ChatPlaygroundPage", () => {
session_name: "Test Session",
started_at: new Date().toISOString(),
turns: [],
}); // No turns by default
});
mockClient.agents.retrieve.mockResolvedValue({
agent_id: "test-agent",
agent_config: {
@ -417,7 +420,6 @@ describe("ChatPlaygroundPage", () => {
});
await waitFor(() => {
// first agent should be auto-selected
expect(mockClient.agents.session.create).toHaveBeenCalledWith(
"agent_123",
{ session_name: "Default Session" }
@ -464,7 +466,7 @@ describe("ChatPlaygroundPage", () => {
});
});
test("hides delete button when only one agent exists", async () => {
test("shows delete button even when only one agent exists", async () => {
mockClient.agents.list.mockResolvedValue({
data: [mockAgents[0]],
});
@ -474,9 +476,7 @@ describe("ChatPlaygroundPage", () => {
});
await waitFor(() => {
expect(
screen.queryByTitle("Delete current agent")
).not.toBeInTheDocument();
expect(screen.getByTitle("Delete current agent")).toBeInTheDocument();
});
});
@ -505,7 +505,7 @@ describe("ChatPlaygroundPage", () => {
await waitFor(() => {
expect(mockClient.agents.delete).toHaveBeenCalledWith("agent_123");
expect(global.confirm).toHaveBeenCalledWith(
"Are you sure you want to delete this agent? This action cannot be undone and will delete all associated sessions."
"Are you sure you want to delete this agent? This action cannot be undone and will delete the agent and all its sessions."
);
});
@ -584,4 +584,207 @@ describe("ChatPlaygroundPage", () => {
consoleSpy.mockRestore();
});
});
describe("RAG File Upload", () => {
let mockFileReader: {
readAsDataURL: jest.Mock;
readAsText: jest.Mock;
result: string | null;
onload: (() => void) | null;
onerror: (() => void) | null;
};
let mockRAGTool: {
insert: jest.Mock;
};
beforeEach(() => {
mockFileReader = {
readAsDataURL: jest.fn(),
readAsText: jest.fn(),
result: null,
onload: null,
onerror: null,
};
global.FileReader = jest.fn(() => mockFileReader);
mockRAGTool = {
insert: jest.fn().mockResolvedValue({}),
};
mockClient.toolRuntime = {
ragTool: mockRAGTool,
};
});
afterEach(() => {
jest.clearAllMocks();
});
test("handles text file upload", async () => {
new File(["Hello, world!"], "test.txt", {
type: "text/plain",
});
mockClient.agents.retrieve.mockResolvedValue({
agent_id: "test-agent",
agent_config: {
toolgroups: [
{
name: "builtin::rag/knowledge_search",
args: { vector_db_ids: ["test-vector-db"] },
},
],
},
});
await act(async () => {
render(<ChatPlaygroundPage />);
});
await waitFor(() => {
expect(screen.getByTestId("chat-component")).toBeInTheDocument();
});
const chatComponent = screen.getByTestId("chat-component");
chatComponent.getAttribute("data-onragfileupload");
// this is a simplified test
expect(mockRAGTool.insert).not.toHaveBeenCalled();
});
test("handles PDF file upload with FileReader", async () => {
new File([new ArrayBuffer(1000)], "test.pdf", {
type: "application/pdf",
});
const mockDataURL = "data:application/pdf;base64,JVBERi0xLjQK";
mockFileReader.result = mockDataURL;
mockClient.agents.retrieve.mockResolvedValue({
agent_id: "test-agent",
agent_config: {
toolgroups: [
{
name: "builtin::rag/knowledge_search",
args: { vector_db_ids: ["test-vector-db"] },
},
],
},
});
await act(async () => {
render(<ChatPlaygroundPage />);
});
await waitFor(() => {
expect(screen.getByTestId("chat-component")).toBeInTheDocument();
});
expect(global.FileReader).toBeDefined();
});
test("handles different file types correctly", () => {
const getContentType = (filename: string): string => {
const ext = filename.toLowerCase().split(".").pop();
switch (ext) {
case "pdf":
return "application/pdf";
case "txt":
return "text/plain";
case "md":
return "text/markdown";
case "html":
return "text/html";
case "csv":
return "text/csv";
case "json":
return "application/json";
case "docx":
return "application/vnd.openxmlformats-officedocument.wordprocessingml.document";
case "doc":
return "application/msword";
default:
return "application/octet-stream";
}
};
expect(getContentType("test.pdf")).toBe("application/pdf");
expect(getContentType("test.txt")).toBe("text/plain");
expect(getContentType("test.md")).toBe("text/markdown");
expect(getContentType("test.html")).toBe("text/html");
expect(getContentType("test.csv")).toBe("text/csv");
expect(getContentType("test.json")).toBe("application/json");
expect(getContentType("test.docx")).toBe(
"application/vnd.openxmlformats-officedocument.wordprocessingml.document"
);
expect(getContentType("test.doc")).toBe("application/msword");
expect(getContentType("test.unknown")).toBe("application/octet-stream");
});
test("determines text vs binary file types correctly", () => {
const isTextFile = (mimeType: string): boolean => {
return (
mimeType.startsWith("text/") ||
mimeType === "application/json" ||
mimeType === "text/markdown" ||
mimeType === "text/html" ||
mimeType === "text/csv"
);
};
expect(isTextFile("text/plain")).toBe(true);
expect(isTextFile("text/markdown")).toBe(true);
expect(isTextFile("text/html")).toBe(true);
expect(isTextFile("text/csv")).toBe(true);
expect(isTextFile("application/json")).toBe(true);
expect(isTextFile("application/pdf")).toBe(false);
expect(isTextFile("application/msword")).toBe(false);
expect(
isTextFile(
"application/vnd.openxmlformats-officedocument.wordprocessingml.document"
)
).toBe(false);
expect(isTextFile("application/octet-stream")).toBe(false);
});
test("handles FileReader error gracefully", async () => {
const pdfFile = new File([new ArrayBuffer(1000)], "test.pdf", {
type: "application/pdf",
});
mockFileReader.onerror = jest.fn();
const mockError = new Error("FileReader failed");
const fileReaderPromise = new Promise<string>((resolve, reject) => {
const reader = new FileReader();
reader.onload = () => resolve(reader.result as string);
reader.onerror = () => reject(reader.error || mockError);
reader.readAsDataURL(pdfFile);
setTimeout(() => {
reader.onerror?.(new ProgressEvent("error"));
}, 0);
});
await expect(fileReaderPromise).rejects.toBeDefined();
});
test("handles large file upload with FileReader approach", () => {
// create a large file
const largeFile = new File(
[new ArrayBuffer(10 * 1024 * 1024)],
"large.pdf",
{
type: "application/pdf",
}
);
expect(largeFile.size).toBe(10 * 1024 * 1024); // 10MB
expect(global.FileReader).toBeDefined();
const reader = new FileReader();
expect(reader.readAsDataURL).toBeDefined();
});
});
});
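The two helpers defined inline in the "RAG File Upload" tests above (`getContentType` and `isTextFile`) can also be written as a lookup table plus a one-line predicate. The sketch below is only an illustration: the standalone functions and the `CONTENT_TYPES` map are assumptions, not code from the page under test. In the test's `isTextFile`, the explicit comparisons against `text/markdown`, `text/html`, and `text/csv` are already covered by the `startsWith("text/")` check, so the sketch drops them.

```typescript
// Hypothetical standalone versions of the inline test helpers; the
// CONTENT_TYPES name and the Map layout are illustrative assumptions.
const CONTENT_TYPES = new Map<string, string>([
  ["pdf", "application/pdf"],
  ["txt", "text/plain"],
  ["md", "text/markdown"],
  ["html", "text/html"],
  ["csv", "text/csv"],
  ["json", "application/json"],
  [
    "docx",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
  ],
  ["doc", "application/msword"],
]);

// Map a filename to a MIME type by its last extension (case-insensitive).
function getContentType(filename: string): string {
  const ext = filename.toLowerCase().split(".").pop() ?? "";
  return CONTENT_TYPES.get(ext) ?? "application/octet-stream";
}

// Treat every text/* type and JSON as text; everything else as binary.
function isTextFile(mimeType: string): boolean {
  return mimeType.startsWith("text/") || mimeType === "application/json";
}
```

Keeping the mapping in one table means a new extension is added in a single place, and the fallback to `application/octet-stream` matches the `default` branch of the switch in the test.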

File diff suppressed because it is too large

View file

@ -35,6 +35,7 @@ interface ChatPropsBase {
) => void;
setMessages?: (messages: Message[]) => void;
transcribeAudio?: (blob: Blob) => Promise<string>;
onRAGFileUpload?: (file: File) => Promise<void>;
}
interface ChatPropsWithoutSuggestions extends ChatPropsBase {
@ -62,6 +63,7 @@ export function Chat({
onRateResponse,
setMessages,
transcribeAudio,
onRAGFileUpload,
}: ChatProps) {
const lastMessage = messages.at(-1);
const isEmpty = messages.length === 0;
@ -226,16 +228,17 @@ export function Chat({
isPending={isGenerating || isTyping}
handleSubmit={handleSubmit}
>
{({ files, setFiles }) => (
{() => (
<MessageInput
value={input}
onChange={handleInputChange}
allowAttachments
files={files}
setFiles={setFiles}
allowAttachments={true}
files={null}
setFiles={() => {}}
stop={handleStop}
isGenerating={isGenerating}
transcribeAudio={transcribeAudio}
onRAGFileUpload={onRAGFileUpload}
/>
)}
</ChatForm>

View file

@ -14,6 +14,7 @@ import { Card } from "@/components/ui/card";
import { Trash2 } from "lucide-react";
import type { Message } from "@/components/chat-playground/chat-message";
import { useAuthClient } from "@/hooks/use-auth-client";
import { cleanMessageContent } from "@/lib/message-content-utils";
import type {
Session,
SessionCreateParams,
@ -219,10 +220,7 @@ export function Conversations({
messages.push({
id: `${turn.turn_id}-assistant-${messages.length}`,
role: "assistant",
content:
typeof turn.output_message.content === "string"
? turn.output_message.content
: JSON.stringify(turn.output_message.content),
content: cleanMessageContent(turn.output_message.content),
createdAt: new Date(
turn.completed_at || turn.started_at || Date.now()
),
@ -271,7 +269,7 @@ export function Conversations({
);
const deleteSession = async (sessionId: string) => {
if (sessions.length <= 1 || !selectedAgentId) {
if (!selectedAgentId) {
return;
}
@ -324,7 +322,6 @@ export function Conversations({
}
}, [currentSession]);
// Don't render if no agent is selected
if (!selectedAgentId) {
return null;
}
@ -357,7 +354,7 @@ export function Conversations({
+ New
</Button>
{currentSession && sessions.length > 1 && (
{currentSession && (
<Button
onClick={() => deleteSession(currentSession.id)}
variant="outline"

View file

@ -21,6 +21,7 @@ interface MessageInputBaseProps
isGenerating: boolean;
enableInterrupt?: boolean;
transcribeAudio?: (blob: Blob) => Promise<string>;
onRAGFileUpload?: (file: File) => Promise<void>;
}
interface MessageInputWithoutAttachmentProps extends MessageInputBaseProps {
@ -213,8 +214,13 @@ export function MessageInput({
className
)}
{...(props.allowAttachments
? omit(props, ["allowAttachments", "files", "setFiles"])
: omit(props, ["allowAttachments"]))}
? omit(props, [
"allowAttachments",
"files",
"setFiles",
"onRAGFileUpload",
])
: omit(props, ["allowAttachments", "onRAGFileUpload"]))}
/>
{props.allowAttachments && (
@ -254,11 +260,19 @@ export function MessageInput({
size="icon"
variant="outline"
className="h-8 w-8"
aria-label="Attach a file"
disabled={true}
aria-label="Upload file to RAG"
disabled={false}
onClick={async () => {
const files = await showFileUploadDialog();
addFiles(files);
const input = document.createElement("input");
input.type = "file";
input.accept = ".pdf,.txt,.md,.html,.csv,.json";
input.onchange = async e => {
const file = (e.target as HTMLInputElement).files?.[0];
if (file && props.onRAGFileUpload) {
await props.onRAGFileUpload(file);
}
};
input.click();
}}
>
<Paperclip className="h-4 w-4" />
@ -337,28 +351,6 @@ function FileUploadOverlay({ isDragging }: FileUploadOverlayProps) {
);
}
function showFileUploadDialog() {
const input = document.createElement("input");
input.type = "file";
input.multiple = true;
input.accept = "*/*";
input.click();
return new Promise<File[] | null>(resolve => {
input.onchange = e => {
const files = (e.currentTarget as HTMLInputElement).files;
if (files) {
resolve(Array.from(files));
return;
}
resolve(null);
};
});
}
function TranscribingOverlay() {
return (
<motion.div

View file

@ -0,0 +1,243 @@
"use client";
import { useState, useEffect } from "react";
import { Button } from "@/components/ui/button";
import { Input } from "@/components/ui/input";
import { Card } from "@/components/ui/card";
import {
Select,
SelectContent,
SelectItem,
SelectTrigger,
SelectValue,
} from "@/components/ui/select";
import { useAuthClient } from "@/hooks/use-auth-client";
import type { Model } from "llama-stack-client/resources/models";
interface VectorDBCreatorProps {
models: Model[];
onVectorDBCreated?: (vectorDbId: string) => void;
onCancel?: () => void;
}
interface VectorDBProvider {
api: string;
provider_id: string;
provider_type: string;
}
export function VectorDBCreator({
models,
onVectorDBCreated,
onCancel,
}: VectorDBCreatorProps) {
const [vectorDbName, setVectorDbName] = useState("");
const [selectedEmbeddingModel, setSelectedEmbeddingModel] = useState("");
const [selectedProvider, setSelectedProvider] = useState("faiss");
const [availableProviders, setAvailableProviders] = useState<
VectorDBProvider[]
>([]);
const [isCreating, setIsCreating] = useState(false);
const [isLoadingProviders, setIsLoadingProviders] = useState(false);
const [error, setError] = useState<string | null>(null);
const client = useAuthClient();
const embeddingModels = models.filter(
model => model.model_type === "embedding"
);
useEffect(() => {
const fetchProviders = async () => {
setIsLoadingProviders(true);
try {
const providersResponse = await client.providers.list();
const vectorIoProviders = providersResponse.filter(
(provider: VectorDBProvider) => provider.api === "vector_io"
);
setAvailableProviders(vectorIoProviders);
if (vectorIoProviders.length > 0) {
const faissProvider = vectorIoProviders.find(
(p: VectorDBProvider) => p.provider_id === "faiss"
);
setSelectedProvider(
faissProvider?.provider_id || vectorIoProviders[0].provider_id
);
}
} catch (err) {
console.error("Error fetching providers:", err);
setAvailableProviders([
{
api: "vector_io",
provider_id: "faiss",
provider_type: "inline::faiss",
},
]);
} finally {
setIsLoadingProviders(false);
}
};
fetchProviders();
}, [client]);
const handleCreate = async () => {
if (!vectorDbName.trim() || !selectedEmbeddingModel) {
setError("Please provide a name and select an embedding model");
return;
}
setIsCreating(true);
setError(null);
try {
const embeddingModel = embeddingModels.find(
m => m.identifier === selectedEmbeddingModel
);
if (!embeddingModel) {
throw new Error("Selected embedding model not found");
}
const embeddingDimension = embeddingModel.metadata
?.embedding_dimension as number;
if (!embeddingDimension) {
throw new Error("Embedding dimension not available for selected model");
}
const vectorDbId = vectorDbName.trim() || `vector_db_${Date.now()}`;
const response = await client.vectorDBs.register({
vector_db_id: vectorDbId,
embedding_model: selectedEmbeddingModel,
embedding_dimension: embeddingDimension,
provider_id: selectedProvider,
});
onVectorDBCreated?.(response.identifier || vectorDbId);
} catch (err) {
console.error("Error creating vector DB:", err);
setError(
err instanceof Error ? err.message : "Failed to create vector DB"
);
} finally {
setIsCreating(false);
}
};
return (
<Card className="p-6 space-y-4">
<h3 className="text-lg font-semibold">Create Vector Database</h3>
<div className="space-y-4">
<div>
<label className="text-sm font-medium block mb-2">
Vector DB Name
</label>
<Input
value={vectorDbName}
onChange={e => setVectorDbName(e.target.value)}
placeholder="My Vector Database"
/>
</div>
<div>
<label className="text-sm font-medium block mb-2">
Embedding Model
</label>
<Select
value={selectedEmbeddingModel}
onValueChange={setSelectedEmbeddingModel}
>
<SelectTrigger>
<SelectValue placeholder="Select Embedding Model" />
</SelectTrigger>
<SelectContent>
{embeddingModels.map(model => (
<SelectItem key={model.identifier} value={model.identifier}>
{model.identifier}
</SelectItem>
))}
</SelectContent>
</Select>
{selectedEmbeddingModel && (
<p className="text-xs text-muted-foreground mt-1">
Dimension:{" "}
{embeddingModels.find(
m => m.identifier === selectedEmbeddingModel
)?.metadata?.embedding_dimension || "Unknown"}
</p>
)}
</div>
<div>
<label className="text-sm font-medium block mb-2">
Vector Database Provider
</label>
<Select
value={selectedProvider}
onValueChange={setSelectedProvider}
disabled={isLoadingProviders}
>
<SelectTrigger>
<SelectValue
placeholder={
isLoadingProviders
? "Loading providers..."
: "Select Provider"
}
/>
</SelectTrigger>
<SelectContent>
{availableProviders.map(provider => (
<SelectItem
key={provider.provider_id}
value={provider.provider_id}
>
{provider.provider_id}
</SelectItem>
))}
</SelectContent>
</Select>
{selectedProvider && (
<p className="text-xs text-muted-foreground mt-1">
Selected provider: {selectedProvider}
</p>
)}
</div>
{error && (
<div className="text-destructive text-sm bg-destructive/10 p-2 rounded">
{error}
</div>
)}
<div className="flex gap-2 pt-2">
<Button
onClick={handleCreate}
disabled={
isCreating || !vectorDbName.trim() || !selectedEmbeddingModel
}
className="flex-1"
>
{isCreating ? "Creating..." : "Create Vector DB"}
</Button>
{onCancel && (
<Button variant="outline" onClick={onCancel} className="flex-1">
Cancel
</Button>
)}
</div>
</div>
<div className="text-xs text-muted-foreground bg-muted/50 p-3 rounded">
<strong>Note:</strong> This will create a new vector database that can
be used with RAG tools. After creation, you&apos;ll be able to upload
documents and use it for knowledge search in your agent conversations.
</div>
</Card>
);
}
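The provider selection inside the `useEffect` above (keep only `vector_io` providers, prefer `faiss`, otherwise take the first one) can be pulled into a pure function that is easy to test without rendering the component. This is a minimal sketch under that assumption; `pickDefaultProvider` and its `preferred` parameter are hypothetical names and are not part of the commit.

```typescript
// Same shape as the VectorDBProvider interface in the component above.
interface VectorDBProvider {
  api: string;
  provider_id: string;
  provider_type: string;
}

// Hypothetical helper: returns the provider_id to preselect, or null when
// no vector_io provider is available.
function pickDefaultProvider(
  providers: VectorDBProvider[],
  preferred: string = "faiss"
): string | null {
  const vectorIo = providers.filter(p => p.api === "vector_io");
  // Prefer the named provider; otherwise fall back to the first vector_io one.
  const chosen =
    vectorIo.find(p => p.provider_id === preferred) ?? vectorIo[0];
  return chosen ? chosen.provider_id : null;
}
```

Unlike the component, which leaves the initial `"faiss"` state in place when the list is empty, this sketch returns `null` so the caller decides what to show.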

View file

@ -0,0 +1,51 @@
// check if content contains function call JSON
export const containsToolCall = (content: string): boolean => {
return (
content.includes('"type": "function"') ||
content.includes('"name": "knowledge_search"') ||
content.includes('"parameters":') ||
!!content.match(/\{"type":\s*"function".*?\}/)
);
};
export const extractCleanText = (content: string): string | null => {
if (containsToolCall(content)) {
try {
// parse and extract non-function call parts
const jsonMatch = content.match(/\{"type":\s*"function"[^}]*\}[^}]*\}/);
if (jsonMatch) {
const jsonPart = jsonMatch[0];
const parsedJson = JSON.parse(jsonPart);
// if function call, extract text after JSON
if (parsedJson.type === "function") {
const textAfterJson = content
.substring(content.indexOf(jsonPart) + jsonPart.length)
.trim();
return textAfterJson || null;
}
}
return null;
} catch {
return null;
}
}
return content;
};
// strip function call JSON from message content (string, array, or other)
export const cleanMessageContent = (
content: string | unknown[] | unknown
): string => {
if (typeof content === "string") {
const cleaned = extractCleanText(content);
return cleaned || "";
} else if (Array.isArray(content)) {
return content
.filter((item: { type: string }) => item.type === "text")
.map((item: { text: string }) => item.text)
.join("");
} else {
return JSON.stringify(content);
}
};
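The regular expression in `extractCleanText` above matches through the second closing brace after `"type": "function"`, so a call whose arguments nest more than one level deep yields a truncated match, `JSON.parse` throws, and the function returns `null`. One alternative, shown here only as a sketch, is to find the end of the leading object by counting braces outside string literals. `splitLeadingJson` and `SplitResult` are hypothetical names, and this is a swapped-in technique rather than the commit's approach. Its behavior also differs on inputs such as `{"type": "function"} incomplete json`, where it parses the object and returns the trailing text.

```typescript
// Hypothetical helper, not part of the commit: splits a leading JSON object
// from the text that follows it by tracking brace depth.
interface SplitResult {
  json: unknown; // parsed leading object, or null if none was found
  rest: string; // text after the object (trimmed), or the input unchanged
}

function splitLeadingJson(content: string): SplitResult {
  const start = content.indexOf("{");
  if (start === -1) return { json: null, rest: content };

  let depth = 0;
  let inString = false;
  let escaped = false;
  for (let i = start; i < content.length; i++) {
    const ch = content[i];
    if (inString) {
      // Inside a string literal: only watch for escapes and the closing quote.
      if (escaped) escaped = false;
      else if (ch === "\\") escaped = true;
      else if (ch === '"') inString = false;
      continue;
    }
    if (ch === '"') inString = true;
    else if (ch === "{") depth++;
    else if (ch === "}" && --depth === 0) {
      // Balanced object found: parse it and return the remainder.
      try {
        const json: unknown = JSON.parse(content.slice(start, i + 1));
        return { json, rest: content.slice(i + 1).trim() };
      } catch {
        return { json: null, rest: content };
      }
    }
  }
  return { json: null, rest: content }; // braces never balanced
}
```

A caller could then check whether `json` is an object with `type === "function"` before deciding to keep `rest` as the displayable text.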

View file

@ -18,7 +18,7 @@
"class-variance-authority": "^0.7.1",
"clsx": "^2.1.1",
"framer-motion": "^11.18.2",
"llama-stack-client": "^0.2.18",
"llama-stack-client": "^0.2.19",
"lucide-react": "^0.510.0",
"next": "15.3.3",
"next-auth": "^4.24.11",
@ -36,7 +36,7 @@
"@eslint/eslintrc": "^3",
"@tailwindcss/postcss": "^4",
"@testing-library/dom": "^10.4.1",
"@testing-library/jest-dom": "^6.6.3",
"@testing-library/jest-dom": "^6.8.0",
"@testing-library/react": "^16.3.0",
"@types/jest": "^29.5.14",
"@types/node": "^20",
@ -3597,18 +3597,17 @@
}
},
"node_modules/@testing-library/jest-dom": {
"version": "6.6.3",
"resolved": "https://registry.npmjs.org/@testing-library/jest-dom/-/jest-dom-6.6.3.tgz",
"integrity": "sha512-IteBhl4XqYNkM54f4ejhLRJiZNqcSCoXUOG2CPK7qbD322KjQozM4kHQOfkG2oln9b9HTYqs+Sae8vBATubxxA==",
"version": "6.8.0",
"resolved": "https://registry.npmjs.org/@testing-library/jest-dom/-/jest-dom-6.8.0.tgz",
"integrity": "sha512-WgXcWzVM6idy5JaftTVC8Vs83NKRmGJz4Hqs4oyOuO2J4r/y79vvKZsb+CaGyCSEbUPI6OsewfPd0G1A0/TUZQ==",
"dev": true,
"license": "MIT",
"dependencies": {
"@adobe/css-tools": "^4.4.0",
"aria-query": "^5.0.0",
"chalk": "^3.0.0",
"css.escape": "^1.5.1",
"dom-accessibility-api": "^0.6.3",
"lodash": "^4.17.21",
"picocolors": "^1.1.1",
"redent": "^3.0.0"
},
"engines": {
@ -3617,20 +3616,6 @@
"yarn": ">=1"
}
},
"node_modules/@testing-library/jest-dom/node_modules/chalk": {
"version": "3.0.0",
"resolved": "https://registry.npmjs.org/chalk/-/chalk-3.0.0.tgz",
"integrity": "sha512-4D3B6Wf41KOYRFdszmDqMCGq5VV/uMAB273JILmO+3jAlh8X4qDtdtgCR3fxtbLEMzSx22QdhnDcJvu2u1fVwg==",
"dev": true,
"license": "MIT",
"dependencies": {
"ansi-styles": "^4.1.0",
"supports-color": "^7.1.0"
},
"engines": {
"node": ">=8"
}
},
"node_modules/@testing-library/jest-dom/node_modules/dom-accessibility-api": {
"version": "0.6.3",
"resolved": "https://registry.npmjs.org/dom-accessibility-api/-/dom-accessibility-api-0.6.3.tgz",
@ -10021,9 +10006,9 @@
"license": "MIT"
},
"node_modules/llama-stack-client": {
"version": "0.2.18",
"resolved": "https://registry.npmjs.org/llama-stack-client/-/llama-stack-client-0.2.18.tgz",
"integrity": "sha512-k+xQOz/TIU0cINP4Aih8q6xs7f/6qs0fLDMXTTKQr5C0F1jtCjRiwsas7bTsDfpKfYhg/7Xy/wPw/uZgi6aIVg==",
"version": "0.2.19",
"resolved": "https://registry.npmjs.org/llama-stack-client/-/llama-stack-client-0.2.19.tgz",
"integrity": "sha512-sDuAhUdEGlERZ3jlMUzPXcQTgMv/pGbDrPX0ifbE5S+gr7Q+7ohuQYrIXe+hXgIipFjq+y4b2c5laZ76tmAyEA==",
"license": "MIT",
"dependencies": {
"@types/node": "^18.11.18",
@ -10066,13 +10051,6 @@
"url": "https://github.com/sponsors/sindresorhus"
}
},
"node_modules/lodash": {
"version": "4.17.21",
"resolved": "https://registry.npmjs.org/lodash/-/lodash-4.17.21.tgz",
"integrity": "sha512-v2kDEe57lecTulaDIuNTPy3Ry4gLGJ6Z1O3vE1krgXZNrsQ+LFTGHVxVjcXPs17LhbZVGedAJv8XZ1tvj5FvSg==",
"dev": true,
"license": "MIT"
},
"node_modules/lodash.merge": {
"version": "4.6.2",
"resolved": "https://registry.npmjs.org/lodash.merge/-/lodash.merge-4.6.2.tgz",

View file

@ -23,7 +23,7 @@
"class-variance-authority": "^0.7.1",
"clsx": "^2.1.1",
"framer-motion": "^11.18.2",
"llama-stack-client": "^0.2.18",
"llama-stack-client": "^0.2.19",
"lucide-react": "^0.510.0",
"next": "15.3.3",
"next-auth": "^4.24.11",
@ -41,7 +41,7 @@
"@eslint/eslintrc": "^3",
"@tailwindcss/postcss": "^4",
"@testing-library/dom": "^10.4.1",
"@testing-library/jest-dom": "^6.6.3",
"@testing-library/jest-dom": "^6.8.0",
"@testing-library/react": "^16.3.0",
"@types/jest": "^29.5.14",
"@types/node": "^20",

View file

@ -7,7 +7,7 @@ required-version = ">=0.7.0"
[project]
name = "llama_stack"
version = "0.2.18"
version = "0.2.19"
authors = [{ name = "Meta Llama", email = "llama-oss@meta.com" }]
description = "Llama Stack"
readme = "README.md"
@ -31,7 +31,7 @@ dependencies = [
"huggingface-hub>=0.34.0,<1.0",
"jinja2>=3.1.6",
"jsonschema",
"llama-stack-client>=0.2.18",
"llama-stack-client>=0.2.19",
"llama-api-client>=0.1.2",
"openai>=1.99.6,<1.100.0",
"prompt-toolkit",
@ -56,7 +56,7 @@ dependencies = [
ui = [
"streamlit",
"pandas",
"llama-stack-client>=0.2.18",
"llama-stack-client>=0.2.19",
"streamlit-option-menu",
]

View file

@ -4,7 +4,6 @@
# This source code is licensed under the terms described in the LICENSE file in
# the root directory of this source tree.
import sqlite3
import tempfile
from pathlib import Path
from unittest.mock import patch
@ -133,7 +132,6 @@ class TestInferenceRecording:
# Test directory creation
assert storage.test_dir.exists()
assert storage.responses_dir.exists()
assert storage.db_path.exists()
# Test storing and retrieving a recording
request_hash = "test_hash_123"
@ -147,15 +145,6 @@ class TestInferenceRecording:
storage.store_recording(request_hash, request_data, response_data)
# Verify SQLite record
with sqlite3.connect(storage.db_path) as conn:
result = conn.execute("SELECT * FROM recordings WHERE request_hash = ?", (request_hash,)).fetchone()
assert result is not None
assert result[0] == request_hash # request_hash
assert result[2] == "/v1/chat/completions" # endpoint
assert result[3] == "llama3.2:3b" # model
# Verify file storage and retrieval
retrieved = storage.find_recording(request_hash)
assert retrieved is not None
@ -185,10 +174,7 @@ class TestInferenceRecording:
# Verify recording was stored
storage = ResponseStorage(temp_storage_dir)
with sqlite3.connect(storage.db_path) as conn:
recordings = conn.execute("SELECT COUNT(*) FROM recordings").fetchone()[0]
assert recordings == 1
assert storage.responses_dir.exists()
async def test_replay_mode(self, temp_storage_dir, real_openai_chat_response):
"""Test that replay mode returns stored responses without making real calls."""

View file

@ -88,3 +88,10 @@ def test_nested_structures(setup_env_vars):
}
expected = {"key1": "test_value", "key2": ["default", "conditional"], "key3": {"nested": None}}
assert replace_env_vars(data) == expected
def test_explicit_strings_preserved(setup_env_vars):
# Explicit strings that look like numbers/booleans should remain strings
data = {"port": "8080", "enabled": "true", "count": "123", "ratio": "3.14"}
expected = {"port": "8080", "enabled": "true", "count": "123", "ratio": "3.14"}
assert replace_env_vars(data) == expected

68
uv.lock generated
View file

@ -1128,6 +1128,9 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/4f/72/dcbc6dbf838549b7b0c2c18c1365d2580eb7456939e4b608c3ab213fce78/geventhttpclient-2.3.4-cp312-cp312-macosx_10_13_universal2.whl", hash = "sha256:9ac30c38d86d888b42bb2ab2738ab9881199609e9fa9a153eb0c66fc9188c6cb", size = 71984, upload-time = "2025-06-11T13:17:09.126Z" },
{ url = "https://files.pythonhosted.org/packages/4c/f9/74aa8c556364ad39b238919c954a0da01a6154ad5e85a1d1ab5f9f5ac186/geventhttpclient-2.3.4-cp312-cp312-macosx_10_13_x86_64.whl", hash = "sha256:4b802000a4fad80fa57e895009671d6e8af56777e3adf0d8aee0807e96188fd9", size = 52631, upload-time = "2025-06-11T13:17:10.061Z" },
{ url = "https://files.pythonhosted.org/packages/11/1a/bc4b70cba8b46be8b2c6ca5b8067c4f086f8c90915eb68086ab40ff6243d/geventhttpclient-2.3.4-cp312-cp312-macosx_11_0_arm64.whl", hash = "sha256:461e4d9f4caee481788ec95ac64e0a4a087c1964ddbfae9b6f2dc51715ba706c", size = 51991, upload-time = "2025-06-11T13:17:11.049Z" },
{ url = "https://files.pythonhosted.org/packages/03/3f/5ce6e003b3b24f7caf3207285831afd1a4f857ce98ac45e1fb7a6815bd58/geventhttpclient-2.3.4-cp312-cp312-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:b7e41687c74e8fbe6a665458bbaea0c5a75342a95e2583738364a73bcbf1671b", size = 114982, upload-time = "2025-08-24T12:16:50.76Z" },
{ url = "https://files.pythonhosted.org/packages/60/16/6f9dad141b7c6dd7ee831fbcd72dd02535c57bc1ec3c3282f07e72c31344/geventhttpclient-2.3.4-cp312-cp312-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:c3ea5da20f4023cf40207ce15f5f4028377ffffdba3adfb60b4c8f34925fce79", size = 115654, upload-time = "2025-08-24T12:16:52.072Z" },
{ url = "https://files.pythonhosted.org/packages/ba/52/9b516a2ff423d8bd64c319e1950a165ceebb552781c5a88c1e94e93e8713/geventhttpclient-2.3.4-cp312-cp312-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:91f19a8a6899c27867dbdace9500f337d3e891a610708e86078915f1d779bf53", size = 121672, upload-time = "2025-08-24T12:16:53.361Z" },
{ url = "https://files.pythonhosted.org/packages/b0/f5/8d0f1e998f6d933c251b51ef92d11f7eb5211e3cd579018973a2b455f7c5/geventhttpclient-2.3.4-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:41f2dcc0805551ea9d49f9392c3b9296505a89b9387417b148655d0d8251b36e", size = 119012, upload-time = "2025-06-11T13:17:11.956Z" },
{ url = "https://files.pythonhosted.org/packages/ea/0e/59e4ab506b3c19fc72e88ca344d150a9028a00c400b1099637100bec26fc/geventhttpclient-2.3.4-cp312-cp312-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:62f3a29bf242ecca6360d497304900683fd8f42cbf1de8d0546c871819251dad", size = 124565, upload-time = "2025-06-11T13:17:12.896Z" },
{ url = "https://files.pythonhosted.org/packages/39/5d/dcbd34dfcda0c016b4970bd583cb260cc5ebfc35b33d0ec9ccdb2293587a/geventhttpclient-2.3.4-cp312-cp312-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:8714a3f2c093aeda3ffdb14c03571d349cb3ed1b8b461d9f321890659f4a5dbf", size = 115573, upload-time = "2025-06-11T13:17:13.937Z" },
@ -1141,6 +1144,9 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/ff/ad/132fddde6e2dca46d6a86316962437acd2bfaeb264db4e0fae83c529eb04/geventhttpclient-2.3.4-cp313-cp313-macosx_10_13_universal2.whl", hash = "sha256:be64c5583884c407fc748dedbcb083475d5b138afb23c6bc0836cbad228402cc", size = 71967, upload-time = "2025-06-11T13:17:22.121Z" },
{ url = "https://files.pythonhosted.org/packages/f4/34/5e77d9a31d93409a8519cf573843288565272ae5a016be9c9293f56c50a1/geventhttpclient-2.3.4-cp313-cp313-macosx_10_13_x86_64.whl", hash = "sha256:15b2567137734183efda18e4d6245b18772e648b6a25adea0eba8b3a8b0d17e8", size = 52632, upload-time = "2025-06-11T13:17:23.016Z" },
{ url = "https://files.pythonhosted.org/packages/47/d2/cf0dbc333304700e68cee9347f654b56e8b0f93a341b8b0d027ee96800d6/geventhttpclient-2.3.4-cp313-cp313-macosx_11_0_arm64.whl", hash = "sha256:a4bca1151b8cd207eef6d5cb3c720c562b2aa7293cf113a68874e235cfa19c31", size = 51980, upload-time = "2025-06-11T13:17:23.933Z" },
{ url = "https://files.pythonhosted.org/packages/27/6e/049e685fc43e2e966c83f24b3187f6a6736103f0fc51118140f4ca1793d4/geventhttpclient-2.3.4-cp313-cp313-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:8a681433e2f3d4b326d8b36b3e05b787b2c6dd2a5660a4a12527622278bf02ed", size = 114998, upload-time = "2025-08-24T12:16:54.72Z" },
{ url = "https://files.pythonhosted.org/packages/24/13/1d08cf0400bf0fe0bb21e70f3f5fab2130aecef962b4362b7a1eba3cd738/geventhttpclient-2.3.4-cp313-cp313-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:736aa8e9609e4da40aeff0dbc02fea69021a034f4ed1e99bf93fc2ca83027b64", size = 115690, upload-time = "2025-08-24T12:16:56.328Z" },
{ url = "https://files.pythonhosted.org/packages/fd/bc/15d22882983cac573859d274783c5b0a95881e553fc312e7b646be432668/geventhttpclient-2.3.4-cp313-cp313-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:9d477ae1f5d42e1ee6abbe520a2e9c7f369781c3b8ca111d1f5283c1453bc825", size = 121681, upload-time = "2025-08-24T12:16:58.344Z" },
{ url = "https://files.pythonhosted.org/packages/ec/5b/c0c30ccd9d06c603add3f2d6abd68bd98430ee9730dc5478815759cf07f7/geventhttpclient-2.3.4-cp313-cp313-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:9b50d9daded5d36193d67e2fc30e59752262fcbbdc86e8222c7df6b93af0346a", size = 118987, upload-time = "2025-06-11T13:17:24.97Z" },
{ url = "https://files.pythonhosted.org/packages/4f/56/095a46af86476372064128162eccbd2ba4a7721503759890d32ea701d5fd/geventhttpclient-2.3.4-cp313-cp313-manylinux_2_17_ppc64le.manylinux2014_ppc64le.whl", hash = "sha256:fe705e7656bc6982a463a4ed7f9b1db8c78c08323f1d45d0d1d77063efa0ce96", size = 124519, upload-time = "2025-06-11T13:17:25.933Z" },
{ url = "https://files.pythonhosted.org/packages/ae/12/7c9ba94b58f7954a83d33183152ce6bf5bda10c08ebe47d79a314cd33e29/geventhttpclient-2.3.4-cp313-cp313-manylinux_2_5_i686.manylinux1_i686.manylinux_2_17_i686.manylinux2014_i686.whl", hash = "sha256:69668589359db4cbb9efa327dda5735d1e74145e6f0a9ffa50236d15cf904053", size = 115574, upload-time = "2025-06-11T13:17:27.331Z" },
@ -1151,6 +1157,24 @@ wheels = [
{ url = "https://files.pythonhosted.org/packages/ca/36/9065bb51f261950c42eddf8718e01a9ff344d8082e31317a8b6677be9bd6/geventhttpclient-2.3.4-cp313-cp313-musllinux_1_2_x86_64.whl", hash = "sha256:8d1d0db89c1c8f3282eac9a22fda2b4082e1ed62a2107f70e3f1de1872c7919f", size = 112245, upload-time = "2025-06-11T13:17:32.331Z" },
{ url = "https://files.pythonhosted.org/packages/21/7e/08a615bec095c288f997951e42e48b262d43c6081bef33cfbfad96ab9658/geventhttpclient-2.3.4-cp313-cp313-win32.whl", hash = "sha256:4e492b9ab880f98f8a9cc143b96ea72e860946eae8ad5fb2837cede2a8f45154", size = 48360, upload-time = "2025-06-11T13:17:33.349Z" },
{ url = "https://files.pythonhosted.org/packages/ec/19/ef3cb21e7e95b14cfcd21e3ba7fe3d696e171682dfa43ab8c0a727cac601/geventhttpclient-2.3.4-cp313-cp313-win_amd64.whl", hash = "sha256:72575c5b502bf26ececccb905e4e028bb922f542946be701923e726acf305eb6", size = 48956, upload-time = "2025-06-11T13:17:34.956Z" },
{ url = "https://files.pythonhosted.org/packages/06/45/c41697c7d0cae17075ba535fb901985c2873461a9012e536de679525e28d/geventhttpclient-2.3.4-cp314-cp314-macosx_10_13_universal2.whl", hash = "sha256:503db5dd0aa94d899c853b37e1853390c48c7035132f39a0bab44cbf95d29101", size = 71999, upload-time = "2025-08-24T12:17:00.419Z" },
{ url = "https://files.pythonhosted.org/packages/5d/f7/1d953cafecf8f1681691977d9da9b647d2e02996c2431fb9b718cfdd3013/geventhttpclient-2.3.4-cp314-cp314-macosx_10_13_x86_64.whl", hash = "sha256:389d3f83316220cfa2010f41401c140215a58ddba548222e7122b2161e25e391", size = 52656, upload-time = "2025-08-24T12:17:01.337Z" },
{ url = "https://files.pythonhosted.org/packages/5c/ca/4bd19040905e911dd8771a4ab74630eadc9ee9072b01ab504332dada2619/geventhttpclient-2.3.4-cp314-cp314-macosx_11_0_arm64.whl", hash = "sha256:20c65d404fa42c95f6682831465467dff317004e53602c01f01fbd5ba1e56628", size = 51978, upload-time = "2025-08-24T12:17:02.282Z" },
{ url = "https://files.pythonhosted.org/packages/11/01/c457257ee41236347dac027e63289fa3f92f164779458bd244b376122bf6/geventhttpclient-2.3.4-cp314-cp314-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:2574ee47ff6f379e9ef124e2355b23060b81629f1866013aa975ba35df0ed60b", size = 115033, upload-time = "2025-08-24T12:17:03.272Z" },
{ url = "https://files.pythonhosted.org/packages/cc/c1/ef3ddc24b402eb3caa19dacbcd08d7129302a53d9b9109c84af1ea74e31a/geventhttpclient-2.3.4-cp314-cp314-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:fecf1b735591fb21ea124a374c207104a491ad0d772709845a10d5faa07fa833", size = 115762, upload-time = "2025-08-24T12:17:04.288Z" },
{ url = "https://files.pythonhosted.org/packages/a9/97/8dca246262e9a1ebd639120151db00e34b7d10f60bdbca8481878b91801a/geventhttpclient-2.3.4-cp314-cp314-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:44e9ba810c28f9635e5c4c9cf98fc6470bad5a3620d8045d08693f7489493a3c", size = 121757, upload-time = "2025-08-24T12:17:05.273Z" },
{ url = "https://files.pythonhosted.org/packages/10/7b/41bff3cbdeff3d06d45df3c61fa39cd25e60fa9d21c709ec6aeb58e9b58f/geventhttpclient-2.3.4-cp314-cp314-musllinux_1_2_aarch64.whl", hash = "sha256:501d5c69adecd5eaee3c22302006f6c16aa114139640873b72732aa17dab9ee7", size = 111747, upload-time = "2025-08-24T12:17:06.585Z" },
{ url = "https://files.pythonhosted.org/packages/64/e6/3732132fda94082ec8793e3ae0d4d7fff6c1cb8e358e9664d1589499f4b1/geventhttpclient-2.3.4-cp314-cp314-musllinux_1_2_ppc64le.whl", hash = "sha256:709f557138fb84ed32703d42da68f786459dab77ff2c23524538f2e26878d154", size = 118487, upload-time = "2025-08-24T12:17:07.816Z" },
{ url = "https://files.pythonhosted.org/packages/93/29/d48d119dee6c42e066330860186df56a80d4e76d2821a6c706ead49006d7/geventhttpclient-2.3.4-cp314-cp314-musllinux_1_2_x86_64.whl", hash = "sha256:b8b86815a30e026c6677b89a5a21ba5fd7b69accf8f0e9b83bac123e4e9f3b31", size = 112198, upload-time = "2025-08-24T12:17:08.867Z" },
{ url = "https://files.pythonhosted.org/packages/56/48/556adff8de1bd3469b58394f441733bb3c76cb22c2600cf2ee753e73d47f/geventhttpclient-2.3.4-cp314-cp314t-macosx_10_13_universal2.whl", hash = "sha256:4371b1b1afc072ad2b0ff5a8929d73ffd86d582908d3e9e8d7911dc027b1b3a6", size = 72354, upload-time = "2025-08-24T12:17:10.671Z" },
{ url = "https://files.pythonhosted.org/packages/7c/77/f1b32a91350382978cde0ddfee4089b94e006eb0f3e7297196d9d5451217/geventhttpclient-2.3.4-cp314-cp314t-macosx_10_13_x86_64.whl", hash = "sha256:6409fcda1f40d66eab48afc218b4c41e45a95c173738d10c50bc69c7de4261b9", size = 52835, upload-time = "2025-08-24T12:17:12.164Z" },
{ url = "https://files.pythonhosted.org/packages/d3/06/124f95556e0d5b4c417ec01fc30d91a3e4fe4524a44d2f629a1b1a721984/geventhttpclient-2.3.4-cp314-cp314t-macosx_11_0_arm64.whl", hash = "sha256:142870c2efb6bd0a593dcd75b83defb58aeb72ceaec4c23186785790bd44a311", size = 52165, upload-time = "2025-08-24T12:17:13.465Z" },
{ url = "https://files.pythonhosted.org/packages/76/9c/0850256e4461b0a90f2cf5c8156ea8f97e93a826aa76d7be70c9c6d4ba0f/geventhttpclient-2.3.4-cp314-cp314t-manylinux1_x86_64.manylinux_2_28_x86_64.manylinux_2_5_x86_64.whl", hash = "sha256:3a74f7b926badb3b1d47ea987779cb83523a406e89203070b58b20cf95d6f535", size = 117929, upload-time = "2025-08-24T12:17:14.477Z" },
{ url = "https://files.pythonhosted.org/packages/ca/55/3b54d0c0859efac95ba2649aeb9079a3523cdd7e691549ead2862907dc7d/geventhttpclient-2.3.4-cp314-cp314t-manylinux2014_aarch64.manylinux_2_17_aarch64.manylinux_2_28_aarch64.whl", hash = "sha256:2a8cde016e5ea6eb289c039b6af8dcef6c3ee77f5d753e57b48fe2555cdeacca", size = 119584, upload-time = "2025-08-24T12:17:15.709Z" },
{ url = "https://files.pythonhosted.org/packages/84/df/84ce132a0eb2b6d4f86e68a828e3118419cb0411cae101e4bad256c3f321/geventhttpclient-2.3.4-cp314-cp314t-manylinux2014_ppc64le.manylinux_2_17_ppc64le.manylinux_2_28_ppc64le.whl", hash = "sha256:5aa16f2939a508667093b18e47919376f7db9a9acbe858343173c5a58e347869", size = 125388, upload-time = "2025-08-24T12:17:16.915Z" },
{ url = "https://files.pythonhosted.org/packages/e8/4f/8156b9f6e25e4f18a60149bd2925f56f1ed7a1f8d520acb5a803536adadd/geventhttpclient-2.3.4-cp314-cp314t-musllinux_1_2_aarch64.whl", hash = "sha256:ffe87eb7f1956357c2144a56814b5ffc927cbb8932f143a0351c78b93129ebbc", size = 115214, upload-time = "2025-08-24T12:17:17.945Z" },
{ url = "https://files.pythonhosted.org/packages/f6/5a/b01657605c16ac4555b70339628a33fc7ca41ace58da167637ef72ad0a8e/geventhttpclient-2.3.4-cp314-cp314t-musllinux_1_2_ppc64le.whl", hash = "sha256:5ee758e37215da9519cea53105b2a078d8bc0a32603eef2a1f9ab551e3767dee", size = 121862, upload-time = "2025-08-24T12:17:18.97Z" },
{ url = "https://files.pythonhosted.org/packages/84/ca/c4e36a9b1bcce9958d8886aa4f7b262c8e9a7c43a284f2d79abfc9ba715d/geventhttpclient-2.3.4-cp314-cp314t-musllinux_1_2_x86_64.whl", hash = "sha256:416cc70adb3d34759e782d2e120b4432752399b85ac9758932ecd12274a104c3", size = 114999, upload-time = "2025-08-24T12:17:19.978Z" },
]
[[package]]
@ -1743,7 +1767,7 @@ wheels = [
[[package]]
name = "llama-stack"
version = "0.2.18"
version = "0.2.19"
source = { editable = "." }
dependencies = [
{ name = "aiohttp" },
@ -1881,8 +1905,8 @@ requires-dist = [
{ name = "jinja2", specifier = ">=3.1.6" },
{ name = "jsonschema" },
{ name = "llama-api-client", specifier = ">=0.1.2" },
{ name = "llama-stack-client", specifier = ">=0.2.18" },
{ name = "llama-stack-client", marker = "extra == 'ui'", specifier = ">=0.2.18" },
{ name = "llama-stack-client", specifier = ">=0.2.19" },
{ name = "llama-stack-client", marker = "extra == 'ui'", specifier = ">=0.2.19" },
{ name = "openai", specifier = ">=1.99.6,<1.100.0" },
{ name = "opentelemetry-exporter-otlp-proto-http", specifier = ">=1.30.0" },
{ name = "opentelemetry-sdk", specifier = ">=1.30.0" },
@ -1989,7 +2013,7 @@ unit = [
[[package]]
name = "llama-stack-client"
version = "0.2.18"
version = "0.2.19"
source = { registry = "https://pypi.org/simple" }
dependencies = [
{ name = "anyio" },
@ -2008,9 +2032,9 @@ dependencies = [
{ name = "tqdm" },
{ name = "typing-extensions" },
]
sdist = { url = "https://files.pythonhosted.org/packages/69/da/5e5a745495f8a2b8ef24fc4d01fe9031aa2277c36447cb22192ec8c8cc1e/llama_stack_client-0.2.18.tar.gz", hash = "sha256:860c885c9e549445178ac55cc9422e6e2a91215ac7aff5aaccfb42f3ce07e79e", size = 277284, upload-time = "2025-08-19T22:12:09.106Z" }
sdist = { url = "https://files.pythonhosted.org/packages/14/e4/72683c10188ae93e97551ab6eeac725e46f13ec215618532505a7d91bf2b/llama_stack_client-0.2.19.tar.gz", hash = "sha256:6c857e528b83af7821120002ebe4d3db072fd9f7bf867a152a34c70fe606833f", size = 318325, upload-time = "2025-08-26T21:54:20.592Z" }
wheels = [
{ url = "https://files.pythonhosted.org/packages/0a/e4/e97f8fdd8a07aa1efc7f7e37b5657d84357b664bf70dd1885a437edc0699/llama_stack_client-0.2.18-py3-none-any.whl", hash = "sha256:90f827d5476f7fc15fd993f1863af6a6e72bd064646bf6a99435eb43a1327f70", size = 367586, upload-time = "2025-08-19T22:12:07.899Z" },
{ url = "https://files.pythonhosted.org/packages/51/51/c8dde9fae58193a539eac700502876d8edde8be354c2784ff7b707a47432/llama_stack_client-0.2.19-py3-none-any.whl", hash = "sha256:478565a54541ca03ca9f8fe2019f4136f93ab6afe9591bdd44bc6dde6ddddbd9", size = 369905, upload-time = "2025-08-26T21:54:18.929Z" },
]
[[package]]
@ -4713,9 +4737,9 @@ dependencies = [
{ name = "typing-extensions", marker = "sys_platform == 'darwin'" },
]
wheels = [
{ url = "https://download.pytorch.org/whl/cpu/torch-2.8.0-cp312-none-macosx_11_0_arm64.whl", hash = "sha256:a47b7986bee3f61ad217d8a8ce24605809ab425baf349f97de758815edd2ef54" },
{ url = "https://download.pytorch.org/whl/cpu/torch-2.8.0-cp313-cp313t-macosx_14_0_arm64.whl", hash = "sha256:fbe2e149c5174ef90d29a5f84a554dfaf28e003cb4f61fa2c8c024c17ec7ca58" },
{ url = "https://download.pytorch.org/whl/cpu/torch-2.8.0-cp313-none-macosx_11_0_arm64.whl", hash = "sha256:057efd30a6778d2ee5e2374cd63a63f63311aa6f33321e627c655df60abdd390" },
{ url = "https://download.pytorch.org/whl/cpu/torch-2.8.0-cp312-none-macosx_11_0_arm64.whl" },
{ url = "https://download.pytorch.org/whl/cpu/torch-2.8.0-cp313-cp313t-macosx_14_0_arm64.whl" },
{ url = "https://download.pytorch.org/whl/cpu/torch-2.8.0-cp313-none-macosx_11_0_arm64.whl" },
]
[[package]]
@ -4738,19 +4762,19 @@ dependencies = [
{ name = "typing-extensions", marker = "sys_platform != 'darwin'" },
]
wheels = [
{ url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp312-cp312-linux_s390x.whl", hash = "sha256:0e34e276722ab7dd0dffa9e12fe2135a9b34a0e300c456ed7ad6430229404eb5" },
{ url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp312-cp312-manylinux_2_28_aarch64.whl", hash = "sha256:610f600c102386e581327d5efc18c0d6edecb9820b4140d26163354a99cd800d" },
{ url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp312-cp312-manylinux_2_28_x86_64.whl", hash = "sha256:cb9a8ba8137ab24e36bf1742cb79a1294bd374db570f09fc15a5e1318160db4e" },
{ url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp312-cp312-win_amd64.whl", hash = "sha256:2be20b2c05a0cce10430cc25f32b689259640d273232b2de357c35729132256d" },
{ url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp312-cp312-win_arm64.whl", hash = "sha256:99fc421a5d234580e45957a7b02effbf3e1c884a5dd077afc85352c77bf41434" },
{ url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp313-cp313-linux_s390x.whl", hash = "sha256:8b5882276633cf91fe3d2d7246c743b94d44a7e660b27f1308007fdb1bb89f7d" },
{ url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp313-cp313-manylinux_2_28_aarch64.whl", hash = "sha256:a5064b5e23772c8d164068cc7c12e01a75faf7b948ecd95a0d4007d7487e5f25" },
{ url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp313-cp313-manylinux_2_28_x86_64.whl", hash = "sha256:8f81dedb4c6076ec325acc3b47525f9c550e5284a18eae1d9061c543f7b6e7de" },
{ url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp313-cp313-win_amd64.whl", hash = "sha256:e1ee1b2346ade3ea90306dfbec7e8ff17bc220d344109d189ae09078333b0856" },
{ url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp313-cp313-win_arm64.whl", hash = "sha256:64c187345509f2b1bb334feed4666e2c781ca381874bde589182f81247e61f88" },
{ url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp313-cp313t-manylinux_2_28_aarch64.whl", hash = "sha256:af81283ac671f434b1b25c95ba295f270e72db1fad48831eb5e4748ff9840041" },
{ url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp313-cp313t-manylinux_2_28_x86_64.whl", hash = "sha256:a9dbb6f64f63258bc811e2c0c99640a81e5af93c531ad96e95c5ec777ea46dab" },
{ url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp313-cp313t-win_amd64.whl", hash = "sha256:6d93a7165419bc4b2b907e859ccab0dea5deeab261448ae9a5ec5431f14c0e64" },
{ url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp312-cp312-linux_s390x.whl" },
{ url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp312-cp312-manylinux_2_28_aarch64.whl" },
{ url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp312-cp312-manylinux_2_28_x86_64.whl" },
{ url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp312-cp312-win_amd64.whl" },
{ url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp312-cp312-win_arm64.whl" },
{ url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp313-cp313-linux_s390x.whl" },
{ url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp313-cp313-manylinux_2_28_aarch64.whl" },
{ url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp313-cp313-manylinux_2_28_x86_64.whl" },
{ url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp313-cp313-win_amd64.whl" },
{ url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp313-cp313-win_arm64.whl" },
{ url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp313-cp313t-manylinux_2_28_aarch64.whl" },
{ url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp313-cp313t-manylinux_2_28_x86_64.whl" },
{ url = "https://download.pytorch.org/whl/cpu/torch-2.8.0%2Bcpu-cp313-cp313t-win_amd64.whl" },
]
[[package]]