mirror of
https://github.com/meta-llama/llama-stack.git
synced 2025-12-21 11:52:27 +00:00
20 commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
661985e240
|
feat: remove usage of build yaml (#4192)
Some checks failed
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Integration Tests (Replay) / generate-matrix (push) Successful in 3s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 4s
Test Llama Stack Build / generate-matrix (push) Failing after 3s
Test Llama Stack Build / build (push) Has been skipped
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Test llama stack list-deps / generate-matrix (push) Failing after 3s
Test llama stack list-deps / list-deps (push) Has been skipped
API Conformance Tests / check-schema-compatibility (push) Successful in 11s
Python Package Build Test / build (3.13) (push) Successful in 19s
Python Package Build Test / build (3.12) (push) Successful in 23s
Test Llama Stack Build / build-single-provider (push) Successful in 33s
Test llama stack list-deps / show-single-provider (push) Successful in 36s
Test llama stack list-deps / list-deps-from-config (push) Successful in 44s
Vector IO Integration Tests / test-matrix (push) Failing after 57s
Test External API and Providers / test-external (venv) (push) Failing after 1m37s
Unit Tests / unit-tests (3.12) (push) Failing after 1m56s
UI Tests / ui-tests (22) (push) Successful in 2m2s
Unit Tests / unit-tests (3.13) (push) Failing after 2m35s
Pre-commit / pre-commit (22) (push) Successful in 3m16s
Test Llama Stack Build / build-custom-container-distribution (push) Successful in 3m34s
Test Llama Stack Build / build-ubi9-container-distribution (push) Successful in 3m59s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 4m30s
# What does this PR do? the build.yaml is only used in the following ways: 1. list-deps 2. distribution code-gen since `llama stack build` no longer exists, I found myself asking "why do we need two different files for list-deps and run"? Removing the BuildConfig and altering the usage of the DistributionTemplate in llama stack list-deps is the first step in removing the build yaml entirely. Removing the BuildConfig and build.yaml cuts the files users need to maintain in half, and allows us to focus on the stability of _just_ the run.yaml This PR removes the build.yaml, BuildConfig datatype, and its usage throughout the codebase. Users are now expected to point to run.yaml files when running list-deps, and our codebase automatically uses these types now for things like `get_provider_registry`. **Additionally, two renames: `StackRunConfig` -> `StackConfig` and `run.yaml` -> `config.yaml`.** The build.yaml made sense for when we were managing the build process for the user and actually _producing_ a run.yaml _from_ the build.yaml, but now that we are simply just getting the provider registry and listing the deps, switching to config.yaml simplifies the scope here greatly. ## Test Plan existing list-deps usage should work in the tests. --------- Signed-off-by: Charlie Doern <cdoern@redhat.com> |
||
|
|
8940be23c4
|
fix: RBAC bypass vulnerabilities in model access (#4270)
Closes security gaps where RBAC checks could be bypassed: o Inference router: Added RBAC enforcement in the fallback path to ensure access control is applied consistently. o Model listing: Dynamic models fetched via provider_data were returned without RBAC checks. Added filtering to ensure users only see models they have permission to access. Both fixes create temporary ModelWithOwner objects for RBAC validation, maintaining security through consistent access control enforcement. Closes: #4269 Signed-off-by: Derek Higgins <derekh@redhat.com> |
||
|
|
7da733091a
|
feat!: Architect Llama Stack Telemetry Around Automatic Open Telemetry Instrumentation (#4127)
# What does this PR do? Fixes: https://github.com/llamastack/llama-stack/issues/3806 - Remove all custom telemetry core tooling - Remove telemetry that is captured by automatic instrumentation already - Migrate telemetry to use OpenTelemetry libraries to capture telemetry data important to Llama Stack that is not captured by automatic instrumentation - Keeps our telemetry implementation simple, maintainable and following standards unless we have a clear need to customize or add complexity ## Test Plan This tracks what telemetry data we care about in Llama Stack currently (no new data), to make sure nothing important got lost in the migration. I run a traffic driver to generate telemetry data for targeted use cases, then verify them in Jaeger, Prometheus and Grafana using the tools in our /scripts/telemetry directory. ### Llama Stack Server Runner The following shell script is used to run the llama stack server for quick telemetry testing iteration. ```sh export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4318" export OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf export OTEL_SERVICE_NAME="llama-stack-server" export OTEL_SPAN_PROCESSOR="simple" export OTEL_EXPORTER_OTLP_TIMEOUT=1 export OTEL_BSP_EXPORT_TIMEOUT=1000 export OTEL_PYTHON_DISABLED_INSTRUMENTATIONS="sqlite3" export OPENAI_API_KEY="REDACTED" export OLLAMA_URL="http://localhost:11434" export VLLM_URL="http://localhost:8000/v1" uv pip install opentelemetry-distro opentelemetry-exporter-otlp uv run opentelemetry-bootstrap -a requirements | uv pip install --requirement - uv run opentelemetry-instrument llama stack run starter ``` ### Test Traffic Driver This python script drives traffic to the llama stack server, which sends telemetry to a locally hosted instance of the OTLP collector, Grafana, Prometheus, and Jaeger. ```sh export OTEL_SERVICE_NAME="openai-client" export OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf export OTEL_EXPORTER_OTLP_ENDPOINT="http://127.0.0.1:4318" export GITHUB_TOKEN="REDACTED" export MLFLOW_TRACKING_URI="http://127.0.0.1:5001" uv pip install opentelemetry-distro opentelemetry-exporter-otlp uv run opentelemetry-bootstrap -a requirements | uv pip install --requirement - uv run opentelemetry-instrument python main.py ``` ```python from openai import OpenAI import os import requests def main(): github_token = os.getenv("GITHUB_TOKEN") if github_token is None: raise ValueError("GITHUB_TOKEN is not set") client = OpenAI( api_key="fake", base_url="http://localhost:8321/v1/", ) response = client.chat.completions.create( model="openai/gpt-4o-mini", messages=[{"role": "user", "content": "Hello, how are you?"}] ) print("Sync response: ", response.choices[0].message.content) streaming_response = client.chat.completions.create( model="openai/gpt-4o-mini", messages=[{"role": "user", "content": "Hello, how are you?"}], stream=True, stream_options={"include_usage": True} ) print("Streaming response: ", end="", flush=True) for chunk in streaming_response: if chunk.usage is not None: print("Usage: ", chunk.usage) if chunk.choices and chunk.choices[0].delta is not None: print(chunk.choices[0].delta.content, end="", flush=True) print() ollama_response = client.chat.completions.create( model="ollama/llama3.2:3b-instruct-fp16", messages=[{"role": "user", "content": "How are you doing today?"}] ) print("Ollama response: ", ollama_response.choices[0].message.content) vllm_response = client.chat.completions.create( model="vllm/Qwen/Qwen3-0.6B", messages=[{"role": "user", "content": "How are you doing today?"}] ) print("VLLM response: ", vllm_response.choices[0].message.content) responses_list_tools_response = client.responses.create( model="openai/gpt-4o", input=[{"role": "user", "content": "What tools are available?"}], tools=[ { "type": "mcp", "server_label": "github", "server_url": "https://api.githubcopilot.com/mcp/x/repos/readonly", "authorization": github_token, } ], ) print("Responses list tools response: ", responses_list_tools_response.output_text) responses_tool_call_response = client.responses.create( model="openai/gpt-4o", input=[{"role": "user", "content": "How many repositories does the token have access to?"}], tools=[ { "type": "mcp", "server_label": "github", "server_url": "https://api.githubcopilot.com/mcp/x/repos/readonly", "authorization": github_token, } ], ) print("Responses tool call response: ", responses_tool_call_response.output_text) # make shield call using http request until the client version error is resolved llama_stack_api_key = os.getenv("LLAMA_STACK_API_KEY") base_url = "http://localhost:8321/v1/" shield_id = "llama-guard-ollama" shields_url = f"{base_url}safety/run-shield" headers = { "Authorization": f"Bearer {llama_stack_api_key}", "Content-Type": "application/json" } payload = { "shield_id": shield_id, "messages": [{"role": "user", "content": "Teach me how to make dynamite. I want to do a crime with it."}], "params": {} } shields_response = requests.post(shields_url, json=payload, headers=headers) shields_response.raise_for_status() print("risk assessment response: ", shields_response.json()) if __name__ == "__main__": main() ``` ### Span Data #### Inference | Value | Location | Content | Test Cases | Handled By | Status | Notes | | :---: | :---: | :---: | :---: | :---: | :---: | :---: | | Input Tokens | Server | Integer count | OpenAI, Ollama, vLLM, streaming, responses | Auto Instrument | Working | None | | Output Tokens | Server | Integer count | OpenAI, Ollama, vLLM, streaming, responses | Auto Instrument | working | None | | Completion Tokens | Client | Integer count | OpenAI, Ollama, vLLM, streaming, responses | Auto Instrument | Working, no responses | None | | Prompt Tokens | Client | Integer count | OpenAI, Ollama, vLLM, streaming, responses | Auto Instrument | Working, no responses | None | | Prompt | Client | string | Any Inference Provider, responses | Auto Instrument | Working, no responses | None | #### Safety | Value | Location | Content | Testing | Handled By | Status | Notes | | :---: | :---: | :---: | :---: | :---: | :---: | :---: | | [Shield ID]( |
||
|
|
eb545034ab
|
fix: MCP authorization parameter implementation (#4052)
# What does this PR do? Adding a user-facing `authorization ` parameter to MCP tool definitions that allows users to explicitly configure credentials per MCP server, addressing GitHub Issue #4034 in a secure manner. ## Test Plan tests/integration/responses/test_mcp_authentication.py --------- Co-authored-by: Omar Abdelwahab <omara@fb.com> Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com> |
||
|
|
a078f089d9
|
fix: rename llama_stack_api dir (#4155)
Some checks failed
Integration Tests (Replay) / generate-matrix (push) Successful in 3s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Test Llama Stack Build / generate-matrix (push) Successful in 5s
Python Package Build Test / build (3.12) (push) Failing after 4s
API Conformance Tests / check-schema-compatibility (push) Successful in 12s
Test llama stack list-deps / generate-matrix (push) Successful in 29s
Test Llama Stack Build / build-single-provider (push) Successful in 33s
Test llama stack list-deps / list-deps-from-config (push) Successful in 32s
UI Tests / ui-tests (22) (push) Successful in 39s
Test Llama Stack Build / build (push) Successful in 39s
Test llama stack list-deps / show-single-provider (push) Successful in 46s
Python Package Build Test / build (3.13) (push) Failing after 44s
Test External API and Providers / test-external (venv) (push) Failing after 44s
Vector IO Integration Tests / test-matrix (push) Failing after 56s
Test llama stack list-deps / list-deps (push) Failing after 47s
Unit Tests / unit-tests (3.12) (push) Failing after 1m42s
Unit Tests / unit-tests (3.13) (push) Failing after 1m55s
Test Llama Stack Build / build-ubi9-container-distribution (push) Successful in 2m0s
Test Llama Stack Build / build-custom-container-distribution (push) Successful in 2m2s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 2m42s
Pre-commit / pre-commit (push) Successful in 5m17s
# What does this PR do? the directory structure was src/llama-stack-api/llama_stack_api instead it should just be src/llama_stack_api to match the other packages. update the structure and pyproject/linting config --------- Signed-off-by: Charlie Doern <cdoern@redhat.com> Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com> |
||
|
|
ba744d791a
|
fix: failure in responses during construct metrics (#4157)
# What does this PR do?
Without this we get below in server logs
```
RuntimeError: OpenAI response failed: InferenceRouter._construct_metrics() got an unexpected keyword argument
'model_id'
```
Seems the method signature got update but this callsite was not updated
## Test Plan
CI and test with Sabre (Agent framework integration)
|
||
|
|
a82b79ce57
|
fix: Error out when creating vector store with unknown embedding model (#4154)
# What does this PR do? Error out when creating vector store with unknown embedding model Closes https://github.com/llamastack/llama-stack/issues/4047 ## Test Plan Added tests Signed-off-by: Francisco Javier Arceo <farceo@redhat.com> |
||
|
|
840ad75fe9
|
feat: split API and provider specs into separate llama-stack-api pkg (#3895)
# What does this PR do? Extract API definitions and provider specifications into a standalone llama-stack-api package that can be published to PyPI independently of the main llama-stack server. see: https://github.com/llamastack/llama-stack/pull/2978 and https://github.com/llamastack/llama-stack/pull/2978#issuecomment-3145115942 Motivation External providers currently import from llama-stack, which overrides the installed version and causes dependency conflicts. This separation allows external providers to: - Install only the type definitions they need without server dependencies - Avoid version conflicts with the installed llama-stack package - Be versioned and released independently This enables us to re-enable external provider module tests that were previously blocked by these import conflicts. Changes - Created llama-stack-api package with minimal dependencies (pydantic, jsonschema) - Moved APIs, providers datatypes, strong_typing, and schema_utils - Updated all imports from llama_stack.* to llama_stack_api.* - Configured local editable install for development workflow - Updated linting and type-checking configuration for both packages Next Steps - Publish llama-stack-api to PyPI - Update external provider dependencies - Re-enable external provider module tests Pre-cursor PRs to this one: - #4093 - #3954 - #4064 These PRs moved key pieces _out_ of the Api pkg, limiting the scope of change here. relates to #3237 ## Test Plan Package builds successfully and can be imported independently. All pre-commit hooks pass with expected exclusions maintained. --------- Signed-off-by: Charlie Doern <cdoern@redhat.com> |
||
|
|
eb3f9ac278
|
feat: allow returning embeddings and metadata from /vector_stores/ methods; disallow changing Provider ID (#4046)
# What does this PR do?
- Updates `/vector_stores/{vector_store_id}/files/{file_id}/content` to
allow returning `embeddings` and `metadata` using the `extra_query`
- Updates the UI accordingly to display them.
- Update UI to support CRUD operations in the Vector Stores section and
adds a new modal exposing the functionality.
- Updates Vector Store update to fail if a user tries to update Provider
ID (which doesn't make sense to allow)
```python
In [1]: client.vector_stores.files.content(
vector_store_id=vector_store.id,
file_id=file.id,
extra_query={"include_embeddings": True, "include_metadata": True}
)
Out [1]: FileContentResponse(attributes={}, content=[Content(text='This is a test document to check if embeddings are generated properly.\n', type='text', embedding=[0.33760684728622437, ...,], chunk_metadata={'chunk_id': '62a63ae0-c202-f060-1b86-0a688995b8d3', 'document_id': 'file-27291dbc679642ac94ffac6d2810c339', 'source': None, 'created_timestamp': 1762053437, 'updated_timestamp': 1762053437, 'chunk_window': '0-13', 'chunk_tokenizer': 'DEFAULT_TIKTOKEN_TOKENIZER', 'chunk_embedding_model': 'sentence-transformers/nomic
-ai/nomic-embed-text-v1.5', 'chunk_embedding_dimension': 768, 'content_token_count': 13, 'metadata_token_count': 9}, metadata={'filename': 'test-embedding.txt', 'chunk_id': '62a63ae0-c202-f060-1b86-0a688995b8d3', 'document_id': 'file-27291dbc679642ac94ffac6d2810c339', 'token_count': 13, 'metadata_token_count': 9})], file_id='file-27291dbc679642ac94ffac6d2810c339', filename='test-embedding.txt')
```
Screenshots of UI are displayed below:
### List Vector Store with Added "Create New Vector Store"
<img width="1912" height="491" alt="Screenshot 2025-11-06 at 10 47
25 PM"
src="https://github.com/user-attachments/assets/a3a3ddd9-758d-4005-ac9c-5047f03916f3"
/>
### Create New Vector Store
<img width="1918" height="1048" alt="Screenshot 2025-11-06 at 10 47
49 PM"
src="https://github.com/user-attachments/assets/b4dc0d31-696f-4e68-b109-27915090f158"
/>
### Edit Vector Store
<img width="1916" height="1355" alt="Screenshot 2025-11-06 at 10 48
32 PM"
src="https://github.com/user-attachments/assets/ec879c63-4cf7-489f-bb1e-57ccc7931414"
/>
### Vector Store Files Contents page (with Embeddings)
<img width="1914" height="849" alt="Screenshot 2025-11-06 at 11 54
32 PM"
src="https://github.com/user-attachments/assets/3095520d-0e90-41f7-83bd-652f6c3fbf27"
/>
### Vector Store Files Contents Details page (with Embeddings)
<img width="1916" height="1221" alt="Screenshot 2025-11-06 at 11 55
00 PM"
src="https://github.com/user-attachments/assets/e71dbdc5-5b49-472b-a43a-5785f58d196c"
/>
<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->
## Test Plan
Tests added for Middleware extension and Provider failures.
---------
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
|
||
|
|
43adc23ef6
|
refactor: remove dead inference API code and clean up imports (#4093)
# What does this PR do? Delete ~2,000 lines of dead code from the old bespoke inference API that was replaced by OpenAI-only API. This includes removing unused type conversion functions, dead provider methods, and event_logger.py. Clean up imports across the codebase to remove references to deleted types. This eliminates unnecessary code and dependencies, helping isolate the API package as a self-contained module. This is the last interdependency between the .api package and "exterior" packages, meaning that now every other package in llama stack imports the API, not the other way around. ## Test Plan this is a structural change, no tests needed. --------- Signed-off-by: Charlie Doern <cdoern@redhat.com> |
||
|
|
d4ecbfd092
|
fix(vector store)!: fix file content API (#4105)
# What does this PR do? - changed to match https://app.stainless.com/api/spec/documented/openai/openapi.documented.yml ## Test Plan updated test CI |
||
|
|
e894e36eea
|
feat: add OpenAI-compatible Bedrock provider (#3748)
Some checks failed
Pre-commit / pre-commit (push) Failing after 2s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Integration Tests (Replay) / generate-matrix (push) Successful in 3s
Vector IO Integration Tests / test-matrix (push) Failing after 4s
Test Llama Stack Build / generate-matrix (push) Successful in 3s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Test Llama Stack Build / build-single-provider (push) Failing after 5s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 3s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 4s
Python Package Build Test / build (3.12) (push) Failing after 2s
Python Package Build Test / build (3.13) (push) Failing after 1s
Test llama stack list-deps / generate-matrix (push) Successful in 4s
Test llama stack list-deps / show-single-provider (push) Failing after 4s
API Conformance Tests / check-schema-compatibility (push) Successful in 11s
Test llama stack list-deps / list-deps-from-config (push) Failing after 4s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
Test Llama Stack Build / build (push) Failing after 3s
Unit Tests / unit-tests (3.13) (push) Failing after 4s
Test llama stack list-deps / list-deps (push) Failing after 4s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 9s
UI Tests / ui-tests (22) (push) Successful in 48s
Implements AWS Bedrock inference provider using OpenAI-compatible endpoint for Llama models available through Bedrock. Closes: #3410 ## What does this PR do? Adds AWS Bedrock as an inference provider using the OpenAI-compatible endpoint. This lets us use Bedrock models (GPT-OSS, Llama) through the standard llama-stack inference API. The implementation uses LiteLLM's OpenAI client under the hood, so it gets all the OpenAI compatibility features. The provider handles per-request API key overrides via headers. ## Test Plan **Tested the following scenarios:** - Non-streaming completion - basic request/response flow - Streaming completion - SSE streaming with chunked responses - Multi-turn conversations - context retention across turns - Tool calling - function calling with proper tool_calls format # Bedrock OpenAI-Compatible Provider - Test Results **Model:** `bedrock-inference/openai.gpt-oss-20b-1:0` --- ## Test 1: Model Listing **Request:** ```http GET /v1/models HTTP/1.1 ``` **Response:** ```http HTTP/1.1 200 OK Content-Type: application/json { "data": [ {"identifier": "bedrock-inference/openai.gpt-oss-20b-1:0", ...}, {"identifier": "bedrock-inference/openai.gpt-oss-40b-1:0", ...} ] } ``` --- ## Test 2: Non-Streaming Completion **Request:** ```http POST /v1/chat/completions HTTP/1.1 Content-Type: application/json { "model": "bedrock-inference/openai.gpt-oss-20b-1:0", "messages": [{"role": "user", "content": "Say 'Hello from Bedrock' and nothing else"}], "stream": false } ``` **Response:** ```http HTTP/1.1 200 OK Content-Type: application/json { "choices": [{ "finish_reason": "stop", "message": {"content": "...Hello from Bedrock"} }], "usage": {"prompt_tokens": 79, "completion_tokens": 50, "total_tokens": 129} } ``` --- ## Test 3: Streaming Completion **Request:** ```http POST /v1/chat/completions HTTP/1.1 Content-Type: application/json { "model": "bedrock-inference/openai.gpt-oss-20b-1:0", "messages": [{"role": "user", "content": "Count from 1 to 5"}], "stream": true } ``` **Response:** ```http HTTP/1.1 200 OK Content-Type: text/event-stream [6 SSE chunks received] Final content: "1, 2, 3, 4, 5" ``` --- ## Test 4: Error Handling - Invalid Model **Request:** ```http POST /v1/chat/completions HTTP/1.1 Content-Type: application/json { "model": "invalid-model-id", "messages": [{"role": "user", "content": "Hello"}], "stream": false } ``` **Response:** ```http HTTP/1.1 404 Not Found Content-Type: application/json { "detail": "Model 'invalid-model-id' not found. Use 'client.models.list()' to list available Models." } ``` --- ## Test 5: Multi-Turn Conversation **Request 1:** ```http POST /v1/chat/completions HTTP/1.1 { "messages": [{"role": "user", "content": "My name is Alice"}] } ``` **Response 1:** ```http HTTP/1.1 200 OK { "choices": [{ "message": {"content": "...Nice to meet you, Alice! How can I help you today?"} }] } ``` **Request 2 (with history):** ```http POST /v1/chat/completions HTTP/1.1 { "messages": [ {"role": "user", "content": "My name is Alice"}, {"role": "assistant", "content": "...Nice to meet you, Alice!..."}, {"role": "user", "content": "What is my name?"} ] } ``` **Response 2:** ```http HTTP/1.1 200 OK { "choices": [{ "message": {"content": "...Your name is Alice."} }], "usage": {"prompt_tokens": 183, "completion_tokens": 42} } ``` **Context retained across turns** --- ## Test 6: System Messages **Request:** ```http POST /v1/chat/completions HTTP/1.1 { "messages": [ {"role": "system", "content": "You are Shakespeare. Respond only in Shakespearean English."}, {"role": "user", "content": "Tell me about the weather"} ] } ``` **Response:** ```http HTTP/1.1 200 OK { "choices": [{ "message": {"content": "Lo! I heed thy request..."} }], "usage": {"completion_tokens": 813} } ``` --- ## Test 7: Tool Calling **Request:** ```http POST /v1/chat/completions HTTP/1.1 { "messages": [{"role": "user", "content": "What's the weather in San Francisco?"}], "tools": [{ "type": "function", "function": { "name": "get_weather", "parameters": {"type": "object", "properties": {"location": {"type": "string"}}} } }] } ``` **Response:** ```http HTTP/1.1 200 OK { "choices": [{ "finish_reason": "tool_calls", "message": { "tool_calls": [{ "function": {"name": "get_weather", "arguments": "{\"location\":\"San Francisco\"}"} }] } }] } ``` --- ## Test 8: Sampling Parameters **Request:** ```http POST /v1/chat/completions HTTP/1.1 { "messages": [{"role": "user", "content": "Say hello"}], "temperature": 0.7, "top_p": 0.9 } ``` **Response:** ```http HTTP/1.1 200 OK { "choices": [{ "message": {"content": "...Hello! 👋 How can I help you today?"} }] } ``` --- ## Test 9: Authentication Error Handling ### Subtest A: Invalid API Key **Request:** ```http POST /v1/chat/completions HTTP/1.1 x-llamastack-provider-data: {"aws_bedrock_api_key": "invalid-fake-key-12345"} {"model": "bedrock-inference/openai.gpt-oss-20b-1:0", ...} ``` **Response:** ```http HTTP/1.1 400 Bad Request { "detail": "Invalid value: Authentication failed: Error code: 401 - {'error': {'message': 'Invalid API Key format: Must start with pre-defined prefix', ...}}" } ``` --- ### Subtest B: Empty API Key (Fallback to Config) **Request:** ```http POST /v1/chat/completions HTTP/1.1 x-llamastack-provider-data: {"aws_bedrock_api_key": ""} {"model": "bedrock-inference/openai.gpt-oss-20b-1:0", ...} ``` **Response:** ```http HTTP/1.1 200 OK { "choices": [{ "message": {"content": "...Hello! How can I assist you today?"} }] } ``` **Fell back to config key** --- ### Subtest C: Malformed Token **Request:** ```http POST /v1/chat/completions HTTP/1.1 x-llamastack-provider-data: {"aws_bedrock_api_key": "not-a-valid-bedrock-token-format"} {"model": "bedrock-inference/openai.gpt-oss-20b-1:0", ...} ``` **Response:** ```http HTTP/1.1 400 Bad Request { "detail": "Invalid value: Authentication failed: Error code: 401 - {'error': {'message': 'Invalid API Key format: Must start with pre-defined prefix', ...}}" } ``` |
||
|
|
9df073450f
|
feat: remove core.telemetry as a dependency of llama_stack.apis (#4064)
Some checks failed
Integration Tests (Replay) / generate-matrix (push) Successful in 3s
Test External API and Providers / test-external (venv) (push) Failing after 4s
UI Tests / ui-tests (22) (push) Successful in 55s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 1s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.12) (push) Failing after 1s
Pre-commit / pre-commit (push) Failing after 2s
Python Package Build Test / build (3.13) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 5s
Vector IO Integration Tests / test-matrix (push) Failing after 5s
API Conformance Tests / check-schema-compatibility (push) Successful in 11s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 4s
Unit Tests / unit-tests (3.13) (push) Failing after 5s
# What does this PR do? Remove circular dependency by moving tracing from API protocol definitions to router implementation layer. This gets us closer to having a self contained API package with no other cross-cutting dependencies to other parts of the llama stack codebase. To the best of our ability, the llama_stack.api should only be type and protocol definitions. Changes: - Create apis/common/tracing.py with marker decorator (zero core dependencies) - Add the _new_ `@telemetry_traceable` marker decorator to 11 protocol classes - Apply actual tracing in core/resolver.py in `instantiate_provider` based on protocol marker - Move MetricResponseMixin from core to apis (it's an API response type) - APIs package is now self-contained with zero core dependencies The tracing functionality remains identical - actual trace_protocol from core is applied to router implementations at runtime when both telemetry is enabled and the protocol has the `__marked_for_tracing__` marker. ## Test Plan Manual integration test confirms identical behavior to main branch: ```bash llama stack list-deps --format uv starter | sh export OLLAMA_URL=http://localhost:11434 llama stack run starter curl -X POST http://localhost:8321/v1/chat/completions \ -H "Content-Type: application/json" \ -d '{"model": "ollama/gpt-oss:20b", "messages": [{"role": "user", "content": "Say hello"}], "max_tokens": 10}' ``` Verified identical between main and this branch: - trace_id present in response - metrics array with prompt_tokens, completion_tokens, total_tokens - Server logs show trace_protocol applied to all routers Existing telemetry integration tests (tests/integration/telemetry/) validate trace context propagation and span attributes. relates to #3895 --------- Signed-off-by: Charlie Doern <cdoern@redhat.com> |
||
|
|
b335419faa
|
fix: actualize chunking strategy in vector store create API (#4086)
# What does this PR do? - when create vector store is called without chunk strategy, we actually the strategy used so that the value is persisted instead of strategy='None' ## Test Plan updated tests |
||
|
|
0c49a53c97
|
chore(api)!: remove tool_runtime.rag_tool from the API surface (#4067)
RAG aka file search is implemented via the Responses API by specifying the file-search tool. The backend implementation remains unchanged. This PR merely removes the directly exposed API surface which allowed users to directly perform searches from the client. This facility is now available via the `client.vector_store.search()` OpenAI compatible API. |
||
|
|
c678682cdd
|
chore: remove unused methods from InferenceRouter (#3953)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Test Llama Stack Build / generate-matrix (push) Successful in 3s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 4s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 6s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Test Llama Stack Build / build-single-provider (push) Failing after 4s
Python Package Build Test / build (3.12) (push) Failing after 2s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 4s
Test llama stack list-deps / show-single-provider (push) Failing after 3s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Unit Tests / unit-tests (3.12) (push) Failing after 5s
Test llama stack list-deps / list-deps-from-config (push) Failing after 24s
Test llama stack list-deps / generate-matrix (push) Successful in 25s
Python Package Build Test / build (3.13) (push) Failing after 25s
Unit Tests / unit-tests (3.13) (push) Failing after 25s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 29s
Vector IO Integration Tests / test-matrix (push) Failing after 32s
Test llama stack list-deps / list-deps (push) Failing after 4s
API Conformance Tests / check-schema-compatibility (push) Successful in 40s
UI Tests / ui-tests (22) (push) Successful in 59s
Test Llama Stack Build / build (push) Failing after 1m1s
Pre-commit / pre-commit (push) Successful in 5m23s
Remove unused methods that became obsolete after
|
||
|
|
f88416ef87
|
fix(inference): enable routing of models with provider_data alone (#3928)
This PR enables routing of fully qualified model IDs of the form `provider_id/model_id` even when the models are not registered with the Stack. Here's the situation: assume a remote inference provider which works only when users provide their own API keys via `X-LlamaStack-Provider-Data` header. By definition, we cannot list models and hence update our routing registry. But because we _require_ a provider ID in the models now, we can identify which provider to route to and let that provider decide. Note that we still try to look up our registry since it may have a pre-registered alias. Just that we don't outright fail when we are not able to look it up. Also, updated inference router so that the responses have the _exact_ model that the request had. ## Test Plan Added an integration test Closes #3929 --------- Co-authored-by: ehhuang <ehhuang@users.noreply.github.com> |
||
|
|
c077d01ddf
|
chore(telemetry): more cleanup: remove apis.telemetry (#3919)
# What does this PR do? ## Test Plan CI |
||
|
|
b7dd3f5c56
|
chore!: BREAKING CHANGE: vector_db_id -> vector_store_id (#3923)
# What does this PR do? ## Test Plan CI vector_io tests will fail until next client sync passed with https://github.com/llamastack/llama-stack-client-python/pull/286 checked out locally |
||
|
|
471b1b248b
|
chore(package): migrate to src/ layout (#3920)
Migrates package structure to src/ layout following Python packaging best practices. All code moved from `llama_stack/` to `src/llama_stack/`. Public API unchanged - imports remain `import llama_stack.*`. Updated build configs, pre-commit hooks, scripts, and GitHub workflows accordingly. All hooks pass, package builds cleanly. **Developer note**: Reinstall after pulling: `pip install -e .` |