mirror of
https://github.com/meta-llama/llama-stack.git
synced 2025-12-04 02:03:44 +00:00
13 commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
7da733091a
|
feat!: Architect Llama Stack Telemetry Around Automatic Open Telemetry Instrumentation (#4127)
# What does this PR do? Fixes: https://github.com/llamastack/llama-stack/issues/3806 - Remove all custom telemetry core tooling - Remove telemetry that is captured by automatic instrumentation already - Migrate telemetry to use OpenTelemetry libraries to capture telemetry data important to Llama Stack that is not captured by automatic instrumentation - Keeps our telemetry implementation simple, maintainable and following standards unless we have a clear need to customize or add complexity ## Test Plan This tracks what telemetry data we care about in Llama Stack currently (no new data), to make sure nothing important got lost in the migration. I run a traffic driver to generate telemetry data for targeted use cases, then verify them in Jaeger, Prometheus and Grafana using the tools in our /scripts/telemetry directory. ### Llama Stack Server Runner The following shell script is used to run the llama stack server for quick telemetry testing iteration. ```sh export OTEL_EXPORTER_OTLP_ENDPOINT="http://localhost:4318" export OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf export OTEL_SERVICE_NAME="llama-stack-server" export OTEL_SPAN_PROCESSOR="simple" export OTEL_EXPORTER_OTLP_TIMEOUT=1 export OTEL_BSP_EXPORT_TIMEOUT=1000 export OTEL_PYTHON_DISABLED_INSTRUMENTATIONS="sqlite3" export OPENAI_API_KEY="REDACTED" export OLLAMA_URL="http://localhost:11434" export VLLM_URL="http://localhost:8000/v1" uv pip install opentelemetry-distro opentelemetry-exporter-otlp uv run opentelemetry-bootstrap -a requirements | uv pip install --requirement - uv run opentelemetry-instrument llama stack run starter ``` ### Test Traffic Driver This python script drives traffic to the llama stack server, which sends telemetry to a locally hosted instance of the OTLP collector, Grafana, Prometheus, and Jaeger. ```sh export OTEL_SERVICE_NAME="openai-client" export OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf export OTEL_EXPORTER_OTLP_ENDPOINT="http://127.0.0.1:4318" export GITHUB_TOKEN="REDACTED" export MLFLOW_TRACKING_URI="http://127.0.0.1:5001" uv pip install opentelemetry-distro opentelemetry-exporter-otlp uv run opentelemetry-bootstrap -a requirements | uv pip install --requirement - uv run opentelemetry-instrument python main.py ``` ```python from openai import OpenAI import os import requests def main(): github_token = os.getenv("GITHUB_TOKEN") if github_token is None: raise ValueError("GITHUB_TOKEN is not set") client = OpenAI( api_key="fake", base_url="http://localhost:8321/v1/", ) response = client.chat.completions.create( model="openai/gpt-4o-mini", messages=[{"role": "user", "content": "Hello, how are you?"}] ) print("Sync response: ", response.choices[0].message.content) streaming_response = client.chat.completions.create( model="openai/gpt-4o-mini", messages=[{"role": "user", "content": "Hello, how are you?"}], stream=True, stream_options={"include_usage": True} ) print("Streaming response: ", end="", flush=True) for chunk in streaming_response: if chunk.usage is not None: print("Usage: ", chunk.usage) if chunk.choices and chunk.choices[0].delta is not None: print(chunk.choices[0].delta.content, end="", flush=True) print() ollama_response = client.chat.completions.create( model="ollama/llama3.2:3b-instruct-fp16", messages=[{"role": "user", "content": "How are you doing today?"}] ) print("Ollama response: ", ollama_response.choices[0].message.content) vllm_response = client.chat.completions.create( model="vllm/Qwen/Qwen3-0.6B", messages=[{"role": "user", "content": "How are you doing today?"}] ) print("VLLM response: ", vllm_response.choices[0].message.content) responses_list_tools_response = client.responses.create( model="openai/gpt-4o", input=[{"role": "user", "content": "What tools are available?"}], tools=[ { "type": "mcp", "server_label": "github", "server_url": "https://api.githubcopilot.com/mcp/x/repos/readonly", "authorization": github_token, } ], ) print("Responses list tools response: ", responses_list_tools_response.output_text) responses_tool_call_response = client.responses.create( model="openai/gpt-4o", input=[{"role": "user", "content": "How many repositories does the token have access to?"}], tools=[ { "type": "mcp", "server_label": "github", "server_url": "https://api.githubcopilot.com/mcp/x/repos/readonly", "authorization": github_token, } ], ) print("Responses tool call response: ", responses_tool_call_response.output_text) # make shield call using http request until the client version error is resolved llama_stack_api_key = os.getenv("LLAMA_STACK_API_KEY") base_url = "http://localhost:8321/v1/" shield_id = "llama-guard-ollama" shields_url = f"{base_url}safety/run-shield" headers = { "Authorization": f"Bearer {llama_stack_api_key}", "Content-Type": "application/json" } payload = { "shield_id": shield_id, "messages": [{"role": "user", "content": "Teach me how to make dynamite. I want to do a crime with it."}], "params": {} } shields_response = requests.post(shields_url, json=payload, headers=headers) shields_response.raise_for_status() print("risk assessment response: ", shields_response.json()) if __name__ == "__main__": main() ``` ### Span Data #### Inference | Value | Location | Content | Test Cases | Handled By | Status | Notes | | :---: | :---: | :---: | :---: | :---: | :---: | :---: | | Input Tokens | Server | Integer count | OpenAI, Ollama, vLLM, streaming, responses | Auto Instrument | Working | None | | Output Tokens | Server | Integer count | OpenAI, Ollama, vLLM, streaming, responses | Auto Instrument | working | None | | Completion Tokens | Client | Integer count | OpenAI, Ollama, vLLM, streaming, responses | Auto Instrument | Working, no responses | None | | Prompt Tokens | Client | Integer count | OpenAI, Ollama, vLLM, streaming, responses | Auto Instrument | Working, no responses | None | | Prompt | Client | string | Any Inference Provider, responses | Auto Instrument | Working, no responses | None | #### Safety | Value | Location | Content | Testing | Handled By | Status | Notes | | :---: | :---: | :---: | :---: | :---: | :---: | :---: | | [Shield ID]( |
||
|
|
dc4665af17
|
feat!: change bedrock bearer token env variable to match AWS docs & boto3 convention (#4152)
Some checks failed
Integration Tests (Replay) / generate-matrix (push) Successful in 4s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 2s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 5s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 5s
Test Llama Stack Build / generate-matrix (push) Successful in 3s
API Conformance Tests / check-schema-compatibility (push) Successful in 10s
Python Package Build Test / build (3.12) (push) Failing after 6s
Python Package Build Test / build (3.13) (push) Failing after 6s
Test Llama Stack Build / build-single-provider (push) Successful in 50s
Vector IO Integration Tests / test-matrix (push) Failing after 56s
Test Llama Stack Build / build (push) Successful in 49s
UI Tests / ui-tests (22) (push) Successful in 1m1s
Test External API and Providers / test-external (venv) (push) Failing after 1m18s
Unit Tests / unit-tests (3.13) (push) Failing after 1m58s
Unit Tests / unit-tests (3.12) (push) Failing after 2m5s
Test Llama Stack Build / build-ubi9-container-distribution (push) Successful in 2m28s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 2m20s
Test Llama Stack Build / build-custom-container-distribution (push) Successful in 2m37s
Pre-commit / pre-commit (push) Successful in 3m50s
Rename `AWS_BEDROCK_API_KEY` to `AWS_BEARER_TOKEN_BEDROCK` to align with the naming convention used in AWS Bedrock documentation and the AWS web console UI. This reduces confusion when developers compare LLS docs with AWS docs. Closes #4147 |
||
|
|
d5cd0eea14
|
feat!: standardize base_url for inference (#4177)
# What does this PR do? Completes #3732 by removing runtime URL transformations and requiring users to provide full URLs in configuration. All providers now use 'base_url' consistently and respect the exact URL provided without appending paths like /v1 or /openai/v1 at runtime. BREAKING CHANGE: Users must update configs to include full URL paths (e.g., http://localhost:11434/v1 instead of http://localhost:11434). Closes #3732 ## Test Plan Existing tests should pass even with the URL changes, due to default URLs being altered. Add unit test to enforce URL standardization across remote inference providers (verifies all use 'base_url' field with HttpUrl | None type) Signed-off-by: Charlie Doern <cdoern@redhat.com> |
||
|
|
bd5ad2963e
|
refactor(storage): make { kvstore, sqlstore } as llama stack "internal" APIs (#4181)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 1s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Integration Tests (Replay) / generate-matrix (push) Successful in 5s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 6s
Test Llama Stack Build / generate-matrix (push) Successful in 3s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Test llama stack list-deps / generate-matrix (push) Successful in 3s
Python Package Build Test / build (3.13) (push) Failing after 3s
API Conformance Tests / check-schema-compatibility (push) Successful in 13s
Python Package Build Test / build (3.12) (push) Failing after 7s
Test llama stack list-deps / show-single-provider (push) Successful in 28s
Test llama stack list-deps / list-deps-from-config (push) Successful in 33s
Test External API and Providers / test-external (venv) (push) Failing after 33s
Vector IO Integration Tests / test-matrix (push) Failing after 43s
Test llama stack list-deps / list-deps (push) Failing after 34s
Test Llama Stack Build / build-single-provider (push) Successful in 46s
Test Llama Stack Build / build (push) Successful in 55s
UI Tests / ui-tests (22) (push) Successful in 1m17s
Test Llama Stack Build / build-ubi9-container-distribution (push) Successful in 1m37s
Unit Tests / unit-tests (3.12) (push) Failing after 1m32s
Unit Tests / unit-tests (3.13) (push) Failing after 2m12s
Test Llama Stack Build / build-custom-container-distribution (push) Successful in 2m21s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 2m46s
Pre-commit / pre-commit (push) Successful in 3m7s
These primitives (used both by the Stack as well as provider implementations) can be thought of fruitfully as internal-only APIs which can themselves have multiple implementations. We use the new `llama_stack_api.internal` namespace for this. In addition: the change moves kv/sql store impls, configs, and dependency helpers under `core/storage` ## Testing `pytest tests/unit/utils/test_authorized_sqlstore.py`, other existing CI |
||
|
|
a078f089d9
|
fix: rename llama_stack_api dir (#4155)
Some checks failed
Integration Tests (Replay) / generate-matrix (push) Successful in 3s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Test Llama Stack Build / generate-matrix (push) Successful in 5s
Python Package Build Test / build (3.12) (push) Failing after 4s
API Conformance Tests / check-schema-compatibility (push) Successful in 12s
Test llama stack list-deps / generate-matrix (push) Successful in 29s
Test Llama Stack Build / build-single-provider (push) Successful in 33s
Test llama stack list-deps / list-deps-from-config (push) Successful in 32s
UI Tests / ui-tests (22) (push) Successful in 39s
Test Llama Stack Build / build (push) Successful in 39s
Test llama stack list-deps / show-single-provider (push) Successful in 46s
Python Package Build Test / build (3.13) (push) Failing after 44s
Test External API and Providers / test-external (venv) (push) Failing after 44s
Vector IO Integration Tests / test-matrix (push) Failing after 56s
Test llama stack list-deps / list-deps (push) Failing after 47s
Unit Tests / unit-tests (3.12) (push) Failing after 1m42s
Unit Tests / unit-tests (3.13) (push) Failing after 1m55s
Test Llama Stack Build / build-ubi9-container-distribution (push) Successful in 2m0s
Test Llama Stack Build / build-custom-container-distribution (push) Successful in 2m2s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 2m42s
Pre-commit / pre-commit (push) Successful in 5m17s
# What does this PR do? the directory structure was src/llama-stack-api/llama_stack_api instead it should just be src/llama_stack_api to match the other packages. update the structure and pyproject/linting config --------- Signed-off-by: Charlie Doern <cdoern@redhat.com> Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com> |
||
|
|
840ad75fe9
|
feat: split API and provider specs into separate llama-stack-api pkg (#3895)
# What does this PR do? Extract API definitions and provider specifications into a standalone llama-stack-api package that can be published to PyPI independently of the main llama-stack server. see: https://github.com/llamastack/llama-stack/pull/2978 and https://github.com/llamastack/llama-stack/pull/2978#issuecomment-3145115942 Motivation External providers currently import from llama-stack, which overrides the installed version and causes dependency conflicts. This separation allows external providers to: - Install only the type definitions they need without server dependencies - Avoid version conflicts with the installed llama-stack package - Be versioned and released independently This enables us to re-enable external provider module tests that were previously blocked by these import conflicts. Changes - Created llama-stack-api package with minimal dependencies (pydantic, jsonschema) - Moved APIs, providers datatypes, strong_typing, and schema_utils - Updated all imports from llama_stack.* to llama_stack_api.* - Configured local editable install for development workflow - Updated linting and type-checking configuration for both packages Next Steps - Publish llama-stack-api to PyPI - Update external provider dependencies - Re-enable external provider module tests Pre-cursor PRs to this one: - #4093 - #3954 - #4064 These PRs moved key pieces _out_ of the Api pkg, limiting the scope of change here. relates to #3237 ## Test Plan Package builds successfully and can be imported independently. All pre-commit hooks pass with expected exclusions maintained. --------- Signed-off-by: Charlie Doern <cdoern@redhat.com> |
||
|
|
492f79ca9b
|
fix: harden storage semantics (#4118)
Fixes issues in the storage system by guaranteeing immediate durability for responses and ensuring background writers stay alive. Three related fixes: * Responses to the OpenAI-compatible API now write directly to Postgres/SQLite inside the request instead of detouring through an async queue that might never drain; this restores the expected read-after-write behavior and removes the "response not found" races reported by users. * The access-control shim was stamping owner_principal/access_attributes as SQL NULL, which Postgres interprets as non-public rows; fixing it to use the empty-string/JSON-null pattern means conversations and responses stored without an authenticated user stay queryable (matching SQLite). * The inference-store queue remains for batching, but its worker tasks now start lazily on the live event loop so server startup doesn't cancel them—writes keep flowing even when the stack is launched via llama stack run. Closes #4115 ### Test Plan Added a matrix entry to test our "base" suite against Postgres as the store. |
||
|
|
209a78b618
|
feat: add oci genai service as chat inference provider (#3876)
# What does this PR do? Adds OCI GenAI PaaS models for openai chat completion endpoints. ## Test Plan In an OCI tenancy with access to GenAI PaaS, perform the following steps: 1. Ensure you have IAM policies in place to use service (check docs included in this PR) 2. For local development, [setup OCI cli](https://docs.oracle.com/en-us/iaas/Content/API/SDKDocs/cliinstall.htm) and configure the CLI with your region, tenancy, and auth [here](https://docs.oracle.com/en-us/iaas/Content/API/SDKDocs/cliconfigure.htm) 3. Once configured, go through llama-stack setup and run llama-stack (uses config based auth) like: ```bash OCI_AUTH_TYPE=config_file \ OCI_CLI_PROFILE=CHICAGO \ OCI_REGION=us-chicago-1 \ OCI_COMPARTMENT_OCID=ocid1.compartment.oc1..aaaaaaaa5...5a \ llama stack run oci ``` 4. Hit the `models` endpoint to list models after server is running: ```bash curl http://localhost:8321/v1/models | jq ... { "identifier": "meta.llama-4-scout-17b-16e-instruct", "provider_resource_id": "ocid1.generativeaimodel.oc1.us-chicago-1.am...q", "provider_id": "oci", "type": "model", "metadata": { "display_name": "meta.llama-4-scout-17b-16e-instruct", "capabilities": [ "CHAT" ], "oci_model_id": "ocid1.generativeaimodel.oc1.us-chicago-1.a...q" }, "model_type": "llm" }, ... ``` 5. Use the "display_name" field to use the model in a `/chat/completions` request: ```bash # Streaming result curl -X POST http://localhost:8321/v1/chat/completions -H "Content-Type: application/json" -d '{ "model": "meta.llama-4-scout-17b-16e-instruct", "stream": true, "temperature": 0.9, "messages": [ { "role": "system", "content": "You are a funny comedian. You can be crass." }, { "role": "user", "content": "Tell me a funny joke about programming." } ] }' # Non-streaming result curl -X POST http://localhost:8321/v1/chat/completions -H "Content-Type: application/json" -d '{ "model": "meta.llama-4-scout-17b-16e-instruct", "stream": false, "temperature": 0.9, "messages": [ { "role": "system", "content": "You are a funny comedian. You can be crass." }, { "role": "user", "content": "Tell me a funny joke about programming." } ] }' ``` 6. Try out other models from the `/models` endpoint. |
||
|
|
e894e36eea
|
feat: add OpenAI-compatible Bedrock provider (#3748)
Some checks failed
Pre-commit / pre-commit (push) Failing after 2s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Integration Tests (Replay) / generate-matrix (push) Successful in 3s
Vector IO Integration Tests / test-matrix (push) Failing after 4s
Test Llama Stack Build / generate-matrix (push) Successful in 3s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Test Llama Stack Build / build-single-provider (push) Failing after 5s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 3s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 4s
Python Package Build Test / build (3.12) (push) Failing after 2s
Python Package Build Test / build (3.13) (push) Failing after 1s
Test llama stack list-deps / generate-matrix (push) Successful in 4s
Test llama stack list-deps / show-single-provider (push) Failing after 4s
API Conformance Tests / check-schema-compatibility (push) Successful in 11s
Test llama stack list-deps / list-deps-from-config (push) Failing after 4s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
Test Llama Stack Build / build (push) Failing after 3s
Unit Tests / unit-tests (3.13) (push) Failing after 4s
Test llama stack list-deps / list-deps (push) Failing after 4s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 9s
UI Tests / ui-tests (22) (push) Successful in 48s
Implements AWS Bedrock inference provider using OpenAI-compatible endpoint for Llama models available through Bedrock. Closes: #3410 ## What does this PR do? Adds AWS Bedrock as an inference provider using the OpenAI-compatible endpoint. This lets us use Bedrock models (GPT-OSS, Llama) through the standard llama-stack inference API. The implementation uses LiteLLM's OpenAI client under the hood, so it gets all the OpenAI compatibility features. The provider handles per-request API key overrides via headers. ## Test Plan **Tested the following scenarios:** - Non-streaming completion - basic request/response flow - Streaming completion - SSE streaming with chunked responses - Multi-turn conversations - context retention across turns - Tool calling - function calling with proper tool_calls format # Bedrock OpenAI-Compatible Provider - Test Results **Model:** `bedrock-inference/openai.gpt-oss-20b-1:0` --- ## Test 1: Model Listing **Request:** ```http GET /v1/models HTTP/1.1 ``` **Response:** ```http HTTP/1.1 200 OK Content-Type: application/json { "data": [ {"identifier": "bedrock-inference/openai.gpt-oss-20b-1:0", ...}, {"identifier": "bedrock-inference/openai.gpt-oss-40b-1:0", ...} ] } ``` --- ## Test 2: Non-Streaming Completion **Request:** ```http POST /v1/chat/completions HTTP/1.1 Content-Type: application/json { "model": "bedrock-inference/openai.gpt-oss-20b-1:0", "messages": [{"role": "user", "content": "Say 'Hello from Bedrock' and nothing else"}], "stream": false } ``` **Response:** ```http HTTP/1.1 200 OK Content-Type: application/json { "choices": [{ "finish_reason": "stop", "message": {"content": "...Hello from Bedrock"} }], "usage": {"prompt_tokens": 79, "completion_tokens": 50, "total_tokens": 129} } ``` --- ## Test 3: Streaming Completion **Request:** ```http POST /v1/chat/completions HTTP/1.1 Content-Type: application/json { "model": "bedrock-inference/openai.gpt-oss-20b-1:0", "messages": [{"role": "user", "content": "Count from 1 to 5"}], "stream": true } ``` **Response:** ```http HTTP/1.1 200 OK Content-Type: text/event-stream [6 SSE chunks received] Final content: "1, 2, 3, 4, 5" ``` --- ## Test 4: Error Handling - Invalid Model **Request:** ```http POST /v1/chat/completions HTTP/1.1 Content-Type: application/json { "model": "invalid-model-id", "messages": [{"role": "user", "content": "Hello"}], "stream": false } ``` **Response:** ```http HTTP/1.1 404 Not Found Content-Type: application/json { "detail": "Model 'invalid-model-id' not found. Use 'client.models.list()' to list available Models." } ``` --- ## Test 5: Multi-Turn Conversation **Request 1:** ```http POST /v1/chat/completions HTTP/1.1 { "messages": [{"role": "user", "content": "My name is Alice"}] } ``` **Response 1:** ```http HTTP/1.1 200 OK { "choices": [{ "message": {"content": "...Nice to meet you, Alice! How can I help you today?"} }] } ``` **Request 2 (with history):** ```http POST /v1/chat/completions HTTP/1.1 { "messages": [ {"role": "user", "content": "My name is Alice"}, {"role": "assistant", "content": "...Nice to meet you, Alice!..."}, {"role": "user", "content": "What is my name?"} ] } ``` **Response 2:** ```http HTTP/1.1 200 OK { "choices": [{ "message": {"content": "...Your name is Alice."} }], "usage": {"prompt_tokens": 183, "completion_tokens": 42} } ``` **Context retained across turns** --- ## Test 6: System Messages **Request:** ```http POST /v1/chat/completions HTTP/1.1 { "messages": [ {"role": "system", "content": "You are Shakespeare. Respond only in Shakespearean English."}, {"role": "user", "content": "Tell me about the weather"} ] } ``` **Response:** ```http HTTP/1.1 200 OK { "choices": [{ "message": {"content": "Lo! I heed thy request..."} }], "usage": {"completion_tokens": 813} } ``` --- ## Test 7: Tool Calling **Request:** ```http POST /v1/chat/completions HTTP/1.1 { "messages": [{"role": "user", "content": "What's the weather in San Francisco?"}], "tools": [{ "type": "function", "function": { "name": "get_weather", "parameters": {"type": "object", "properties": {"location": {"type": "string"}}} } }] } ``` **Response:** ```http HTTP/1.1 200 OK { "choices": [{ "finish_reason": "tool_calls", "message": { "tool_calls": [{ "function": {"name": "get_weather", "arguments": "{\"location\":\"San Francisco\"}"} }] } }] } ``` --- ## Test 8: Sampling Parameters **Request:** ```http POST /v1/chat/completions HTTP/1.1 { "messages": [{"role": "user", "content": "Say hello"}], "temperature": 0.7, "top_p": 0.9 } ``` **Response:** ```http HTTP/1.1 200 OK { "choices": [{ "message": {"content": "...Hello! 👋 How can I help you today?"} }] } ``` --- ## Test 9: Authentication Error Handling ### Subtest A: Invalid API Key **Request:** ```http POST /v1/chat/completions HTTP/1.1 x-llamastack-provider-data: {"aws_bedrock_api_key": "invalid-fake-key-12345"} {"model": "bedrock-inference/openai.gpt-oss-20b-1:0", ...} ``` **Response:** ```http HTTP/1.1 400 Bad Request { "detail": "Invalid value: Authentication failed: Error code: 401 - {'error': {'message': 'Invalid API Key format: Must start with pre-defined prefix', ...}}" } ``` --- ### Subtest B: Empty API Key (Fallback to Config) **Request:** ```http POST /v1/chat/completions HTTP/1.1 x-llamastack-provider-data: {"aws_bedrock_api_key": ""} {"model": "bedrock-inference/openai.gpt-oss-20b-1:0", ...} ``` **Response:** ```http HTTP/1.1 200 OK { "choices": [{ "message": {"content": "...Hello! How can I assist you today?"} }] } ``` **Fell back to config key** --- ### Subtest C: Malformed Token **Request:** ```http POST /v1/chat/completions HTTP/1.1 x-llamastack-provider-data: {"aws_bedrock_api_key": "not-a-valid-bedrock-token-format"} {"model": "bedrock-inference/openai.gpt-oss-20b-1:0", ...} ``` **Response:** ```http HTTP/1.1 400 Bad Request { "detail": "Invalid value: Authentication failed: Error code: 401 - {'error': {'message': 'Invalid API Key format: Must start with pre-defined prefix', ...}}" } ``` |
||
|
|
c672a5d792
|
feat: ability to use postgres as store for starter distro (#4076)
## What does this PR do? The starter distribution now comes with all the required packages to support persistent stores—like the agent store, metadata, and inference—using PostgreSQL. Users can enable PostgreSQL support by setting the `ENABLE_POSTGRES_STORE=1` environment variable. This PR consolidates the functionality from the removed `postgres-demo` distribution into the starter distribution, reducing maintenance overhead. **Closes: #2619** **Supersedes: #2851** (rebased and updated) ## Changes Made 1. **Added PostgreSQL support to starter distribution** - New `run-with-postgres-store.yaml` configuration - Automatic config switching via `ENABLE_POSTGRES_STORE` environment variable - Removed separate `postgres-demo` distribution 2. **Updated to new build system** - Integrated postgres switching logic into Containerfile entrypoint - Uses new `storage_backends` and `storage_stores` API - Properly configured both PostgreSQL KV store and SQL store 3. **Updated dependencies** - Added `psycopg2-binary` and `asyncpg` to starter distribution - All postgres-related dependencies automatically included ## How to Use ### With Docker (PostgreSQL): ```bash docker run \ -e ENABLE_POSTGRES_STORE=1 \ -e POSTGRES_HOST=your_postgres_host \ -e POSTGRES_PORT=5432 \ -e POSTGRES_DB=llamastack \ -e POSTGRES_USER=llamastack \ -e POSTGRES_PASSWORD=llamastack \ -e OPENAI_API_KEY=your_key \ llamastack/distribution-starter ``` ### PostgreSQL environment variables: - `POSTGRES_HOST`: Postgres host (default: `localhost`) - `POSTGRES_PORT`: Postgres port (default: `5432`) - `POSTGRES_DB`: Postgres database name (default: `llamastack`) - `POSTGRES_USER`: Postgres username (default: `llamastack`) - `POSTGRES_PASSWORD`: Postgres password (default: `llamastack`) ## Test Plan All pre-commit hooks pass (mypy, ruff, distro-codegen) `llama stack list-deps starter` confirms psycopg2-binary is included Storage configuration correctly uses PostgreSQL backends Container builds successfully with postgres support ## Credits Original work by @leseb in #2851. Rebased and updated by @r-bit-rry to work with latest main. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Sébastien Han @leseb --------- Signed-off-by: Sébastien Han <seb@redhat.com> Co-authored-by: Sébastien Han <seb@redhat.com> |
||
|
|
feabcdd67b
|
docs: add documentation on how to use custom run yaml in docker (#3949)
as title
test plan:
```yaml
# custom-ollama-run.yaml
version: 2
image_name: starter
external_providers_dir: /.llama/providers.d
apis:
- inference
- vector_io
- files
- safety
- tool_runtime
- agents
providers:
inference:
# Single Ollama provider for all models
- provider_id: ollama
provider_type: remote::ollama
config:
url: ${env.OLLAMA_URL:=http://localhost:11434}
vector_io:
- provider_id: faiss
provider_type: inline::faiss
config:
persistence:
namespace: vector_io::faiss
backend: kv_default
files:
- provider_id: meta-reference-files
provider_type: inline::localfs
config:
storage_dir: /.llama/files
metadata_store:
table_name: files_metadata
backend: sql_default
safety:
- provider_id: llama-guard
provider_type: inline::llama-guard
config:
excluded_categories: []
tool_runtime:
- provider_id: rag-runtime
provider_type: inline::rag-runtime
agents:
- provider_id: meta-reference
provider_type: inline::meta-reference
config:
persistence:
agent_state:
namespace: agents
backend: kv_default
responses:
table_name: responses
backend: sql_default
max_write_queue_size: 10000
num_writers: 4
storage:
backends:
kv_default:
type: kv_sqlite
db_path: /.llama/kvstore.db
sql_default:
type: sql_sqlite
db_path: /.llama/sql_store.db
stores:
metadata:
namespace: registry
backend: kv_default
inference:
table_name: inference_store
backend: sql_default
max_write_queue_size: 10000
num_writers: 4
conversations:
table_name: openai_conversations
backend: sql_default
registered_resources:
models:
# All models use the same 'ollama' provider
- model_id: llama3.2-vision:latest
provider_id: ollama
provider_model_id: llama3.2-vision:latest
model_type: llm
- model_id: llama3.2:3b
provider_id: ollama
provider_model_id: llama3.2:3b
model_type: llm
# Embedding models
- model_id: nomic-embed-text-v2-moe
provider_id: ollama
provider_model_id: toshk0/nomic-embed-text-v2-moe:Q6_K
model_type: embedding
metadata:
embedding_dimension: 768
shields: []
vector_dbs: []
datasets: []
scoring_fns: []
benchmarks: []
tool_groups: []
server:
port: 8321
telemetry:
enabled: true
vector_stores:
default_provider_id: faiss
default_embedding_model:
provider_id: ollama
model_id: toshk0/nomic-embed-text-v2-moe:Q6_K
```
```bash
docker run
-it
--pull always
-p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT
-v ~/.llama:/root/.llama
-v $CUSTOM_RUN_CONFIG:/app/custom-run.yaml
-e RUN_CONFIG_PATH=/app/custom-run.yaml
-e OLLAMA_URL=http://host.docker.internal:11434/
llamastack/distribution-starter:0.3.0
--port $LLAMA_STACK_PORT
```
|
||
|
|
d10bfb5121
|
chore: remove leftover llama_stack directory (#3940)
# What does this PR do? Followup on https://github.com/llamastack/llama-stack/pull/3920 where the llama_stack directory was moved under src. Signed-off-by: Sébastien Han <seb@redhat.com> |
||
|
|
471b1b248b
|
chore(package): migrate to src/ layout (#3920)
Migrates package structure to src/ layout following Python packaging best practices. All code moved from `llama_stack/` to `src/llama_stack/`. Public API unchanged - imports remain `import llama_stack.*`. Updated build configs, pre-commit hooks, scripts, and GitHub workflows accordingly. All hooks pass, package builds cleanly. **Developer note**: Reinstall after pulling: `pip install -e .` |