mirror of
https://github.com/meta-llama/llama-stack.git
synced 2025-12-03 09:53:45 +00:00
20 commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
98a5047f9d
|
feat(prompts): attach prompts to storage stores in run configs (#3893)
# What does this PR do? <!-- Provide a short summary of what this PR does and why. Link to relevant issues if applicable. --> This PR is responsible for attaching prompts to storage stores in run configs. It allows to specify prompts as stores in different distributions. The need of this functionality was initiated in #3514 > Note, #3514 is divided on three separate PRs. Current PR is the first of three. <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> ## Test Plan <!-- Describe the tests you ran to verify your changes with result summaries. *Provide clear instructions so the plan can be easily re-executed.* --> Manual testing and updated CI unit tests Prerequisites: 1. `uv run --with llama-stack llama stack list-deps starter | xargs -L1 uv pip install` 2. `llama stack run starter ` ``` INFO 2025-10-23 15:36:17,387 llama_stack.cli.stack.run:100 cli: Using run configuration: /Users/ianmiller/llama-stack/llama_stack/distributions/starter/run.yaml INFO 2025-10-23 15:36:17,423 llama_stack.cli.stack.run:157 cli: HTTPS enabled with certificates: Key: None Cert: None INFO 2025-10-23 15:36:17,424 llama_stack.cli.stack.run:159 cli: Listening on ['::', '0.0.0.0']:8321 INFO 2025-10-23 15:36:17,749 llama_stack.core.server.server:521 core::server: Run configuration: INFO 2025-10-23 15:36:17,756 llama_stack.core.server.server:524 core::server: apis: - agents - batches - datasetio - eval - files - inference - post_training - safety - scoring - tool_runtime - vector_io image_name: starter providers: agents: - config: persistence: agent_state: backend: kv_default namespace: agents responses: backend: sql_default max_write_queue_size: 10000 num_writers: 4 table_name: responses provider_id: meta-reference provider_type: inline::meta-reference batches: - config: kvstore: backend: kv_default namespace: batches provider_id: reference provider_type: inline::reference datasetio: - config: kvstore: backend: kv_default namespace: datasetio::huggingface provider_id: huggingface provider_type: remote::huggingface - config: kvstore: backend: kv_default namespace: datasetio::localfs provider_id: localfs provider_type: inline::localfs eval: - config: kvstore: backend: kv_default namespace: eval provider_id: meta-reference provider_type: inline::meta-reference files: - config: metadata_store: backend: sql_default table_name: files_metadata storage_dir: /Users/ianmiller/.llama/distributions/starter/files provider_id: meta-reference-files provider_type: inline::localfs inference: - config: api_key: '********' url: https://api.fireworks.ai/inference/v1 provider_id: fireworks provider_type: remote::fireworks - config: api_key: '********' url: https://api.together.xyz/v1 provider_id: together provider_type: remote::together - config: {} provider_id: bedrock provider_type: remote::bedrock - config: api_key: '********' base_url: https://api.openai.com/v1 provider_id: openai provider_type: remote::openai - config: api_key: '********' provider_id: anthropic provider_type: remote::anthropic - config: api_key: '********' provider_id: gemini provider_type: remote::gemini - config: api_key: '********' url: https://api.groq.com provider_id: groq provider_type: remote::groq - config: api_key: '********' url: https://api.sambanova.ai/v1 provider_id: sambanova provider_type: remote::sambanova - config: {} provider_id: sentence-transformers provider_type: inline::sentence-transformers post_training: - config: checkpoint_format: meta provider_id: torchtune-cpu provider_type: inline::torchtune-cpu safety: - config: excluded_categories: [] provider_id: llama-guard provider_type: inline::llama-guard - config: {} provider_id: code-scanner provider_type: inline::code-scanner scoring: - config: {} provider_id: basic provider_type: inline::basic - config: {} provider_id: llm-as-judge provider_type: inline::llm-as-judge - config: openai_api_key: '********' provider_id: braintrust provider_type: inline::braintrust tool_runtime: - config: api_key: '********' max_results: 3 provider_id: brave-search provider_type: remote::brave-search - config: api_key: '********' max_results: 3 provider_id: tavily-search provider_type: remote::tavily-search - config: {} provider_id: rag-runtime provider_type: inline::rag-runtime - config: {} provider_id: model-context-protocol provider_type: remote::model-context-protocol vector_io: - config: persistence: backend: kv_default namespace: vector_io::faiss provider_id: faiss provider_type: inline::faiss - config: db_path: /Users/ianmiller/.llama/distributions/starter/sqlite_vec.db persistence: backend: kv_default namespace: vector_io::sqlite_vec provider_id: sqlite-vec provider_type: inline::sqlite-vec registered_resources: benchmarks: [] datasets: [] models: [] scoring_fns: [] shields: [] tool_groups: - provider_id: tavily-search toolgroup_id: builtin::websearch - provider_id: rag-runtime toolgroup_id: builtin::rag vector_stores: [] server: port: 8321 storage: backends: kv_default: db_path: /Users/ianmiller/.llama/distributions/starter/kvstore.db type: kv_sqlite sql_default: db_path: /Users/ianmiller/.llama/distributions/starter/sql_store.db type: sql_sqlite stores: conversations: backend: sql_default table_name: openai_conversations inference: backend: sql_default max_write_queue_size: 10000 num_writers: 4 table_name: inference_store metadata: backend: kv_default namespace: registry prompts: backend: kv_default namespace: prompts telemetry: enabled: true vector_stores: default_embedding_model: model_id: nomic-ai/nomic-embed-text-v1.5 provider_id: sentence-transformers default_provider_id: faiss version: 2 INFO 2025-10-23 15:36:20,032 llama_stack.providers.utils.inference.inference_store:74 inference: Write queue disabled for SQLite to avoid concurrency issues WARNING 2025-10-23 15:36:20,422 llama_stack.providers.inline.telemetry.meta_reference.telemetry:84 telemetry: OTEL_EXPORTER_OTLP_ENDPOINT is not set, skipping telemetry INFO 2025-10-23 15:36:22,379 llama_stack.providers.utils.inference.openai_mixin:436 providers::utils: OpenAIInferenceAdapter.list_provider_model_ids() returned 105 models INFO 2025-10-23 15:36:22,703 uvicorn.error:84 uncategorized: Started server process [17328] INFO 2025-10-23 15:36:22,704 uvicorn.error:48 uncategorized: Waiting for application startup. INFO 2025-10-23 15:36:22,706 llama_stack.core.server.server:179 core::server: Starting up Llama Stack server (version: 0.3.0) INFO 2025-10-23 15:36:22,707 llama_stack.core.stack:470 core: starting registry refresh task INFO 2025-10-23 15:36:22,708 uvicorn.error:62 uncategorized: Application startup complete. INFO 2025-10-23 15:36:22,708 uvicorn.error:216 uncategorized: Uvicorn running on http://['::', '0.0.0.0']:8321 (Press CTRL+C to quit) ``` As you can see, prompts are attached to stores in config Testing: 1. Create prompt: ``` curl -X POST http://localhost:8321/v1/prompts \ -H "Content-Type: application/json" \ -d '{ "prompt": "Hello {{name}}! You are working at {{company}}. Your role is {{role}} at {{company}}. Remember, {{name}}, to be {{tone}}.", "variables": ["name", "company", "role", "tone"] }' ``` `{"prompt":"Hello {{name}}! You are working at {{company}}. Your role is {{role}} at {{company}}. Remember, {{name}}, to be {{tone}}.","version":1,"prompt_id":"pmpt_a90e09e67acfe23776f2778c603eb6c17e139dab5f6e163f","variables":["name","company","role","tone"],"is_default":false}% ` 2. Get prompt: `curl -X GET http://localhost:8321/v1/prompts/pmpt_a90e09e67acfe23776f2778c603eb6c17e139dab5f6e163f` `{"prompt":"Hello {{name}}! You are working at {{company}}. Your role is {{role}} at {{company}}. Remember, {{name}}, to be {{tone}}.","version":1,"prompt_id":"pmpt_a90e09e67acfe23776f2778c603eb6c17e139dab5f6e163f","variables":["name","company","role","tone"],"is_default":false}% ` 3. Query sqlite KV storage to check created prompt: ``` sqlite> .mode column sqlite> .headers on sqlite> SELECT * FROM kvstore WHERE key LIKE 'prompts:v1:%'; key value expiration ------------------------------------------------------------ ------------------------------------------------------------ ---------- prompts:v1:pmpt_a90e09e67acfe23776f2778c603eb6c17e139dab5f6e {"prompt_id": "pmpt_a90e09e67acfe23776f2778c603eb6c17e139dab 163f:1 5f6e163f", "prompt": "Hello {{name}}! You are working at {{c ompany}}. Your role is {{role}} at {{company}}. Remember, {{ name}}, to be {{tone}}.", "version": 1, "variables": ["name" , "company", "role", "tone"], "is_default": false} prompts:v1:pmpt_a90e09e67acfe23776f2778c603eb6c17e139dab5f6e 1 163f:default sqlite> ``` |
||
|
|
9916cb3b17
|
chore: support default model in moderations API (#3890)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 2s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Vector IO Integration Tests / test-matrix (push) Failing after 5s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 5s
Python Package Build Test / build (3.12) (push) Failing after 1s
Python Package Build Test / build (3.13) (push) Failing after 2s
Test Llama Stack Build / build-single-provider (push) Failing after 3s
Test Llama Stack Build / generate-matrix (push) Successful in 5s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 4s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 3s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 7s
Test External API and Providers / test-external (venv) (push) Failing after 4s
API Conformance Tests / check-schema-compatibility (push) Successful in 12s
Unit Tests / unit-tests (3.13) (push) Failing after 4s
Test Llama Stack Build / build (push) Failing after 3s
Unit Tests / unit-tests (3.12) (push) Failing after 5s
UI Tests / ui-tests (22) (push) Successful in 41s
Pre-commit / pre-commit (push) Successful in 1m33s
# What does this PR do? https://platform.openai.com/docs/api-reference/moderations supports optional model parameter. This PR adds support for using moderations API with model=None if a default shield id is provided via safety config. ## Test Plan added tests manual test: ``` > SAFETY_MODEL='together/meta-llama/Llama-Guard-4-12B' uv run llama stack run starter > curl http://localhost:8321/v1/moderations \ -H "Content-Type: application/json" \ -d '{ "input": [ "hello" ] }' ``` |
||
|
|
bd3c473208
|
revert: "chore(cleanup)!: remove tool_runtime.rag_tool" (#3877)
Reverts llamastack/llama-stack#3871 This PR broke RAG (even from Responses -- there _is_ a dependency) |
||
|
|
0e96279bee
|
chore(cleanup)!: remove tool_runtime.rag_tool (#3871)
Kill the `builtin::rag` tool group completely since it is no longer targeted. We use the Responses implementation for knowledge_search which uses the `openai_vector_stores` pathway. --------- Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> |
||
|
|
94faec7bc5
|
chore(yaml)!: move registered resources to a sub-key (#3861)
**NOTE: this is a backwards incompatible change to the run-configs.** A small QOL update, but this will prove useful when I do a rename for "vector_dbs" to "vector_stores" next. Moves all the `models, shields, ...` keys in run-config under a `registered_resources` sub-key. |
||
|
|
48581bf651
|
chore: Updating how default embedding model is set in stack (#3818)
# What does this PR do?
Refactor setting default vector store provider and embedding model to
use an optional `vector_stores` config in the `StackRunConfig` and clean
up code to do so (had to add back in some pieces of VectorDB). Also
added remote Qdrant and Weaviate to starter distro (based on other PR
where inference providers were added for UX).
New config is simply (default for Starter distro):
```yaml
vector_stores:
default_provider_id: faiss
default_embedding_model:
provider_id: sentence-transformers
model_id: nomic-ai/nomic-embed-text-v1.5
```
## Test Plan
CI and Unit tests.
---------
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
|
||
|
|
2c43285e22
|
feat(stores)!: use backend storage references instead of configs (#3697)
**This PR changes configurations in a backward incompatible way.**
Run configs today repeat full SQLite/Postgres snippets everywhere a
store is needed, which means duplicated credentials, extra connection
pools, and lots of drift between files. This PR introduces named storage
backends so the stack and providers can share a single catalog and
reference those backends by name.
## Key Changes
- Add `storage.backends` to `StackRunConfig`, register each KV/SQL
backend once at startup, and validate that references point to the right
family.
- Move server stores under `storage.stores` with lightweight references
(backend + namespace/table) instead of full configs.
- Update every provider/config/doc to use the new reference style;
docs/codegen now surface the simplified YAML.
## Migration
Before:
```yaml
metadata_store:
type: sqlite
db_path: ~/.llama/distributions/foo/registry.db
inference_store:
type: postgres
host: ${env.POSTGRES_HOST}
port: ${env.POSTGRES_PORT}
db: ${env.POSTGRES_DB}
user: ${env.POSTGRES_USER}
password: ${env.POSTGRES_PASSWORD}
conversations_store:
type: postgres
host: ${env.POSTGRES_HOST}
port: ${env.POSTGRES_PORT}
db: ${env.POSTGRES_DB}
user: ${env.POSTGRES_USER}
password: ${env.POSTGRES_PASSWORD}
```
After:
```yaml
storage:
backends:
kv_default:
type: kv_sqlite
db_path: ~/.llama/distributions/foo/kvstore.db
sql_default:
type: sql_postgres
host: ${env.POSTGRES_HOST}
port: ${env.POSTGRES_PORT}
db: ${env.POSTGRES_DB}
user: ${env.POSTGRES_USER}
password: ${env.POSTGRES_PASSWORD}
stores:
metadata:
backend: kv_default
namespace: registry
inference:
backend: sql_default
table_name: inference_store
max_write_queue_size: 10000
num_writers: 4
conversations:
backend: sql_default
table_name: openai_conversations
```
Provider configs follow the same pattern—for example, a Chroma vector
adapter switches from:
```yaml
providers:
vector_io:
- provider_id: chromadb
provider_type: remote::chromadb
config:
url: ${env.CHROMADB_URL}
kvstore:
type: sqlite
db_path: ~/.llama/distributions/foo/chroma.db
```
to:
```yaml
providers:
vector_io:
- provider_id: chromadb
provider_type: remote::chromadb
config:
url: ${env.CHROMADB_URL}
persistence:
backend: kv_default
namespace: vector_io::chroma_remote
```
Once the backends are declared, everything else just points at them, so
rotating credentials or swapping to Postgres happens in one place and
the stack reuses a single connection pool.
|
||
|
|
ef4bc70bbe
|
feat: Enable setting a default embedding model in the stack (#3803)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.12) (push) Failing after 1s
Python Package Build Test / build (3.13) (push) Failing after 1s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
Vector IO Integration Tests / test-matrix (push) Failing after 4s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Unit Tests / unit-tests (3.13) (push) Failing after 5s
API Conformance Tests / check-schema-compatibility (push) Successful in 11s
UI Tests / ui-tests (22) (push) Successful in 40s
Pre-commit / pre-commit (push) Successful in 1m28s
# What does this PR do? Enables automatic embedding model detection for vector stores and by using a `default_configured` boolean that can be defined in the `run.yaml`. <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> ## Test Plan - Unit tests - Integration tests - Simple example below: Spin up the stack: ```bash uv run llama stack build --distro starter --image-type venv --run ``` Then test with OpenAI's client: ```python from openai import OpenAI client = OpenAI(base_url="http://localhost:8321/v1/", api_key="none") vs = client.vector_stores.create() ``` Previously you needed: ```python vs = client.vector_stores.create( extra_body={ "embedding_model": "sentence-transformers/all-MiniLM-L6-v2", "embedding_dimension": 384, } ) ``` The `extra_body` is now unnecessary. --------- Signed-off-by: Francisco Javier Arceo <farceo@redhat.com> |
||
|
|
a165b8b5bb
|
chore!: BREAKING CHANGE removing VectorDB APIs (#3774)
# What does this PR do? Removes VectorDBs from API surface and our tests. Moves tests to Vector Stores. <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> ## Test Plan <!-- Describe the tests you ran to verify your changes with result summaries. *Provide clear instructions so the plan can be easily re-executed.* --> --------- Signed-off-by: Francisco Javier Arceo <farceo@redhat.com> Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com> |
||
|
|
e7d21e1ee3
|
feat: Add support for Conversations in Responses API (#3743)
# What does this PR do? This PR adds support for Conversations in Responses. <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> ## Test Plan Unit tests Integration tests <Details> <Summary>Manual testing with this script: (click to expand)</Summary> ```python from openai import OpenAI client = OpenAI() client = OpenAI(base_url="http://localhost:8321/v1/", api_key="none") def test_conversation_create(): print("Testing conversation create...") conversation = client.conversations.create( metadata={"topic": "demo"}, items=[ {"type": "message", "role": "user", "content": "Hello!"} ] ) print(f"Created: {conversation}") return conversation def test_conversation_retrieve(conv_id): print(f"Testing conversation retrieve for {conv_id}...") retrieved = client.conversations.retrieve(conv_id) print(f"Retrieved: {retrieved}") return retrieved def test_conversation_update(conv_id): print(f"Testing conversation update for {conv_id}...") updated = client.conversations.update( conv_id, metadata={"topic": "project-x"} ) print(f"Updated: {updated}") return updated def test_conversation_delete(conv_id): print(f"Testing conversation delete for {conv_id}...") deleted = client.conversations.delete(conv_id) print(f"Deleted: {deleted}") return deleted def test_conversation_items_create(conv_id): print(f"Testing conversation items create for {conv_id}...") items = client.conversations.items.create( conv_id, items=[ { "type": "message", "role": "user", "content": [{"type": "input_text", "text": "Hello!"}] }, { "type": "message", "role": "user", "content": [{"type": "input_text", "text": "How are you?"}] } ] ) print(f"Items created: {items}") return items def test_conversation_items_list(conv_id): print(f"Testing conversation items list for {conv_id}...") items = client.conversations.items.list(conv_id, limit=10) print(f"Items list: {items}") return items def test_conversation_item_retrieve(conv_id, item_id): print(f"Testing conversation item retrieve for {conv_id}/{item_id}...") item = client.conversations.items.retrieve(conversation_id=conv_id, item_id=item_id) print(f"Item retrieved: {item}") return item def test_conversation_item_delete(conv_id, item_id): print(f"Testing conversation item delete for {conv_id}/{item_id}...") deleted = client.conversations.items.delete(conversation_id=conv_id, item_id=item_id) print(f"Item deleted: {deleted}") return deleted def test_conversation_responses_create(): print("\nTesting conversation create for a responses example...") conversation = client.conversations.create() print(f"Created: {conversation}") response = client.responses.create( model="gpt-4.1", input=[{"role": "user", "content": "What are the 5 Ds of dodgeball?"}], conversation=conversation.id, ) print(f"Created response: {response} for conversation {conversation.id}") return response, conversation def test_conversations_responses_create_followup( conversation, content="Repeat what you just said but add 'this is my second time saying this'", ): print(f"Using: {conversation.id}") response = client.responses.create( model="gpt-4.1", input=[{"role": "user", "content": content}], conversation=conversation.id, ) print(f"Created response: {response} for conversation {conversation.id}") conv_items = client.conversations.items.list(conversation.id) print(f"\nRetrieving list of items for conversation {conversation.id}:") print(conv_items.model_dump_json(indent=2)) def test_response_with_fake_conv_id(): fake_conv_id = "conv_zzzzzzzzz5dc81908289d62779d2ac510a2b0b602ef00a44" print(f"Using {fake_conv_id}") try: response = client.responses.create( model="gpt-4.1", input=[{"role": "user", "content": "say hello"}], conversation=fake_conv_id, ) print(f"Created response: {response} for conversation {fake_conv_id}") except Exception as e: print(f"failed to create response for conversation {fake_conv_id} with error {e}") def main(): print("Testing OpenAI Conversations API...") # Create conversation conversation = test_conversation_create() conv_id = conversation.id # Retrieve conversation test_conversation_retrieve(conv_id) # Update conversation test_conversation_update(conv_id) # Create items items = test_conversation_items_create(conv_id) # List items items_list = test_conversation_items_list(conv_id) # Retrieve specific item if items_list.data: item_id = items_list.data[0].id test_conversation_item_retrieve(conv_id, item_id) # Delete item test_conversation_item_delete(conv_id, item_id) # Delete conversation test_conversation_delete(conv_id) response, conversation2 = test_conversation_responses_create() print('\ntesting reseponse retrieval') test_conversation_retrieve(conversation2.id) print('\ntesting responses follow up') test_conversations_responses_create_followup(conversation2) print('\ntesting responses follow up x2!') test_conversations_responses_create_followup( conversation2, content="Repeat what you just said but add 'this is my third time saying this'", ) test_response_with_fake_conv_id() print("All tests completed!") if __name__ == "__main__": main() ``` </Details> --------- Signed-off-by: Francisco Javier Arceo <farceo@redhat.com> Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com> |
||
|
|
f50ce11a3b
|
feat(tests): make inference_recorder into api_recorder (include tool_invoke) (#3403)
Renames `inference_recorder.py` to `api_recorder.py` and extends it to support recording/replaying tool invocations in addition to inference calls. This allows us to record web-search, etc. tool calls and thereafter apply recordings for `tests/integration/responses` ## Test Plan ``` export OPENAI_API_KEY=... export TAVILY_SEARCH_API_KEY=... ./scripts/integration-tests.sh --stack-config ci-tests \ --suite responses --inference-mode record-if-missing ``` |
||
|
|
a3f5072776
|
chore!: remove --env from llama stack run (#3711)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 1s
Installer CI / lint (push) Failing after 2s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Installer CI / smoke-test-on-dev (push) Failing after 2s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
Test Llama Stack Build / generate-matrix (push) Successful in 3s
Vector IO Integration Tests / test-matrix (push) Failing after 4s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 2s
Test Llama Stack Build / build-single-provider (push) Failing after 4s
Python Package Build Test / build (3.12) (push) Failing after 2s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 3s
Python Package Build Test / build (3.13) (push) Failing after 1s
API Conformance Tests / check-schema-compatibility (push) Successful in 10s
Unit Tests / unit-tests (3.12) (push) Failing after 3s
Test Llama Stack Build / build (push) Failing after 3s
Test External API and Providers / test-external (venv) (push) Failing after 3s
Unit Tests / unit-tests (3.13) (push) Failing after 3s
UI Tests / ui-tests (22) (push) Successful in 40s
Pre-commit / pre-commit (push) Successful in 1m18s
# What does this PR do? user can simply set env vars in the beginning of the command.`FOO=BAR llama stack run ...` ## Test Plan Run TELEMETRY_SINKS=coneol uv run --with llama-stack llama stack build --distro=starter --image-type=venv --run --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/llamastack/llama-stack/pull/3711). * #3714 * __->__ #3711 |
||
|
|
a20e8eac8c
|
feat: Add OpenAI Conversations API (#3429)
# What does this PR do? Initial implementation for `Conversations` and `ConversationItems` using `AuthorizedSqlStore` with endpoints to: - CREATE - UPDATE - GET/RETRIEVE/LIST - DELETE Set `level=LLAMA_STACK_API_V1`. NOTE: This does not currently incorporate changes for Responses, that'll be done in a subsequent PR. Closes https://github.com/llamastack/llama-stack/issues/3235 ## Test Plan - Unit tests - Integration tests Also comparison of [OpenAPI spec for OpenAI API](https://github.com/openai/openai-openapi/tree/manual_spec) ```bash oasdiff breaking --fail-on ERR docs/static/llama-stack-spec.yaml https://raw.githubusercontent.com/openai/openai-openapi/refs/heads/manual_spec/openapi.yaml --strip-prefix-base "/v1/openai/v1" \ --match-path '(^/v1/openai/v1/conversations.*|^/conversations.*)' ``` Note I still have some uncertainty about this, I borrowed this info from @cdoern on https://github.com/llamastack/llama-stack/pull/3514 but need to spend more time to confirm it's working, at the moment it suggests it does. UPDATE on `oasdiff`, I investigated the OpenAI spec further and it looks like currently the spec does not list Conversations, so that analysis is useless. Noting for future reference. --------- Signed-off-by: Francisco Javier Arceo <farceo@redhat.com> |
||
|
|
60484c5c4e
|
chore(api): remove batch inference (#3261)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
Vector IO Integration Tests / test-matrix (push) Failing after 4s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 4s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 3s
Unit Tests / unit-tests (3.12) (push) Failing after 3s
Unit Tests / unit-tests (3.13) (push) Failing after 3s
Test Llama Stack Build / build (push) Failing after 3s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 1s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
Test Llama Stack Build / generate-matrix (push) Successful in 3s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Test Llama Stack Build / build-single-provider (push) Failing after 4s
Python Package Build Test / build (3.12) (push) Failing after 1s
API Conformance Tests / check-schema-compatibility (push) Successful in 7s
Python Package Build Test / build (3.13) (push) Failing after 1s
Test External API and Providers / test-external (venv) (push) Failing after 4s
UI Tests / ui-tests (22) (push) Successful in 39s
Pre-commit / pre-commit (push) Successful in 1m18s
# What does this PR do? APIs removed: - POST /v1/batch-inference/completion - POST /v1/batch-inference/chat-completion - POST /v1/inference/batch-completion - POST /v1/inference/batch-chat-completion note - - batch-completion & batch-chat-completion were only implemented for inference=inline::meta-reference - batch-inference were not implemented |
||
|
|
4c2fcb6b51
|
chore: refactor server.main (#3462)
Some checks failed
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.13) (push) Failing after 3s
Vector IO Integration Tests / test-matrix (push) Failing after 6s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 5s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 8s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 13s
Unit Tests / unit-tests (3.13) (push) Failing after 4s
Test External API and Providers / test-external (venv) (push) Failing after 7s
Unit Tests / unit-tests (3.12) (push) Failing after 6s
Python Package Build Test / build (3.12) (push) Failing after 10s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 18s
API Conformance Tests / check-schema-compatibility (push) Successful in 22s
UI Tests / ui-tests (22) (push) Successful in 29s
Pre-commit / pre-commit (push) Successful in 1m25s
# What does this PR do? As shown in #3421, we can scale stack to handle more RPS with k8s replicas. This PR enables multi process stack with uvicorn --workers so that we can achieve the same scaling without being in k8s. To achieve that we refactor main to split out the app construction logic. This method needs to be non-async. We created a new `Stack` class to house impls and have a `start()` method to be called in lifespan to start background tasks instead of starting them in the old `construct_stack`. This way we avoid having to manage an event loop manually. ## Test Plan CI > uv run --with llama-stack python -m llama_stack.core.server.server benchmarking/k8s-benchmark/stack_run_config.yaml works. > LLAMA_STACK_CONFIG=benchmarking/k8s-benchmark/stack_run_config.yaml uv run uvicorn llama_stack.core.server.server:create_app --port 8321 --workers 4 works. |
||
|
|
ad6ea7fb91
|
feat: Adding OpenAI Prompts API (#3319)
# What does this PR do? This PR adds support for OpenAI Prompts API. Note, OpenAI does not explicitly expose the Prompts API but instead makes it available in the Responses API and in the [Prompts Dashboard](https://platform.openai.com/docs/guides/prompting#create-a-prompt). I have added the following APIs: - CREATE - GET - LIST - UPDATE - Set Default Version The Set Default Version API is made available only in the Prompts Dashboard and configures which prompt version is returned in the GET (the latest version is the default). Overall, the expected functionality in Responses will look like this: ```python from openai import OpenAI client = OpenAI() response = client.responses.create( prompt={ "id": "pmpt_68b0c29740048196bd3a6e6ac3c4d0e20ed9a13f0d15bf5e", "version": "2", "variables": { "city": "San Francisco", "age": 30, } } ) ``` ### Resolves https://github.com/llamastack/llama-stack/issues/3276 ## Test Plan Unit tests added. Integration tests can be added after client generation. ## Next Steps 1. Update Responses API to support Prompt API 2. I'll enhance the UI to implement the Prompt Dashboard. 3. Add cache for lower latency --------- Signed-off-by: Francisco Javier Arceo <farceo@redhat.com> |
||
|
|
efdb5558b8
|
fix: Remove bfcl scoring function as not supported (#3281)
Some checks failed
Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 1s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 2s
Pre-commit / pre-commit (push) Failing after 1s
Test Llama Stack Build / build-single-provider (push) Failing after 1s
Vector IO Integration Tests / test-matrix (push) Failing after 2s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 0s
Test Llama Stack Build / generate-matrix (push) Failing after 2s
Test Llama Stack Build / build (push) Has been skipped
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 1s
Python Package Build Test / build (3.12) (push) Failing after 0s
Python Package Build Test / build (3.13) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 5s
Test External API and Providers / test-external (venv) (push) Failing after 1s
UI Tests / ui-tests (22) (push) Failing after 0s
Unit Tests / unit-tests (3.12) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 8s
Unit Tests / unit-tests (3.13) (push) Failing after 1s
Update ReadTheDocs / update-readthedocs (push) Failing after 1s
# What does this PR do?
BFCL scoring function is not supported, removing it.
Also minor fixes as the llama stack run is broken for open-benchmark for
test plan verification
1. Correct the model paths for supported models
2. Fix another issue as there is no `provider_id` for DatasetInput but
logger assumes it exists.
```
File "/Users/swapna942/llama-stack/llama_stack/core/stack.py", line 332, in construct_stack
await register_resources(run_config, impls)
File "/Users/swapna942/llama-stack/llama_stack/core/stack.py", line 108, in register_resources
logger.debug(f"registering {rsrc.capitalize()} {obj} for provider {obj.provider_id}")
^^^^^^^^^^^^^^^
File "/Users/swapna942/llama-stack/.venv/lib/python3.13/site-packages/pydantic/main.py", line 991, in __getattr__
raise AttributeError(f'{type(self).__name__!r} object has no attribute {item!r}')
AttributeError: 'DatasetInput' object has no attribute 'provider_id'
```
## Test Plan
```llama stack build --distro open-benchmark --image-type venv``` and run the server succeeds
Issue Link: https://github.com/llamastack/llama-stack/issues/3282
|
||
|
|
52106d95d3
|
fix(env): env var replacement preserve types (#3270)
# What does this PR do? During env var replacement, we're implicitly converting all config types to their apparent types (e.g., "true" to True, "123" to 123). This may be arguably useful for when doing an env var substitution, as those are always strings, but we should definitely avoid touching config values that have explicit types and are uninvolved in env var substitution. ## Test Plan Unit |
||
|
|
cc87995e2b
|
chore: rename templates to distributions (#3035)
As the title says. Distributions is in, Templates is out. `llama stack build --template` --> `llama stack build --distro`. For backward compatibility, the previous option is kept but results in a warning. Updated `server.py` to remove the "config_or_template" backward compatibility since it has been a couple releases since that change. |
||
|
|
2665f00102
|
chore(rename): move llama_stack.distribution to llama_stack.core (#2975)
We would like to rename the term `template` to `distribution`. To prepare for that, this is a precursor. cc @leseb |
Renamed from llama_stack/distribution/stack.py (Browse further)