mirror of
https://github.com/meta-llama/llama-stack.git
synced 2025-10-24 16:57:21 +00:00
482 commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
|
bd3c473208
|
revert: "chore(cleanup)!: remove tool_runtime.rag_tool" (#3877)
Reverts llamastack/llama-stack#3871 This PR broke RAG (even from Responses -- there _is_ a dependency) |
||
|
0e96279bee
|
chore(cleanup)!: remove tool_runtime.rag_tool (#3871)
Kill the `builtin::rag` tool group completely since it is no longer targeted. We use the Responses implementation for knowledge_search which uses the `openai_vector_stores` pathway. --------- Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> |
||
|
122de785c4
|
chore(cleanup)!: kill vector_db references as far as possible (#3864)
There should not be "vector db" anywhere. |
||
|
b215eb5944
|
chore: skip shutdown if otel_endpoint is not set (#3865)
# What does this PR do? rid following error when ctrl+c'd server │ /Users/erichuang/projects/lst3/llama_stack/providers/inline/telemetry/meta_reference/telemetry.py:92 in │ │ shutdown │ │ │ │ 89 │ │ pass │ │ 90 │ │ │ 91 │ async def shutdown(self) -> None: │ │ ❱ 92 │ │ trace.get_tracer_provider().force_flush() │ │ 93 │ │ │ 94 │ async def log_event(self, event: Event, ttl_seconds: int = 604800) -> None: │ │ 95 │ │ if isinstance(event, UnstructuredLogEvent): │ ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────╯ AttributeError: 'ProxyTracerProvider' object has no attribute 'force_flush' ## Test Plan |
||
|
5678c25b9d
|
chore: remove dead code (#3863)
# What does this PR do? ## Test Plan |
||
|
48581bf651
|
chore: Updating how default embedding model is set in stack (#3818)
# What does this PR do? Refactor setting default vector store provider and embedding model to use an optional `vector_stores` config in the `StackRunConfig` and clean up code to do so (had to add back in some pieces of VectorDB). Also added remote Qdrant and Weaviate to starter distro (based on other PR where inference providers were added for UX). New config is simply (default for Starter distro): ```yaml vector_stores: default_provider_id: faiss default_embedding_model: provider_id: sentence-transformers model_id: nomic-ai/nomic-embed-text-v1.5 ``` ## Test Plan CI and Unit tests. --------- Signed-off-by: Francisco Javier Arceo <farceo@redhat.com> Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com> |
||
|
2c43285e22
|
feat(stores)!: use backend storage references instead of configs (#3697)
**This PR changes configurations in a backward incompatible way.** Run configs today repeat full SQLite/Postgres snippets everywhere a store is needed, which means duplicated credentials, extra connection pools, and lots of drift between files. This PR introduces named storage backends so the stack and providers can share a single catalog and reference those backends by name. ## Key Changes - Add `storage.backends` to `StackRunConfig`, register each KV/SQL backend once at startup, and validate that references point to the right family. - Move server stores under `storage.stores` with lightweight references (backend + namespace/table) instead of full configs. - Update every provider/config/doc to use the new reference style; docs/codegen now surface the simplified YAML. ## Migration Before: ```yaml metadata_store: type: sqlite db_path: ~/.llama/distributions/foo/registry.db inference_store: type: postgres host: ${env.POSTGRES_HOST} port: ${env.POSTGRES_PORT} db: ${env.POSTGRES_DB} user: ${env.POSTGRES_USER} password: ${env.POSTGRES_PASSWORD} conversations_store: type: postgres host: ${env.POSTGRES_HOST} port: ${env.POSTGRES_PORT} db: ${env.POSTGRES_DB} user: ${env.POSTGRES_USER} password: ${env.POSTGRES_PASSWORD} ``` After: ```yaml storage: backends: kv_default: type: kv_sqlite db_path: ~/.llama/distributions/foo/kvstore.db sql_default: type: sql_postgres host: ${env.POSTGRES_HOST} port: ${env.POSTGRES_PORT} db: ${env.POSTGRES_DB} user: ${env.POSTGRES_USER} password: ${env.POSTGRES_PASSWORD} stores: metadata: backend: kv_default namespace: registry inference: backend: sql_default table_name: inference_store max_write_queue_size: 10000 num_writers: 4 conversations: backend: sql_default table_name: openai_conversations ``` Provider configs follow the same pattern—for example, a Chroma vector adapter switches from: ```yaml providers: vector_io: - provider_id: chromadb provider_type: remote::chromadb config: url: ${env.CHROMADB_URL} kvstore: type: sqlite db_path: ~/.llama/distributions/foo/chroma.db ``` to: ```yaml providers: vector_io: - provider_id: chromadb provider_type: remote::chromadb config: url: ${env.CHROMADB_URL} persistence: backend: kv_default namespace: vector_io::chroma_remote ``` Once the backends are declared, everything else just points at them, so rotating credentials or swapping to Postgres happens in one place and the stack reuses a single connection pool. |
||
|
add64e8e2a
|
feat: Add instructions parameter in response object (#3741)
# Problem The current inline provider appends the user provided instructions to messages as a system prompt, but the returned response object does not contain the instructions field (as specified in the OpenAI responses spec). # What does this PR do? This pull request adds the instruction field to the response object definition and updates the inline provider. It also ensures that instructions from previous response is not carried over to the next response (as specified in the openAI spec). Closes #[3566](https://github.com/llamastack/llama-stack/issues/3566) ## Test Plan - Tested manually for change in model response w.r.t supplied instructions field. - Added unit test to check that the instructions from previous response is not carried over to the next response. - Added integration tests to check instructions parameter in the returned response object. - Added new recordings for the integration tests. --------- Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> |
||
|
9936f33f7e
|
chore: disable telemetry if otel endpoint isn't set (#3859)
# What does this PR do? removes error: ConnectionError: HTTPConnectionPool(host='localhost', port=4318): Max retries exceeded with url: /v1/traces (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x10fd98e60>: Failed to establish a new connection: [Errno 61] Connection refused')) ## Test Plan uv run llama stack run starter curl http://localhost:8321/v1/models observe no error in server logs |
||
|
943558af36
|
test(telemetry): Telemetry Tests (#3805)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.12) (push) Failing after 10s
Python Package Build Test / build (3.13) (push) Failing after 10s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 14s
Unit Tests / unit-tests (3.13) (push) Failing after 11s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 20s
Unit Tests / unit-tests (3.12) (push) Failing after 16s
Test External API and Providers / test-external (venv) (push) Failing after 28s
Vector IO Integration Tests / test-matrix (push) Failing after 30s
API Conformance Tests / check-schema-compatibility (push) Successful in 38s
UI Tests / ui-tests (22) (push) Successful in 1m32s
Pre-commit / pre-commit (push) Successful in 3m16s
# What does this PR do? Adds a test and a standardized way to build future tests out for telemetry in llama stack. Contributes to https://github.com/llamastack/llama-stack/issues/3806 ## Test Plan This is the test plan 😎 |
||
|
b3099d40e2
|
fix(telemetry): remove dependency on old telemetry config (#3830)
Some checks failed
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 2s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Test Llama Stack Build / generate-matrix (push) Successful in 8s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 10s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 12s
Test Llama Stack Build / build-single-provider (push) Failing after 11s
Python Package Build Test / build (3.12) (push) Failing after 10s
Test External API and Providers / test-external (venv) (push) Failing after 11s
Python Package Build Test / build (3.13) (push) Failing after 13s
Unit Tests / unit-tests (3.13) (push) Failing after 14s
Test Llama Stack Build / build (push) Failing after 12s
Unit Tests / unit-tests (3.12) (push) Failing after 21s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 57s
Vector IO Integration Tests / test-matrix (push) Failing after 1m13s
API Conformance Tests / check-schema-compatibility (push) Successful in 1m22s
UI Tests / ui-tests (22) (push) Successful in 1m33s
Pre-commit / pre-commit (push) Successful in 1m55s
# What does this PR do? old telemetry config was removed in #3815 ## Test Plan ❯ OTEL_SERVICE_NAME=aloha OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318 uv run llama stack run starter <img width="1888" height="605" alt="image" src="https://github.com/user-attachments/assets/dd5cc9f0-213a-4dc6-9385-f61a3a13b4c3" /> |
||
|
f22aaef42f
|
chore!: remove telemetry API usage (#3815)
# What does this PR do? remove telemetry as a providable API from the codebase. This includes removing it from generated distributions but also the provider registry, the router, etc since `setup_logger` is tied pretty strictly to `Api.telemetry` being in impls we still need an "instantiated provider" in our implementations. However it should not be auto-routed or provided. So in validate_and_prepare_providers (called from resolve_impls) I made it so that if run_config.telemetry.enabled, we set up the meta-reference "provider" internally to be used so that log_event will work when called. This is the neatest way I think we can remove telemetry from the provider configs but also not need to rip apart the whole "telemetry is a provider" logic just yet, but we can do it internally later without disrupting users. so telemetry is removed from the registry such that if a user puts `telemetry:` as an API in their build/run config it will err out, but can still be used by us internally as we go through this transition. relates to #3806 Signed-off-by: Charlie Doern <cdoern@redhat.com> |
||
|
f205ab6f6c
|
fix(responses): fixes, re-record tests (#3820)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.12) (push) Failing after 2s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 5s
Python Package Build Test / build (3.13) (push) Failing after 3s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 8s
Vector IO Integration Tests / test-matrix (push) Failing after 6s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Unit Tests / unit-tests (3.12) (push) Failing after 6s
Unit Tests / unit-tests (3.13) (push) Failing after 5s
API Conformance Tests / check-schema-compatibility (push) Successful in 17s
UI Tests / ui-tests (22) (push) Successful in 55s
Pre-commit / pre-commit (push) Successful in 1m43s
Wanted to re-enable Responses CI but it seems to hang for some reason due to some interactions with conversations_store or responses_store. ## Test Plan ``` # library client ./scripts/integration-tests.sh --stack-config ci-tests --suite responses # server ./scripts/integration-tests.sh --stack-config server:ci-tests --suite responses ``` |
||
|
99141c29b1
|
feat: Add responses and safety impl extra_body (#3781)
Some checks failed
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 2s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.13) (push) Failing after 1s
Test Llama Stack Build / generate-matrix (push) Successful in 3s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 6s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 3s
Test Llama Stack Build / build-single-provider (push) Failing after 4s
Python Package Build Test / build (3.12) (push) Failing after 6s
Vector IO Integration Tests / test-matrix (push) Failing after 9s
Unit Tests / unit-tests (3.13) (push) Failing after 6s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 9s
Test External API and Providers / test-external (venv) (push) Failing after 8s
Test Llama Stack Build / build (push) Failing after 7s
Unit Tests / unit-tests (3.12) (push) Failing after 9s
API Conformance Tests / check-schema-compatibility (push) Successful in 19s
UI Tests / ui-tests (22) (push) Successful in 37s
Pre-commit / pre-commit (push) Successful in 1m33s
# What does this PR do? Have closed the previous PR due to merge conflicts with multiple PRs Addressed all comments from https://github.com/llamastack/llama-stack/pull/3768 (sorry for carrying over to this one) ## Test Plan Added UTs and integration tests |
||
|
8e7e0ddfec
|
fix(responses): use conversation items when no stored messages exist (#3819)
Handle a base case when no stored messages exist because no Response call has been made. ## Test Plan ``` ./scripts/integration-tests.sh --stack-config server:ci-tests \ --suite responses --inference-mode record-if-missing --pattern test_conversation_responses ``` |
||
|
6ba9db3929
|
chore!: BREAKING CHANGE: remove sqlite from telemetry config (#3808)
# What does this PR do? - Removed sqlite sink from telemetry config. - Removed related code - Updated doc related to telemetry ## Test Plan CI |
||
|
0a96a7faa5
|
fix(responses): fix subtle bugs in non-function tool calling (#3817)
We were generating "FunctionToolCall" items even for MCP (and file-search, etc.) server-side calls. ID mismatches, etc. galore. |
||
|
bc8b377a7c
|
fix(vector-io): handle missing document_id in insert_chunks (#3521)
Fixed KeyError when chunks don't have document_id in metadata or chunk_metadata. Updated logging to safely extract document_id using getattr and RAG memory to handle different document_id locations. Added test for missing document_id scenarios. Fixes issue #3494 where /v1/vector-io/insert would crash with KeyError. Fixed KeyError when chunks don't have document_id in metadata or chunk_metadata. Updated logging to safely extract document_id using getattr and RAG memory to handle different document_id locations. Added test for missing document_id scenarios. # What does this PR do? Fixes a KeyError crash in `/v1/vector-io/insert` when chunks are missing `document_id` fields. The API was failing even though `document_id` is optional according to the schema. Closes #3494 ## Test Plan **Before fix:** - POST to `/v1/vector-io/insert` with chunks → 500 KeyError - Happened regardless of where `document_id` was placed **After fix:** - Same request works fine → 200 OK - Tested with Postman using FAISS backend - Added unit test covering missing `document_id` scenarios |
||
|
e9b4278a51
|
feat(responses)!: improve responses + conversations implementations (#3810)
This PR updates the Conversation item related types and improves a couple critical parts of the implemenation: - it creates a streaming output item for the final assistant message output by the model. until now we only added content parts and included that message in the final response. - rewrites the conversation update code completely to account for items other than messages (tool calls, outputs, etc.) ## Test Plan Used the test script from https://github.com/llamastack/llama-stack-client-python/pull/281 for this ``` TEST_API_BASE_URL=http://localhost:8321/v1 \ pytest tests/integration/test_agent_turn_step_events.py::test_client_side_function_tool -xvs ``` |
||
|
ef4bc70bbe
|
feat: Enable setting a default embedding model in the stack (#3803)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.12) (push) Failing after 1s
Python Package Build Test / build (3.13) (push) Failing after 1s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
Vector IO Integration Tests / test-matrix (push) Failing after 4s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Unit Tests / unit-tests (3.13) (push) Failing after 5s
API Conformance Tests / check-schema-compatibility (push) Successful in 11s
UI Tests / ui-tests (22) (push) Successful in 40s
Pre-commit / pre-commit (push) Successful in 1m28s
# What does this PR do? Enables automatic embedding model detection for vector stores and by using a `default_configured` boolean that can be defined in the `run.yaml`. <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> ## Test Plan - Unit tests - Integration tests - Simple example below: Spin up the stack: ```bash uv run llama stack build --distro starter --image-type venv --run ``` Then test with OpenAI's client: ```python from openai import OpenAI client = OpenAI(base_url="http://localhost:8321/v1/", api_key="none") vs = client.vector_stores.create() ``` Previously you needed: ```python vs = client.vector_stores.create( extra_body={ "embedding_model": "sentence-transformers/all-MiniLM-L6-v2", "embedding_dimension": 384, } ) ``` The `extra_body` is now unnecessary. --------- Signed-off-by: Francisco Javier Arceo <farceo@redhat.com> |
||
|
007efa6eb5
|
refactor: replace default all-MiniLM-L6-v2 embedding model by nomic-embed-text-v1.5 in Llama Stack (#3183)
# What does this PR do? <!-- Provide a short summary of what this PR does and why. Link to relevant issues if applicable. --> The purpose of this PR is to replace the Llama Stack's default embedding model by nomic-embed-text-v1.5. These are the key reasons why Llama Stack community decided to switch from all-MiniLM-L6-v2 to nomic-embed-text-v1.5: 1. The training data for [all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2#training-data) includes a lot of data sets with various licensing terms, so it is tricky to know when/whether it is appropriate to use this model for commercial applications. 2. The model is not particularly competitive on major benchmarks. For example, if you look at the [MTEB Leaderboard](https://huggingface.co/spaces/mteb/leaderboard) and click on Miscellaneous/BEIR to see English information retrieval accuracy, you see that the top of the leaderboard is dominated by enormous models but also that there are many, many models of relatively modest size whith much higher Retrieval scores. If you want to look closely at the data, I recommend clicking "Download Table" because it is easier to browse that way. More discussion info can be founded [here](https://github.com/llamastack/llama-stack/issues/2418) <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> Closes #2418 ## Test Plan <!-- Describe the tests you ran to verify your changes with result summaries. *Provide clear instructions so the plan can be easily re-executed.* --> 1. Run `./scripts/unit-tests.sh` 2. Integration tests via CI wokrflow --------- Signed-off-by: Sébastien Han <seb@redhat.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com> Co-authored-by: Sébastien Han <seb@redhat.com> |
||
|
ecc8a554d2
|
feat(api)!: support extra_body to embeddings and vector_stores APIs (#3794)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 0s
Python Package Build Test / build (3.12) (push) Failing after 1s
Unit Tests / unit-tests (3.13) (push) Failing after 4s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.13) (push) Failing after 1s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
Vector IO Integration Tests / test-matrix (push) Failing after 5s
Test External API and Providers / test-external (venv) (push) Failing after 5s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
API Conformance Tests / check-schema-compatibility (push) Successful in 10s
UI Tests / ui-tests (22) (push) Successful in 40s
Pre-commit / pre-commit (push) Successful in 1m23s
Applies the same pattern from https://github.com/llamastack/llama-stack/pull/3777 to embeddings and vector_stores.create() endpoints. This should _not_ be a breaking change since (a) our tests were already using the `extra_body` parameter when passing in to the backend (b) but the backend probably wasn't extracting the parameters correctly. This PR will fix that. Updated APIs: `openai_embeddings(), openai_create_vector_store(), openai_create_vector_store_file_batch()` |
||
|
3bb6ef351b
|
chore!: Safety api refactoring to use OpenAIMessageParam (#3796)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.12) (push) Failing after 1s
Python Package Build Test / build (3.13) (push) Failing after 1s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Vector IO Integration Tests / test-matrix (push) Failing after 6s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
Unit Tests / unit-tests (3.13) (push) Failing after 3s
API Conformance Tests / check-schema-compatibility (push) Successful in 13s
UI Tests / ui-tests (22) (push) Successful in 40s
Pre-commit / pre-commit (push) Successful in 1m28s
# What does this PR do? Remove usage of deprecated `Message` from Safety apis ## Test Plan CI |
||
|
7c63aebd64
|
feat(responses)!: add reasoning and annotation added events (#3793)
Implements missing streaming events from OpenAI Responses API spec: - reasoning text/summary events for o1/o3 models, - refusal events for safety moderation - annotation events for citations, - and file search streaming events. Added optional reasoning_content field to chat completion chunks to support non-standard provider extensions. **NOTE:** OpenAI does _not_ fill reasoning_content when users use the chat_completion APIs. This means there is no way for us to implement Responses (with reasoning) by using OpenAI chat completions! We'd need to transparently punt to OpenAI's responses endpoints if we wish to do that. For others though (vLLM, etc.) we can use it. ## Test Plan File search streaming test passes: ``` ./scripts/integration-tests.sh --stack-config server:ci-tests \ --suite responses --setup gpt --inference-mode replay --pattern test_response_file_search_streaming_events ``` Need more complex setup and validation for reasoning tests (need a vLLM powered OSS model maybe gpt-oss which can return reasoning_content). I will do that in a followup PR. |
||
|
06e4cd8e02
|
feat(api)!: BREAKING CHANGE: support passing extra_body through to providers (#3777)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Python Package Build Test / build (3.12) (push) Failing after 1s
Python Package Build Test / build (3.13) (push) Failing after 1s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Vector IO Integration Tests / test-matrix (push) Failing after 5s
API Conformance Tests / check-schema-compatibility (push) Successful in 9s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
Unit Tests / unit-tests (3.13) (push) Failing after 4s
UI Tests / ui-tests (22) (push) Successful in 38s
Pre-commit / pre-commit (push) Successful in 1m27s
# What does this PR do? Allows passing through extra_body parameters to inference providers. With this, we removed the 2 vllm-specific parameters from completions API into `extra_body`. Before/After <img width="1883" height="324" alt="image" src="https://github.com/user-attachments/assets/acb27c08-c748-46c9-b1da-0de64e9908a1" /> closes #2720 ## Test Plan CI and added new test ``` ❯ uv run pytest -s -v tests/integration/ --stack-config=server:starter --inference-mode=record -k 'not( builtin_tool or safety_with_image or code_interpreter or test_rag ) and test_openai_completion_guided_choice' --setup=vllm --suite=base --color=yes Uninstalled 3 packages in 125ms Installed 3 packages in 19ms INFO 2025-10-10 14:29:54,317 tests.integration.conftest:118 tests: Applying setup 'vllm' for suite base INFO 2025-10-10 14:29:54,331 tests.integration.conftest:47 tests: Test stack config type: server (stack_config=server:starter) ============================================================================================================== test session starts ============================================================================================================== platform darwin -- Python 3.12.11, pytest-8.4.2, pluggy-1.6.0 -- /Users/erichuang/projects/llama-stack-1/.venv/bin/python cachedir: .pytest_cache metadata: {'Python': '3.12.11', 'Platform': 'macOS-15.6.1-arm64-arm-64bit', 'Packages': {'pytest': '8.4.2', 'pluggy': '1.6.0'}, 'Plugins': {'anyio': '4.9.0', 'html': '4.1.1', 'socket': '0.7.0', 'asyncio': '1.1.0', 'json-report': '1.5.0', 'timeout': '2.4.0', 'metadata': '3.1.1', 'cov': '6.2.1', 'nbval': '0.11.0'}} rootdir: /Users/erichuang/projects/llama-stack-1 configfile: pyproject.toml plugins: anyio-4.9.0, html-4.1.1, socket-0.7.0, asyncio-1.1.0, json-report-1.5.0, timeout-2.4.0, metadata-3.1.1, cov-6.2.1, nbval-0.11.0 asyncio: mode=Mode.AUTO, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function collected 285 items / 284 deselected / 1 selected tests/integration/inference/test_openai_completion.py::test_openai_completion_guided_choice[txt=vllm/Qwen/Qwen3-0.6B] instantiating llama_stack_client Starting llama stack server with config 'starter' on port 8321... Waiting for server at http://localhost:8321... (0.0s elapsed) Waiting for server at http://localhost:8321... (0.5s elapsed) Waiting for server at http://localhost:8321... (5.1s elapsed) Waiting for server at http://localhost:8321... (5.6s elapsed) Waiting for server at http://localhost:8321... (10.1s elapsed) Waiting for server at http://localhost:8321... (10.6s elapsed) Server is ready at http://localhost:8321 llama_stack_client instantiated in 11.773s PASSEDTerminating llama stack server process... Terminating process 98444 and its group... Server process and children terminated gracefully ============================================================================================================= slowest 10 durations ============================================================================================================== 11.88s setup tests/integration/inference/test_openai_completion.py::test_openai_completion_guided_choice[txt=vllm/Qwen/Qwen3-0.6B] 3.02s call tests/integration/inference/test_openai_completion.py::test_openai_completion_guided_choice[txt=vllm/Qwen/Qwen3-0.6B] 0.01s teardown tests/integration/inference/test_openai_completion.py::test_openai_completion_guided_choice[txt=vllm/Qwen/Qwen3-0.6B] ================================================================================================ 1 passed, 284 deselected, 3 warnings in 16.21s ================================================================================================= ``` |
||
|
80d58ab519
|
chore: refactor (chat)completions endpoints to use shared params struct (#3761)
# What does this PR do? Converts openai(_chat)_completions params to pydantic BaseModel to reduce code duplication across all providers. ## Test Plan CI --- [//]: # (BEGIN SAPLING FOOTER) Stack created with [Sapling](https://sapling-scm.com). Best reviewed with [ReviewStack](https://reviewstack.dev/llamastack/llama-stack/pull/3761). * #3777 * __->__ #3761 |
||
|
32fde8d9a8
|
feat: Add /v1/embeddings endpoint to batches API (#3384)
# What does this PR do? This PR extends the Llama Stack Batches API to support the /v1/embeddings endpoint, enabling efficient batch processing of embedding requests alongside the existing /v1/chat/completions and /v1/completions support. <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> Closes: https://github.com/llamastack/llama-stack/issues/3145 ## Test Plan <!-- Describe the tests you ran to verify your changes with result summaries. *Provide clear instructions so the plan can be easily re-executed.* --> ``` (stack-client) ➜ llama-stack git:(support/embeddings-api) conda activate stack-client && python -m pytest tests/unit/providers/batches/test_reference.py -v ============================================================================================================================================ test session starts ============================================================================================================================================= platform darwin -- Python 3.12.11, pytest-7.4.4, pluggy-1.5.0 -- /Users/vnarsing/miniconda3/envs/stack-client/bin/python cachedir: .pytest_cache metadata: {'Python': '3.12.11', 'Platform': 'macOS-15.6.1-arm64-arm-64bit', 'Packages': {'pytest': '7.4.4', 'pluggy': '1.5.0'}, 'Plugins': {'asyncio': '0.23.8', 'cov': '6.0.0', 'timeout': '2.2.0', 'socket': '0.7.0', 'xdist': '3.8.0', 'html': '3.1.1', 'langsmith': '0.3.39', 'anyio': '4.8.0', 'metadata': '3.0.0'}} rootdir: /Users/vnarsing/go/src/github/meta-llama/llama-stack configfile: pyproject.toml plugins: asyncio-0.23.8, cov-6.0.0, timeout-2.2.0, socket-0.7.0, xdist-3.8.0, html-3.1.1, langsmith-0.3.39, anyio-4.8.0, metadata-3.0.0 asyncio: mode=Mode.AUTO collected 46 items tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_create_and_retrieve_batch_success PASSED [ 2%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_create_batch_without_metadata PASSED [ 4%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_create_batch_completion_window PASSED [ 6%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_create_batch_invalid_endpoints[/v1/invalid/endpoint] PASSED [ 8%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_create_batch_invalid_endpoints[] PASSED [ 10%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_create_batch_invalid_metadata PASSED [ 13%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_retrieve_batch_not_found PASSED [ 15%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_cancel_batch_success PASSED [ 17%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_cancel_batch_invalid_statuses[failed] PASSED [ 19%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_cancel_batch_invalid_statuses[expired] PASSED [ 21%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_cancel_batch_invalid_statuses[completed] PASSED [ 23%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_cancel_batch_not_found PASSED [ 26%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_list_batches_empty PASSED [ 28%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_list_batches_single_batch PASSED [ 30%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_list_batches_multiple_batches PASSED [ 32%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_list_batches_with_limit PASSED [ 34%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_list_batches_with_pagination PASSED [ 36%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_list_batches_invalid_after PASSED [ 39%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_kvstore_persistence PASSED [ 41%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_file_not_found PASSED [ 43%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_file_exists_empty_content PASSED [ 45%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_file_mixed_valid_invalid_json PASSED [ 47%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_invalid_model PASSED [ 50%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_missing_parameters_chat_completions[custom_id-custom_id-missing_required_parameter-Missing required parameter: custom_id] PASSED [ 52%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_missing_parameters_chat_completions[method-method-missing_required_parameter-Missing required parameter: method] PASSED [ 54%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_missing_parameters_chat_completions[url-url-missing_required_parameter-Missing required parameter: url] PASSED [ 56%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_missing_parameters_chat_completions[body-body-missing_required_parameter-Missing required parameter: body] PASSED [ 58%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_missing_parameters_chat_completions[model-body.model-invalid_request-Model parameter is required] PASSED [ 60%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_missing_parameters_chat_completions[messages-body.messages-invalid_request-Messages parameter is required] PASSED [ 63%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_missing_parameters_completions[custom_id-custom_id-missing_required_parameter-Missing required parameter: custom_id] PASSED [ 65%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_missing_parameters_completions[method-method-missing_required_parameter-Missing required parameter: method] PASSED [ 67%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_missing_parameters_completions[url-url-missing_required_parameter-Missing required parameter: url] PASSED [ 69%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_missing_parameters_completions[body-body-missing_required_parameter-Missing required parameter: body] PASSED [ 71%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_missing_parameters_completions[model-body.model-invalid_request-Model parameter is required] PASSED [ 73%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_missing_parameters_completions[prompt-body.prompt-invalid_request-Prompt parameter is required] PASSED [ 76%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_url_mismatch PASSED [ 78%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_multiple_errors_per_request PASSED [ 80%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_invalid_request_format PASSED [ 82%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_invalid_parameter_types[custom_id-custom_id-12345-Custom_id must be a string] PASSED [ 84%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_invalid_parameter_types[url-url-123-URL must be a string] PASSED [ 86%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_invalid_parameter_types[method-method-invalid_value2-Method must be a string] PASSED [ 89%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_invalid_parameter_types[body-body-invalid_value3-Body must be a JSON dictionary object] PASSED [ 91%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_invalid_parameter_types[model-body.model-123-Model must be a string] PASSED [ 93%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_validate_input_invalid_parameter_types[messages-body.messages-invalid messages format-Messages must be an array] PASSED [ 95%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_max_concurrent_batches PASSED [ 97%] tests/unit/providers/batches/test_reference.py::TestReferenceBatchesImpl::test_create_batch_embeddings_endpoint PASSED [100%] ``` --------- Signed-off-by: Varsha Prasad Narsing <varshaprasad96@gmail.com> Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com> |
||
|
1394403360
|
feat(responses): implement usage tracking in streaming responses (#3771)
Implementats usage accumulation to StreamingResponseOrchestrator. The most important part was to pass `stream_options = { "include_usage": true }` to the chat_completion call. This means I will have to record all responses tests again because request hash will change :) Test changes: - Add usage assertions to streaming and non-streaming tests - Update test recordings with actual usage data from OpenAI |
||
|
e7d21e1ee3
|
feat: Add support for Conversations in Responses API (#3743)
# What does this PR do? This PR adds support for Conversations in Responses. <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> ## Test Plan Unit tests Integration tests <Details> <Summary>Manual testing with this script: (click to expand)</Summary> ```python from openai import OpenAI client = OpenAI() client = OpenAI(base_url="http://localhost:8321/v1/", api_key="none") def test_conversation_create(): print("Testing conversation create...") conversation = client.conversations.create( metadata={"topic": "demo"}, items=[ {"type": "message", "role": "user", "content": "Hello!"} ] ) print(f"Created: {conversation}") return conversation def test_conversation_retrieve(conv_id): print(f"Testing conversation retrieve for {conv_id}...") retrieved = client.conversations.retrieve(conv_id) print(f"Retrieved: {retrieved}") return retrieved def test_conversation_update(conv_id): print(f"Testing conversation update for {conv_id}...") updated = client.conversations.update( conv_id, metadata={"topic": "project-x"} ) print(f"Updated: {updated}") return updated def test_conversation_delete(conv_id): print(f"Testing conversation delete for {conv_id}...") deleted = client.conversations.delete(conv_id) print(f"Deleted: {deleted}") return deleted def test_conversation_items_create(conv_id): print(f"Testing conversation items create for {conv_id}...") items = client.conversations.items.create( conv_id, items=[ { "type": "message", "role": "user", "content": [{"type": "input_text", "text": "Hello!"}] }, { "type": "message", "role": "user", "content": [{"type": "input_text", "text": "How are you?"}] } ] ) print(f"Items created: {items}") return items def test_conversation_items_list(conv_id): print(f"Testing conversation items list for {conv_id}...") items = client.conversations.items.list(conv_id, limit=10) print(f"Items list: {items}") return items def test_conversation_item_retrieve(conv_id, item_id): print(f"Testing conversation item retrieve for {conv_id}/{item_id}...") item = client.conversations.items.retrieve(conversation_id=conv_id, item_id=item_id) print(f"Item retrieved: {item}") return item def test_conversation_item_delete(conv_id, item_id): print(f"Testing conversation item delete for {conv_id}/{item_id}...") deleted = client.conversations.items.delete(conversation_id=conv_id, item_id=item_id) print(f"Item deleted: {deleted}") return deleted def test_conversation_responses_create(): print("\nTesting conversation create for a responses example...") conversation = client.conversations.create() print(f"Created: {conversation}") response = client.responses.create( model="gpt-4.1", input=[{"role": "user", "content": "What are the 5 Ds of dodgeball?"}], conversation=conversation.id, ) print(f"Created response: {response} for conversation {conversation.id}") return response, conversation def test_conversations_responses_create_followup( conversation, content="Repeat what you just said but add 'this is my second time saying this'", ): print(f"Using: {conversation.id}") response = client.responses.create( model="gpt-4.1", input=[{"role": "user", "content": content}], conversation=conversation.id, ) print(f"Created response: {response} for conversation {conversation.id}") conv_items = client.conversations.items.list(conversation.id) print(f"\nRetrieving list of items for conversation {conversation.id}:") print(conv_items.model_dump_json(indent=2)) def test_response_with_fake_conv_id(): fake_conv_id = "conv_zzzzzzzzz5dc81908289d62779d2ac510a2b0b602ef00a44" print(f"Using {fake_conv_id}") try: response = client.responses.create( model="gpt-4.1", input=[{"role": "user", "content": "say hello"}], conversation=fake_conv_id, ) print(f"Created response: {response} for conversation {fake_conv_id}") except Exception as e: print(f"failed to create response for conversation {fake_conv_id} with error {e}") def main(): print("Testing OpenAI Conversations API...") # Create conversation conversation = test_conversation_create() conv_id = conversation.id # Retrieve conversation test_conversation_retrieve(conv_id) # Update conversation test_conversation_update(conv_id) # Create items items = test_conversation_items_create(conv_id) # List items items_list = test_conversation_items_list(conv_id) # Retrieve specific item if items_list.data: item_id = items_list.data[0].id test_conversation_item_retrieve(conv_id, item_id) # Delete item test_conversation_item_delete(conv_id, item_id) # Delete conversation test_conversation_delete(conv_id) response, conversation2 = test_conversation_responses_create() print('\ntesting reseponse retrieval') test_conversation_retrieve(conversation2.id) print('\ntesting responses follow up') test_conversations_responses_create_followup(conversation2) print('\ntesting responses follow up x2!') test_conversations_responses_create_followup( conversation2, content="Repeat what you just said but add 'this is my third time saying this'", ) test_response_with_fake_conv_id() print("All tests completed!") if __name__ == "__main__": main() ``` </Details> --------- Signed-off-by: Francisco Javier Arceo <farceo@redhat.com> Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com> |
||
|
8bf07f91cb
|
feat: reuse previous mcp tool listings where possible (#3710)
# What does this PR do? This PR checks whether, if a previous response is linked, there are mcp_list_tools objects that can be reused instead of listing the tools explicitly every time. Closes #3106 ## Test Plan Tested manually. Added unit tests to cover new behaviour. --------- Signed-off-by: Gordon Sim <gsim@redhat.com> Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com> |
||
|
e039b61d26
|
feat(responses)!: add in_progress, failed, content part events (#3765)
## Summary - add schema + runtime support for response.in_progress / response.failed / response.incomplete - stream content parts with proper indexes and reasoning slots - align tests + docs with the richer event payloads ## Testing - uv run pytest tests/unit/providers/agents/meta_reference/test_openai_responses.py::test_create_openai_response_with_string_input - uv run pytest tests/unit/providers/agents/meta_reference/test_response_conversion_utils.py |
||
|
ebae0385bb
|
fix: update dangling references to llama download command (#3763)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 4s
Test Llama Stack Build / generate-matrix (push) Successful in 3s
Test Llama Stack Build / build-single-provider (push) Failing after 3s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 3s
Python Package Build Test / build (3.13) (push) Failing after 1s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 3s
Vector IO Integration Tests / test-matrix (push) Failing after 5s
Python Package Build Test / build (3.12) (push) Failing after 3s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Test Llama Stack Build / build (push) Failing after 3s
API Conformance Tests / check-schema-compatibility (push) Successful in 10s
Unit Tests / unit-tests (3.13) (push) Failing after 3s
Unit Tests / unit-tests (3.12) (push) Failing after 5s
UI Tests / ui-tests (22) (push) Successful in 40s
Pre-commit / pre-commit (push) Successful in 2m14s
## Summary After removing model management CLI in #3700, this PR updates remaining references to the old `llama download` command to use `huggingface-cli download` instead. ## Changes - Updated error messages in `meta_reference/common.py` to recommend `huggingface-cli download` - Updated error messages in `torchtune/recipes/lora_finetuning_single_device.py` to use `huggingface-cli download` - Updated post-training notebook to use `huggingface-cli download` instead of `llama download` - Fixed typo: "you model" -> "your model" ## Test Plan - Verified error messages provide correct guidance for users - Checked that notebook instructions are up-to-date with current tooling |
||
|
f50ce11a3b
|
feat(tests): make inference_recorder into api_recorder (include tool_invoke) (#3403)
Renames `inference_recorder.py` to `api_recorder.py` and extends it to support recording/replaying tool invocations in addition to inference calls. This allows us to record web-search, etc. tool calls and thereafter apply recordings for `tests/integration/responses` ## Test Plan ``` export OPENAI_API_KEY=... export TAVILY_SEARCH_API_KEY=... ./scripts/integration-tests.sh --stack-config ci-tests \ --suite responses --inference-mode record-if-missing ``` |
||
|
26fd5dbd34
|
fix: add traces for tool calls and mcp tool listing (#3722)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 0s
Python Package Build Test / build (3.13) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 4s
Python Package Build Test / build (3.12) (push) Failing after 4s
Vector IO Integration Tests / test-matrix (push) Failing after 5s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 7s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Unit Tests / unit-tests (3.13) (push) Failing after 5s
API Conformance Tests / check-schema-compatibility (push) Successful in 15s
UI Tests / ui-tests (22) (push) Successful in 42s
Pre-commit / pre-commit (push) Successful in 1m24s
# What does this PR do? Adds traces around tool execution and mcp tool listing for better observability. Closes #3108 ## Test Plan Manually examined traces in jaeger to verify the added information was available. Signed-off-by: Gordon Sim <gsim@redhat.com> |
||
|
96886afaca
|
fix(responses): fix regression in support for mcp tool require_approval argument (#3731)
# What does this PR do? It prevents a tool call message being added to the chat completions message without a corresponding tool call result, which is needed in the case that an approval is required first or if the approval request is denied. In both these cases the tool call messages is popped of the next turn messages. Closes #3728 ## Test Plan Ran the integration tests Manual check of both approval and denial against gpt-4o Signed-off-by: Gordon Sim <gsim@redhat.com> |
||
|
1ac320b7e6
|
chore: remove dead code (#3729)
# What does this PR do? Removing some dead code, found by vulture and checked by claude that there are no references or imports for these ## Test Plan CI |
||
|
c2d97a9db9
|
chore: fix flaky unit test and add proper shutdown for file batches (#3725)
# What does this PR do?
Have been running into flaky unit test failures:
|
||
|
d5b136ac66
|
feat: Enabling Annotations in Responses (#3698)
# What does this PR do? Implements annotations for `file_search` tool. Also adds some logs and tests. ## How does this work? 1. **Citation Markers**: Models insert `<|file-id|>` tokens during generation with instructions from search results 2. **Post-Processing**: Extract markers using regex to calculate character positions and create `AnnotationFileCitation` objects 3. **File Mapping**: Store filename metadata during vector store operations for proper citation display ## Example This is the updated `quickstart.py` script, which uses the `extra_body` to register the embedding model. ```python import io, requests from openai import OpenAI url="https://www.paulgraham.com/greatwork.html" model = "gpt-4o-mini" client = OpenAI(base_url="http://localhost:8321/v1/openai/v1", api_key="none") vs = client.vector_stores.create( name="my_citations_db", extra_body={ "embedding_model": "ollama/nomic-embed-text:latest", "embedding_dimension": 768, } ) response = requests.get(url) pseudo_file = io.BytesIO(str(response.content).encode('utf-8')) file_id = client.files.create(file=(url, pseudo_file, "text/html"), purpose="assistants").id client.vector_stores.files.create(vector_store_id=vs.id, file_id=file_id) resp = client.responses.create( model=model, input="How do you do great work? Use our existing knowledge_search tool.", tools=[{"type": "file_search", "vector_store_ids": [vs.id]}], include=["file_search_call.results"], ) print(resp) ``` <details> <summary> Example of the full response </summary> ```python INFO:httpx:HTTP Request: POST http://localhost:8321/v1/openai/v1/vector_stores "HTTP/1.1 200 OK" INFO:httpx:HTTP Request: POST http://localhost:8321/v1/openai/v1/files "HTTP/1.1 200 OK" INFO:httpx:HTTP Request: POST http://localhost:8321/v1/openai/v1/vector_stores/vs_0f6f7e35-f48b-4850-8604-8117d9a50e0a/files "HTTP/1.1 200 OK" INFO:httpx:HTTP Request: POST http://localhost:8321/v1/openai/v1/responses "HTTP/1.1 200 OK" Response(id='resp-28f5793d-3272-4de3-81f6-8cbf107d5bcd', created_at=1759797954.0, error=None, incomplete_details=None, instructions=None, metadata=None, model='gpt-4o-mini', object='response', output=[ResponseFileSearchToolCall(id='call_xWtvEQETN5GNiRLLiBIDKntg', queries=['how to do great work tips'], status='completed', type='file_search_call', results=[Result(attributes={}, file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='file-a98ada68681c4fbeba2201e9c7213fc3', score=1.3722624322210302, text='\\\'re looking where few have looked before.<br /><br />One sign that you\\\'re suited for some kind of work is when you like\\neven the parts that other people find tedious or frightening.<br /><br />But fields aren\\\'t people; you don\\\'t owe them any loyalty. If in the\\ncourse of working on one thing you discover another that\\\'s more\\nexciting, don\\\'t be afraid to switch.<br /><br />If you\\\'re making something for people, make sure it\\\'s something\\nthey actually want. The best way to do this is to make something\\nyou yourself want. Write the story you want to read; build the tool\\nyou want to use. Since your friends probably have similar interests,\\nthis will also get you your initial audience.<br /><br />This <i>should</i> follow from the excitingness rule. Obviously the most\\nexciting story to write will be the one you want to read. The reason\\nI mention this case explicitly is that so many people get it wrong.\\nInstead of making what they want, they try to make what some\\nimaginary, more sophisticated audience wants. And once you go down\\nthat route, you\\\'re lost.\\n<font color=#dddddd>[<a href="#f6n"><font color=#dddddd>6</font></a>]</font><br /><br />There are a lot of forces that will lead you astray when you\\\'re\\ntrying to figure out what to work on. Pretentiousness, fashion,\\nfear, money, politics, other people\\\'s wishes, eminent frauds. But\\nif you stick to what you find genuinely interesting, you\\\'ll be proof\\nagainst all of them. If you\\\'re interested, you\\\'re not astray.<br /><br /><br /><br /><br /><br />\\nFollowing your interests may sound like a rather passive strategy,\\nbut in practice it usually means following them past all sorts of\\nobstacles. You usually have to risk rejection and failure. So it\\ndoes take a good deal of boldness.<br /><br />But while you need boldness, you don\\\'t usually need much planning.\\nIn most cases the recipe for doing great work is simply: work hard\\non excitingly ambitious projects, and something good will come of\\nit. Instead of making a plan and then executing it, you just try\\nto preserve certain invariants.<br /><br />The trouble with planning is that it only works for achievements\\nyou can describe in advance. You can win a gold medal or get rich\\nby deciding to as a child and then tenaciously pursuing that goal,\\nbut you can\\\'t discover natural selection that way.<br /><br />I think for most people who want to do great work, the right strategy\\nis not to plan too much. At each stage do whatever seems most\\ninteresting and gives you the best options for the future. I call\\nthis approach "staying upwind." This is how most people who\\\'ve done\\ngreat work seem to have done it.<br /><br /><br /><br /><br /><br />\\nEven when you\\\'ve found something exciting to work on, working on\\nit is not always straightforward. There will be times when some new\\nidea makes you leap out of bed in the morning and get straight to\\nwork. But there will also be plenty of times when things aren\\\'t\\nlike that.<br /><br />You don\\\'t just put out your sail and get blown forward by inspiration.\\nThere are headwinds and currents and hidden shoals. So there\\\'s a\\ntechnique to working, just as there is to sailing.<br /><br />For example, while you must work hard, it\\\'s possible to work too\\nhard, and if'), Result(attributes={}, file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='file-a98ada68681c4fbeba2201e9c7213fc3', score=1.2532794607643494, text=' with anyone who\\\'s genuinely interested. If they\\\'re\\nreally good at their work, then they probably have a hobbyist\\\'s\\ninterest in it, and hobbyists always want to talk about their\\nhobbies.<br /><br />It may take some effort to find the people who are really good,\\nthough. Doing great work has such prestige that in some places,\\nparticularly universities, there\\\'s a polite fiction that everyone\\nis engaged in it. And that is far from true. People within universities\\ncan\\\'t say so openly, but the quality of the work being done in\\ndifferent departments varies immensely. Some departments have people\\ndoing great work; others have in the past; others never have.<br /><br /><br /><br /><br /><br />\\nSeek out the best colleagues. There are a lot of projects that can\\\'t\\nbe done alone, and even if you\\\'re working on one that can be, it\\\'s\\ngood to have other people to encourage you and to bounce ideas off.<br /><br />Colleagues don\\\'t just affect your work, though; they also affect\\nyou. So work with people you want to become like, because you will.<br /><br />Quality is more important than quantity in colleagues. It\\\'s better\\nto have one or two great ones than a building full of pretty good\\nones. In fact it\\\'s not merely better, but necessary, judging from\\nhistory: the degree to which great work happens in clusters suggests\\nthat one\\\'s colleagues often make the difference between doing great\\nwork and not.<br /><br />How do you know when you have sufficiently good colleagues? In my\\nexperience, when you do, you know. Which means if you\\\'re unsure,\\nyou probably don\\\'t. But it may be possible to give a more concrete\\nanswer than that. Here\\\'s an attempt: sufficiently good colleagues\\noffer <i>surprising</i> insights. They can see and do things that you\\ncan\\\'t. So if you have a handful of colleagues good enough to keep\\nyou on your toes in this sense, you\\\'re probably over the threshold.<br /><br />Most of us can benefit from collaborating with colleagues, but some\\nprojects require people on a larger scale, and starting one of those\\nis not for everyone. If you want to run a project like that, you\\\'ll\\nhave to become a manager, and managing well takes aptitude and\\ninterest like any other kind of work. If you don\\\'t have them, there\\nis no middle path: you must either force yourself to learn management\\nas a second language, or avoid such projects.\\n<font color=#dddddd>[<a href="#f27n"><font color=#dddddd>27</font></a>]</font><br /><br /><br /><br /><br /><br />\\nHusband your morale. It\\\'s the basis of everything when you\\\'re working\\non ambitious projects. You have to nurture and protect it like a\\nliving organism.<br /><br />Morale starts with your view of life. You\\\'re more likely to do great\\nwork if you\\\'re an optimist, and more likely to if you think of\\nyourself as lucky than if you think of yourself as a victim.<br /><br />Indeed, work can to some extent protect you from your problems. If\\nyou choose work that\\\'s pure, its very difficulties will serve as a\\nrefuge from the difficulties of everyday life. If this is escapism,\\nit\\\'s a very productive form of it, and one that has been used by\\nsome of the greatest minds in history.<br /><br />Morale compounds via work: high morale helps you do good work, which\\nincreases your morale and helps you do even'), Result(attributes={}, file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='file-a98ada68681c4fbeba2201e9c7213fc3', score=1.1973485818164222, text=' your\\nability and interest can take you. And you can only answer that by\\ntrying.<br /><br />Many more people could try to do great work than do. What holds\\nthem back is a combination of modesty and fear. It seems presumptuous\\nto try to be Newton or Shakespeare. It also seems hard; surely if\\nyou tried something like that, you\\\'d fail. Presumably the calculation\\nis rarely explicit. Few people consciously decide not to try to do\\ngreat work. But that\\\'s what\\\'s going on subconsciously; they shy\\naway from the question.<br /><br />So I\\\'m going to pull a sneaky trick on you. Do you want to do great\\nwork, or not? Now you have to decide consciously. Sorry about that.\\nI wouldn\\\'t have done it to a general audience. But we already know\\nyou\\\'re interested.<br /><br />Don\\\'t worry about being presumptuous. You don\\\'t have to tell anyone.\\nAnd if it\\\'s too hard and you fail, so what? Lots of people have\\nworse problems than that. In fact you\\\'ll be lucky if it\\\'s the worst\\nproblem you have.<br /><br />Yes, you\\\'ll have to work hard. But again, lots of people have to\\nwork hard. And if you\\\'re working on something you find very\\ninteresting, which you necessarily will if you\\\'re on the right path,\\nthe work will probably feel less burdensome than a lot of your\\npeers\\\'.<br /><br />The discoveries are out there, waiting to be made. Why not by you?<br /><br /><br /><br /><br /><br /><br /><br /><br /><br />\\n<b>Notes</b><br /><br />[<a name="f1n"><font color=#000000>1</font></a>]\\nI don\\\'t think you could give a precise definition of what\\ncounts as great work. Doing great work means doing something important\\nso well that you expand people\\\'s ideas of what\\\'s possible. But\\nthere\\\'s no threshold for importance. It\\\'s a matter of degree, and\\noften hard to judge at the time anyway. So I\\\'d rather people focused\\non developing their interests rather than worrying about whether\\nthey\\\'re important or not. Just try to do something amazing, and\\nleave it to future generations to say if you succeeded.<br /><br />[<a name="f2n"><font color=#000000>2</font></a>]\\nA lot of standup comedy is based on noticing anomalies in\\neveryday life. "Did you ever notice...?" New ideas come from doing\\nthis about nontrivial things. Which may help explain why people\\\'s\\nreaction to a new idea is often the first half of laughing: Ha!<br /><br />[<a name="f3n"><font color=#000000>3</font></a>]\\nThat second qualifier is critical. If you\\\'re excited about\\nsomething most authorities discount, but you can\\\'t give a more\\nprecise explanation than "they don\\\'t get it," then you\\\'re starting\\nto drift into the territory of cranks.<br /><br />[<a name="f4n"><font color=#000000>4</font></a>]\\nFinding something to work on is not simply a matter of finding\\na match between the current version of you and a list of known\\nproblems. You\\\'ll often have to coevolve with the problem. That\\\'s\\nwhy it can sometimes be so hard to figure out what to work on. The\\nsearch space is huge. It\\\'s the cartesian product of all possible\\nt'), Result(attributes={}, file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='file-a98ada68681c4fbeba2201e9c7213fc3', score=1.1764591706535943, text='\\noptimistic, and even though one of the sources of their optimism\\nis ignorance, in this case ignorance can sometimes beat knowledge.<br /><br />Try to finish what you start, though, even if it turns out to be\\nmore work than you expected. Finishing things is not just an exercise\\nin tidiness or self-discipline. In many projects a lot of the best\\nwork happens in what was meant to be the final stage.<br /><br />Another permissible lie is to exaggerate the importance of what\\nyou\\\'re working on, at least in your own mind. If that helps you\\ndiscover something new, it may turn out not to have been a lie after\\nall.\\n<font color=#dddddd>[<a href="#f7n"><font color=#dddddd>7</font></a>]</font><br /><br /><br /><br /><br /><br />\\nSince there are two senses of starting work — per day and per\\nproject — there are also two forms of procrastination. Per-project\\nprocrastination is far the more dangerous. You put off starting\\nthat ambitious project from year to year because the time isn\\\'t\\nquite right. When you\\\'re procrastinating in units of years, you can\\nget a lot not done.\\n<font color=#dddddd>[<a href="#f8n"><font color=#dddddd>8</font></a>]</font><br /><br />One reason per-project procrastination is so dangerous is that it\\nusually camouflages itself as work. You\\\'re not just sitting around\\ndoing nothing; you\\\'re working industriously on something else. So\\nper-project procrastination doesn\\\'t set off the alarms that per-day\\nprocrastination does. You\\\'re too busy to notice it.<br /><br />The way to beat it is to stop occasionally and ask yourself: Am I\\nworking on what I most want to work on? When you\\\'re young it\\\'s ok\\nif the answer is sometimes no, but this gets increasingly dangerous\\nas you get older.\\n<font color=#dddddd>[<a href="#f9n"><font color=#dddddd>9</font></a>]</font><br /><br /><br /><br /><br /><br />\\nGreat work usually entails spending what would seem to most people\\nan unreasonable amount of time on a problem. You can\\\'t think of\\nthis time as a cost, or it will seem too high. You have to find the\\nwork sufficiently engaging as it\\\'s happening.<br /><br />There may be some jobs where you have to work diligently for years\\nat things you hate before you get to the good part, but this is not\\nhow great work happens. Great work happens by focusing consistently\\non something you\\\'re genuinely interested in. When you pause to take\\nstock, you\\\'re surprised how far you\\\'ve come.<br /><br />The reason we\\\'re surprised is that we underestimate the cumulative\\neffect of work. Writing a page a day doesn\\\'t sound like much, but\\nif you do it every day you\\\'ll write a book a year. That\\\'s the key:\\nconsistency. People who do great things don\\\'t get a lot done every\\nday. They get something done, rather than nothing.<br /><br />If you do work that compounds, you\\\'ll get exponential growth. Most\\npeople who do this do it unconsciously, but it\\\'s worth stopping to\\nthink about. Learning, for example, is an instance of this phenomenon:\\nthe more you learn about something, the easier it is to learn more.\\nGrowing an audience is another: the more fans you have, the more\\nnew fans they\\\'ll bring you.<br /><br />'), Result(attributes={}, file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='file-a98ada68681c4fbeba2201e9c7213fc3', score=1.174069664815369, text='\\ninside.<br /><br /><br /><br /><br /><br />Let\\\'s talk a little more about the complicated business of figuring\\nout what to work on. The main reason it\\\'s hard is that you can\\\'t\\ntell what most kinds of work are like except by doing them. Which\\nmeans the four steps overlap: you may have to work at something for\\nyears before you know how much you like it or how good you are at\\nit. And in the meantime you\\\'re not doing, and thus not learning\\nabout, most other kinds of work. So in the worst case you choose\\nlate based on very incomplete information.\\n<font color=#dddddd>[<a href="#f4n"><font color=#dddddd>4</font></a>]</font><br /><br />The nature of ambition exacerbates this problem. Ambition comes in\\ntwo forms, one that precedes interest in the subject and one that\\ngrows out of it. Most people who do great work have a mix, and the\\nmore you have of the former, the harder it will be to decide what\\nto do.<br /><br />The educational systems in most countries pretend it\\\'s easy. They\\nexpect you to commit to a field long before you could know what\\nit\\\'s really like. And as a result an ambitious person on an optimal\\ntrajectory will often read to the system as an instance of breakage.<br /><br />It would be better if they at least admitted it — if they admitted\\nthat the system not only can\\\'t do much to help you figure out what\\nto work on, but is designed on the assumption that you\\\'ll somehow\\nmagically guess as a teenager. They don\\\'t tell you, but I will:\\nwhen it comes to figuring out what to work on, you\\\'re on your own.\\nSome people get lucky and do guess correctly, but the rest will\\nfind themselves scrambling diagonally across tracks laid down on\\nthe assumption that everyone does.<br /><br />What should you do if you\\\'re young and ambitious but don\\\'t know\\nwhat to work on? What you should <i>not</i> do is drift along passively,\\nassuming the problem will solve itself. You need to take action.\\nBut there is no systematic procedure you can follow. When you read\\nbiographies of people who\\\'ve done great work, it\\\'s remarkable how\\nmuch luck is involved. They discover what to work on as a result\\nof a chance meeting, or by reading a book they happen to pick up.\\nSo you need to make yourself a big target for luck, and the way to\\ndo that is to be curious. Try lots of things, meet lots of people,\\nread lots of books, ask lots of questions.\\n<font color=#dddddd>[<a href="#f5n"><font color=#dddddd>5</font></a>]</font><br /><br />When in doubt, optimize for interestingness. Fields change as you\\nlearn more about them. What mathematicians do, for example, is very\\ndifferent from what you do in high school math classes. So you need\\nto give different types of work a chance to show you what they\\\'re\\nlike. But a field should become <i>increasingly</i> interesting as you\\nlearn more about it. If it doesn\\\'t, it\\\'s probably not for you.<br /><br />Don\\\'t worry if you find you\\\'re interested in different things than\\nother people. The stranger your tastes in interestingness, the\\nbetter. Strange tastes are often strong ones, and a strong taste\\nfor work means you\\\'ll be productive. And you\\\'re more likely to find\\nnew things if you'), Result(attributes={}, file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='file-a98ada68681c4fbeba2201e9c7213fc3', score=1.158095578895721, text='. Don\\\'t copy the manner of\\nan eminent 50 year old professor if you\\\'re 18, for example, or the\\nidiom of a Renaissance poem hundreds of years later.<br /><br />Some of the features of things you admire are flaws they succeeded\\ndespite. Indeed, the features that are easiest to imitate are the\\nmost likely to be the flaws.<br /><br />This is particularly true for behavior. Some talented people are\\njerks, and this sometimes makes it seem to the inexperienced that\\nbeing a jerk is part of being talented. It isn\\\'t; being talented\\nis merely how they get away with it.<br /><br />One of the most powerful kinds of copying is to copy something from\\none field into another. History is so full of chance discoveries\\nof this type that it\\\'s probably worth giving chance a hand by\\ndeliberately learning about other kinds of work. You can take ideas\\nfrom quite distant fields if you let them be metaphors.<br /><br />Negative examples can be as inspiring as positive ones. In fact you\\ncan sometimes learn more from things done badly than from things\\ndone well; sometimes it only becomes clear what\\\'s needed when it\\\'s\\nmissing.<br /><br /><br /><br /><br /><br />\\nIf a lot of the best people in your field are collected in one\\nplace, it\\\'s usually a good idea to visit for a while. It will\\nincrease your ambition, and also, by showing you that these people\\nare human, increase your self-confidence.\\n<font color=#dddddd>[<a href="#f26n"><font color=#dddddd>26</font></a>]</font><br /><br />If you\\\'re earnest you\\\'ll probably get a warmer welcome than you\\nmight expect. Most people who are very good at something are happy\\nto talk about it with anyone who\\\'s genuinely interested. If they\\\'re\\nreally good at their work, then they probably have a hobbyist\\\'s\\ninterest in it, and hobbyists always want to talk about their\\nhobbies.<br /><br />It may take some effort to find the people who are really good,\\nthough. Doing great work has such prestige that in some places,\\nparticularly universities, there\\\'s a polite fiction that everyone\\nis engaged in it. And that is far from true. People within universities\\ncan\\\'t say so openly, but the quality of the work being done in\\ndifferent departments varies immensely. Some departments have people\\ndoing great work; others have in the past; others never have.<br /><br /><br /><br /><br /><br />\\nSeek out the best colleagues. There are a lot of projects that can\\\'t\\nbe done alone, and even if you\\\'re working on one that can be, it\\\'s\\ngood to have other people to encourage you and to bounce ideas off.<br /><br />Colleagues don\\\'t just affect your work, though; they also affect\\nyou. So work with people you want to become like, because you will.<br /><br />Quality is more important than quantity in colleagues. It\\\'s better\\nto have one or two great ones than a building full of pretty good\\nones. In fact it\\\'s not merely better, but necessary, judging from\\nhistory: the degree to which great work happens in clusters suggests\\nthat one\\\'s colleagues often make the difference between doing great\\nwork and not.<br /><br />How do you know when you have sufficiently good colleagues? In my\\nexperience, when you do, you know. Which means if you\\\'re unsure,\\nyou probably don\\\'t. But it may be possible to give a more concrete\\nanswer than that. Here\\\'s an attempt: sufficiently good'), Result(attributes={}, file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='file-a98ada68681c4fbeba2201e9c7213fc3', score=1.1566747762241967, text=',\\nbut in practice it usually means following them past all sorts of\\nobstacles. You usually have to risk rejection and failure. So it\\ndoes take a good deal of boldness.<br /><br />But while you need boldness, you don\\\'t usually need much planning.\\nIn most cases the recipe for doing great work is simply: work hard\\non excitingly ambitious projects, and something good will come of\\nit. Instead of making a plan and then executing it, you just try\\nto preserve certain invariants.<br /><br />The trouble with planning is that it only works for achievements\\nyou can describe in advance. You can win a gold medal or get rich\\nby deciding to as a child and then tenaciously pursuing that goal,\\nbut you can\\\'t discover natural selection that way.<br /><br />I think for most people who want to do great work, the right strategy\\nis not to plan too much. At each stage do whatever seems most\\ninteresting and gives you the best options for the future. I call\\nthis approach "staying upwind." This is how most people who\\\'ve done\\ngreat work seem to have done it.<br /><br /><br /><br /><br /><br />\\nEven when you\\\'ve found something exciting to work on, working on\\nit is not always straightforward. There will be times when some new\\nidea makes you leap out of bed in the morning and get straight to\\nwork. But there will also be plenty of times when things aren\\\'t\\nlike that.<br /><br />You don\\\'t just put out your sail and get blown forward by inspiration.\\nThere are headwinds and currents and hidden shoals. So there\\\'s a\\ntechnique to working, just as there is to sailing.<br /><br />For example, while you must work hard, it\\\'s possible to work too\\nhard, and if you do that you\\\'ll find you get diminishing returns:\\nfatigue will make you stupid, and eventually even damage your health.\\nThe point at which work yields diminishing returns depends on the\\ntype. Some of the hardest types you might only be able to do for\\nfour or five hours a day.<br /><br />Ideally those hours will be contiguous. To the extent you can, try\\nto arrange your life so you have big blocks of time to work in.\\nYou\\\'ll shy away from hard tasks if you know you might be interrupted.<br /><br />It will probably be harder to start working than to keep working.\\nYou\\\'ll often have to trick yourself to get over that initial\\nthreshold. Don\\\'t worry about this; it\\\'s the nature of work, not a\\nflaw in your character. Work has a sort of activation energy, both\\nper day and per project. And since this threshold is fake in the\\nsense that it\\\'s higher than the energy required to keep going, it\\\'s\\nok to tell yourself a lie of corresponding magnitude to get over\\nit.<br /><br />It\\\'s usually a mistake to lie to yourself if you want to do great\\nwork, but this is one of the rare cases where it isn\\\'t. When I\\\'m\\nreluctant to start work in the morning, I often trick myself by\\nsaying "I\\\'ll just read over what I\\\'ve got so far." Five minutes\\nlater I\\\'ve found something that seems mistaken or incomplete, and\\nI\\\'m off.<br /><br />Similar techniques work for starting new projects. It\\\'s ok to lie\\nto yourself about how much work a project will entail, for example.\\nLots of great things began with someone saying "How hard could it\\nbe?"<br /><br />This is one case where the young have an advantage. They\\\'re more'), Result(attributes={}, file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='file-a98ada68681c4fbeba2201e9c7213fc3', score=1.1349744395573516, text=' audience\\nin the traditional sense. Either way it doesn\\\'t need to be big.\\nThe value of an audience doesn\\\'t grow anything like linearly with\\nits size. Which is bad news if you\\\'re famous, but good news if\\nyou\\\'re just starting out, because it means a small but dedicated\\naudience can be enough to sustain you. If a handful of people\\ngenuinely love what you\\\'re doing, that\\\'s enough.<br /><br />To the extent you can, avoid letting intermediaries come between\\nyou and your audience. In some types of work this is inevitable,\\nbut it\\\'s so liberating to escape it that you might be better off\\nswitching to an adjacent type if that will let you go direct.\\n<font color=#dddddd>[<a href="#f28n"><font color=#dddddd>28</font></a>]</font><br /><br />The people you spend time with will also have a big effect on your\\nmorale. You\\\'ll find there are some who increase your energy and\\nothers who decrease it, and the effect someone has is not always\\nwhat you\\\'d expect. Seek out the people who increase your energy and\\navoid those who decrease it. Though of course if there\\\'s someone\\nyou need to take care of, that takes precedence.<br /><br />Don\\\'t marry someone who doesn\\\'t understand that you need to work,\\nor sees your work as competition for your attention. If you\\\'re\\nambitious, you need to work; it\\\'s almost like a medical condition;\\nso someone who won\\\'t let you work either doesn\\\'t understand you,\\nor does and doesn\\\'t care.<br /><br />Ultimately morale is physical. You think with your body, so it\\\'s\\nimportant to take care of it. That means exercising regularly,\\neating and sleeping well, and avoiding the more dangerous kinds of\\ndrugs. Running and walking are particularly good forms of exercise\\nbecause they\\\'re good for thinking.\\n<font color=#dddddd>[<a href="#f29n"><font color=#dddddd>29</font></a>]</font><br /><br />People who do great work are not necessarily happier than everyone\\nelse, but they\\\'re happier than they\\\'d be if they didn\\\'t. In fact,\\nif you\\\'re smart and ambitious, it\\\'s dangerous <i>not</i> to be productive.\\nPeople who are smart and ambitious but don\\\'t achieve much tend to\\nbecome bitter.<br /><br /><br /><br /><br /><br />\\nIt\\\'s ok to want to impress other people, but choose the right people.\\nThe opinion of people you respect is signal. Fame, which is the\\nopinion of a much larger group you might or might not respect, just\\nadds noise.<br /><br />The prestige of a type of work is at best a trailing indicator and\\nsometimes completely mistaken. If you do anything well enough,\\nyou\\\'ll make it prestigious. So the question to ask about a type of\\nwork is not how much prestige it has, but how well it could be done.<br /><br />Competition can be an effective motivator, but don\\\'t let it choose\\nthe problem for you; don\\\'t let yourself get drawn into chasing\\nsomething just because others are. In fact, don\\\'t let competitors\\nmake you do anything much more specific than work harder.<br /><br />Curiosity is the best guide. Your curiosity never lies, and it knows\\nmore than you do about what\\\'s worth paying attention to.<br /><br /><br /><br /><br /><br />\\nNotice how often that word has come up. If you asked an oracle the\\nsecret to doing great work and the oracle replied'), Result(attributes={}, file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='file-a98ada68681c4fbeba2201e9c7213fc3', score=1.123214818076958, text='b\'<html><head><meta name="Keywords" content="" /><title>How to Do Great Work</title><!-- <META NAME="ROBOTS" CONTENT="NOODP"> -->\\n<link rel="shortcut icon" href="http://ycombinator.com/arc/arc.png">\\n</head><body bgcolor="#ffffff" background="https://s.turbifycdn.com/aah/paulgraham/bel-6.gif" text="#000000" link="#000099" vlink="#464646"><table border="0" cellspacing="0" cellpadding="0"><tr valign="top"><td><map name=118ab66adb24b4f><area shape=rect coords="0,0,67,21" href="index.html"><area shape=rect coords="0,21,67,42" href="articles.html"><area shape=rect coords="0,42,67,63" href="http://www.amazon.com/gp/product/0596006624"><area shape=rect coords="0,63,67,84" href="books.html"><area shape=rect coords="0,84,67,105" href="http://ycombinator.com"><area shape=rect coords="0,105,67,126" href="arc.html"><area shape=rect coords="0,126,67,147" href="bel.html"><area shape=rect coords="0,147,67,168" href="lisp.html"><area shape=rect coords="0,168,67,189" href="antispam.html"><area shape=rect coords="0,189,67,210" href="kedrosky.html"><area shape=rect coords="0,210,67,231" href="faq.html"><area shape=rect coords="0,231,67,252" href="raq.html"><area shape=rect coords="0,252,67,273" href="quo.html"><area shape=rect coords="0,273,67,294" href="rss.html"><area shape=rect coords="0,294,67,315" href="bio.html"><area shape=rect coords="0,315,67,336" href="https://twitter.com/paulg"><area shape=rect coords="0,336,67,357" href="https://mas.to/@paulg"></map><img src="https://s.turbifycdn.com/aah/paulgraham/bel-7.gif" width="69" height="357" usemap=#118ab66adb24b4f border="0" hspace="0" vspace="0" ismap /></td><td><img src="https://sep.turbifycdn.com/ca/Img/trans_1x1.gif" height="1" width="26" border="0" /></td><td><a href="index.html"><img src="https://s.turbifycdn.com/aah/paulgraham/bel-8.gif" width="410" height="45" border="0" hspace="0" vspace="0" /></a><br /><br /><table border="0" cellspacing="0" cellpadding="0" width="435"><tr valign="top"><td width="435"><img src="https://s.turbifycdn.com/aah/paulgraham/how-to-do-great-work-2.gif" width="185" height="18" border="0" hspace="0" vspace="0" alt="How to Do Great Work" /><br /><br /><font size="2" face="verdana">July 2023<br /><br />If you collected lists of techniques for doing great work in a lot\\nof different fields, what would the intersection look like? I decided\\nto find out'), Result(attributes={}, file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='file-a98ada68681c4fbeba2201e9c7213fc3', score=1.1193194369249235, text=' dangerous kinds of\\ndrugs. Running and walking are particularly good forms of exercise\\nbecause they\\\'re good for thinking.\\n<font color=#dddddd>[<a href="#f29n"><font color=#dddddd>29</font></a>]</font><br /><br />People who do great work are not necessarily happier than everyone\\nelse, but they\\\'re happier than they\\\'d be if they didn\\\'t. In fact,\\nif you\\\'re smart and ambitious, it\\\'s dangerous <i>not</i> to be productive.\\nPeople who are smart and ambitious but don\\\'t achieve much tend to\\nbecome bitter.<br /><br /><br /><br /><br /><br />\\nIt\\\'s ok to want to impress other people, but choose the right people.\\nThe opinion of people you respect is signal. Fame, which is the\\nopinion of a much larger group you might or might not respect, just\\nadds noise.<br /><br />The prestige of a type of work is at best a trailing indicator and\\nsometimes completely mistaken. If you do anything well enough,\\nyou\\\'ll make it prestigious. So the question to ask about a type of\\nwork is not how much prestige it has, but how well it could be done.<br /><br />Competition can be an effective motivator, but don\\\'t let it choose\\nthe problem for you; don\\\'t let yourself get drawn into chasing\\nsomething just because others are. In fact, don\\\'t let competitors\\nmake you do anything much more specific than work harder.<br /><br />Curiosity is the best guide. Your curiosity never lies, and it knows\\nmore than you do about what\\\'s worth paying attention to.<br /><br /><br /><br /><br /><br />\\nNotice how often that word has come up. If you asked an oracle the\\nsecret to doing great work and the oracle replied with a single\\nword, my bet would be on "curiosity."<br /><br />That doesn\\\'t translate directly to advice. It\\\'s not enough just to\\nbe curious, and you can\\\'t command curiosity anyway. But you can\\nnurture it and let it drive you.<br /><br />Curiosity is the key to all four steps in doing great work: it will\\nchoose the field for you, get you to the frontier, cause you to\\nnotice the gaps in it, and drive you to explore them. The whole\\nprocess is a kind of dance with curiosity.<br /><br /><br /><br /><br /><br />\\nBelieve it or not, I tried to make this essay as short as I could.\\nBut its length at least means it acts as a filter. If you made it\\nthis far, you must be interested in doing great work. And if so\\nyou\\\'re already further along than you might realize, because the\\nset of people willing to want to is small.<br /><br />The factors in doing great work are factors in the literal,\\nmathematical sense, and they are: ability, interest, effort, and\\nluck. Luck by definition you can\\\'t do anything about, so we can\\nignore that. And we can assume effort, if you do in fact want to\\ndo great work. So the problem boils down to ability and interest.\\nCan you find a kind of work where your ability and interest will\\ncombine to yield an explosion of new ideas?<br /><br />Here there are grounds for optimism. There are so many different\\nways to do great work, and even more that are still undiscovered.\\nOut of all those different types of work, the one you\\\'re most suited\\nfor is probably a pretty close match. Probably a comically close\\nmatch. It\\\'s just a question of finding it, and how far into it')]), ResponseOutputMessage(id='msg_3591ea71-8b35-4efd-a5ad-c1c250801971', content=[ResponseOutputText(annotations=[AnnotationFileCitation(file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='https://www.paulgraham.com/greatwork.html', index=361, type='file_citation'), AnnotationFileCitation(file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='https://www.paulgraham.com/greatwork.html', index=676, type='file_citation'), AnnotationFileCitation(file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='https://www.paulgraham.com/greatwork.html', index=948, type='file_citation'), AnnotationFileCitation(file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='https://www.paulgraham.com/greatwork.html', index=1259, type='file_citation'), AnnotationFileCitation(file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='https://www.paulgraham.com/greatwork.html', index=1520, type='file_citation'), AnnotationFileCitation(file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='https://www.paulgraham.com/greatwork.html', index=1747, type='file_citation')], text='To do great work, consider the following principles:\n\n1. **Follow Your Interests**: Engage in work that genuinely excites you. If you find an area intriguing, pursue it without being overly concerned about external pressures or norms. You should create things that you would want for yourself, as this often aligns with what others in your circle might want too.\n\n2. **Work Hard on Ambitious Projects**: Ambition is vital, but it should be tempered by genuine interest. Instead of detailed planning for the future, focus on exciting projects that keep your options open. This approach, known as "staying upwind," allows for adaptability and can lead to unforeseen achievements.\n\n3. **Choose Quality Colleagues**: Collaborating with talented colleagues can significantly affect your own work. Seek out individuals who offer surprising insights and whom you admire. The presence of good colleagues can elevate the quality of your work and inspire you.\n\n4. **Maintain High Morale**: Your attitude towards work and life affects your performance. Cultivating optimism and viewing yourself as lucky rather than victimized can boost your productivity. It’s essential to care for your physical health as well since it directly impacts your mental faculties and morale.\n\n5. **Be Consistent**: Great work often comes from cumulative effort. Daily progress, even in small amounts, can result in substantial achievements over time. Emphasize consistency and make the work engaging, as this reduces the perceived burden of hard labor.\n\n6. **Embrace Curiosity**: Curiosity is a driving force that can guide you in selecting fields of interest, pushing you to explore uncharted territories. Allow it to shape your work and continually seek knowledge and insights.\n\nBy focusing on these aspects, you can create an environment conducive to great work and personal fulfillment.', type='output_text', logprobs=None)], role='assistant', status='completed', type='message')], parallel_tool_calls=False, temperature=None, tool_choice=None, tools=None, top_p=None, background=None, conversation=None, max_output_tokens=None, max_tool_calls=None, previous_response_id=None, prompt=None, prompt_cache_key=None, reasoning=None, safety_identifier=None, service_tier=None, status='completed', text=ResponseTextConfig(format=ResponseFormatText(type='text'), verbosity=None), top_logprobs=None, truncation=None, usage=None, user=None) In [34]: resp.output[1].content[0].text Out[34]: 'To do great work, consider the following principles:\n\n1. **Follow Your Interests**: Engage in work that genuinely excites you. If you find an area intriguing, pursue it without being overly concerned about external pressures or norms. You should create things that you would want for yourself, as this often aligns with what others in your circle might want too.\n\n2. **Work Hard on Ambitious Projects**: Ambition is vital, but it should be tempered by genuine interest. Instead of detailed planning for the future, focus on exciting projects that keep your options open. This approach, known as "staying upwind," allows for adaptability and can lead to unforeseen achievements.\n\n3. **Choose Quality Colleagues**: Collaborating with talented colleagues can significantly affect your own work. Seek out individuals who offer surprising insights and whom you admire. The presence of good colleagues can elevate the quality of your work and inspire you.\n\n4. **Maintain High Morale**: Your attitude towards work and life affects your performance. Cultivating optimism and viewing yourself as lucky rather than victimized can boost your productivity. It’s essential to care for your physical health as well since it directly impacts your mental faculties and morale.\n\n5. **Be Consistent**: Great work often comes from cumulative effort. Daily progress, even in small amounts, can result in substantial achievements over time. Emphasize consistency and make the work engaging, as this reduces the perceived burden of hard labor.\n\n6. **Embrace Curiosity**: Curiosity is a driving force that can guide you in selecting fields of interest, pushing you to explore uncharted territories. Allow it to shape your work and continually seek knowledge and insights.\n\nBy focusing on these aspects, you can create an environment conducive to great work and personal fulfillment.' ``` </details> The relevant output looks like this: ```python >resp.output[1].content[0].annotations [AnnotationFileCitation(file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='https://www.paulgraham.com/greatwork.html', index=361, type='file_citation'), AnnotationFileCitation(file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='https://www.paulgraham.com/greatwork.html', index=676, type='file_citation'), AnnotationFileCitation(file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='https://www.paulgraham.com/greatwork.html', index=948, type='file_citation'), AnnotationFileCitation(file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='https://www.paulgraham.com/greatwork.html', index=1259, type='file_citation'), AnnotationFileCitation(file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='https://www.paulgraham.com/greatwork.html', index=1520, type='file_citation'), AnnotationFileCitation(file_id='file-a98ada68681c4fbeba2201e9c7213fc3', filename='https://www.paulgraham.com/greatwork.html', index=1747, type='file_citation')]``` And ```python In [144]: print(resp.output[1].content[0].text) To do great work, consider the following principles: 1. **Follow Your Interests**: Engage in work that genuinely excites you. If you find an area intriguing, pursue it without being overly concerned about external pressures or norms. You should create things that you would want for yourself, as this often aligns with what others in your circle might want too. 2. **Work Hard on Ambitious Projects**: Ambition is vital, but it should be tempered by genuine interest. Instead of detailed planning for the future, focus on exciting projects that keep your options open. This approach, known as "staying upwind," allows for adaptability and can lead to unforeseen achievements. 3. **Choose Quality Colleagues**: Collaborating with talented colleagues can significantly affect your own work. Seek out individuals who offer surprising insights and whom you admire. The presence of good colleagues can elevate the quality of your work and inspire you. 4. **Maintain High Morale**: Your attitude towards work and life affects your performance. Cultivating optimism and viewing yourself as lucky rather than victimized can boost your productivity. It’s essential to care for your physical health as well since it directly impacts your mental faculties and morale. 5. **Be Consistent**: Great work often comes from cumulative effort. Daily progress, even in small amounts, can result in substantial achievements over time. Emphasize consistency and make the work engaging, as this reduces the perceived burden of hard labor. 6. **Embrace Curiosity**: Curiosity is a driving force that can guide you in selecting fields of interest, pushing you to explore uncharted territories. Allow it to shape your work and continually seek knowledge and insights. By focusing on these aspects, you can create an environment conducive to great work and personal fulfillment. ``` And the code below outputs only periods highlighting that the position/index behaves as expected—i.e., the annotation happens at the end of the sentence. ```python print([resp.output[1].content[0].text[j.index] for j in resp.output[1].content[0].annotations]) Out[41]: ['.', '.', '.', '.', '.', '.'] ``` ## Test Plan Unit tests added. --------- Signed-off-by: Francisco Javier Arceo <farceo@redhat.com> |
||
|
6389bf5ffb
|
fix: make telemetry optional for agents (#3705)
# What does this PR do? there is a lot of code in the agents API using the telemetry API and its helpers without checking if that API is even enabled. This is the only API besides inference actively using telemetry code, so after this telemetry can be optional for the entire stack resolves #3665 ## Test Plan existing agent tests. Signed-off-by: Charlie Doern <cdoern@redhat.com> |
||
|
bba9957edd
|
feat(api): Add vector store file batches api (#3642)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 2s
Python Package Build Test / build (3.13) (push) Failing after 0s
Python Package Build Test / build (3.12) (push) Failing after 2s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 5s
Vector IO Integration Tests / test-matrix (push) Failing after 4s
API Conformance Tests / check-schema-compatibility (push) Successful in 9s
Unit Tests / unit-tests (3.12) (push) Failing after 3s
Test External API and Providers / test-external (venv) (push) Failing after 5s
Unit Tests / unit-tests (3.13) (push) Failing after 3s
UI Tests / ui-tests (22) (push) Successful in 40s
Pre-commit / pre-commit (push) Successful in 1m28s
# What does this PR do? Add Open AI Compatible vector store file batches api. This functionality is needed to attach many files to a vector store as a batch. https://github.com/llamastack/llama-stack/issues/3533 API Stubs have been merged https://github.com/llamastack/llama-stack/pull/3615 Adds persistence for file batches as discussed in diff https://github.com/llamastack/llama-stack/pull/3544 (Used claude code for generation and reviewed by me) ## Test Plan 1. Unit tests pass 2. Also verified the cc-vec integration with LLamaStackClient works with the file batches api. https://github.com/raghotham/cc-vec 2. Integration tests pass |
||
|
61b4238912
|
feat(api): add extra_body parameter support with shields example (#3670)
## Summary Introduce `ExtraBodyField` annotation to enable parameters that arrive via extra_body in client SDKs but are accessible server-side with full typing. These parameters are documented in OpenAPI specs under **`x-llama-stack-extra-body-params`** but excluded from generated SDK signatures. Add `shields` parameter to `create_openai_response` as the first implementation using this pattern. ## Test Plan - added an integration test which checks that shields parameter passed via extra_body reaches server implementation 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> --------- Co-authored-by: Claude <noreply@anthropic.com> |
||
|
d266c59c2a
|
chore: remove deprecated inference.chat_completion implementations (#3654)
# What does this PR do? remove unused chat_completion implementations vllm features ported - - requires max_tokens be set, use config value - set tool_choice to none if no tools provided ## Test Plan ci |
||
|
14a94e9894
|
fix: responses <> chat completion input conversion (#3645)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 2s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 1s
Python Package Build Test / build (3.12) (push) Failing after 2s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 5s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
API Conformance Tests / check-schema-compatibility (push) Successful in 10s
Vector IO Integration Tests / test-matrix (push) Failing after 5s
Python Package Build Test / build (3.13) (push) Failing after 3s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 9s
Test External API and Providers / test-external (venv) (push) Failing after 6s
Unit Tests / unit-tests (3.12) (push) Failing after 5s
Unit Tests / unit-tests (3.13) (push) Failing after 6s
UI Tests / ui-tests (22) (push) Successful in 33s
Pre-commit / pre-commit (push) Successful in 1m27s
# What does this PR do? closes #3268 closes #3498 When resuming from previous response ID, currently we attempt to convert from the stored responses input to chat completion messages, which is not always possible, e.g. for tool calls where some data is lost once converted from chat completion message to repsonses input format. This PR stores the chat completion messages that correspond to the _last_ call to chat completion, which is sufficient to be resumed from in the next responses API call, where we load these saved messages and skip conversion entirely. Separate issue to optimize storage: https://github.com/llamastack/llama-stack/issues/3646 ## Test Plan existing CI tests |
||
|
ef0736527d
|
feat(tools)!: substantial clean up of "Tool" related datatypes (#3627)
This is a sweeping change to clean up some gunk around our "Tool" definitions. First, we had two types `Tool` and `ToolDef`. The first of these was a "Resource" type for the registry but we had stopped registering tools inside the Registry long back (and only registered ToolGroups.) The latter was for specifying tools for the Agents API. This PR removes the former and adds an optional `toolgroup_id` field to the latter. Secondly, as pointed out by @bbrowning in https://github.com/llamastack/llama-stack/pull/3003#issuecomment-3245270132, we were doing a lossy conversion from a full JSON schema from the MCP tool specification into our ToolDefinition to send it to the model. There is no necessity to do this -- we ourselves aren't doing any execution at all but merely passing it to the chat completions API which supports this. By doing this (and by doing it poorly), we encountered limitations like not supporting array items, or not resolving $refs, etc. To fix this, we replaced the `parameters` field by `{ input_schema, output_schema }` which can be full blown JSON schemas. Finally, there were some types in our llama-related chat format conversion which needed some cleanup. We are taking this opportunity to clean those up. This PR is a substantial breaking change to the API. However, given our window for introducing breaking changes, this suits us just fine. I will be landing a concurrent `llama-stack-client` change as well since API shapes are changing. |
||
|
ceca3c056f
|
chore: fix/add logging categories (#3658)
# What does this PR do? These aren't controllable by LLAMA_STACK_LOGGING ``` tests/integration/agents/test_persistence.py::test_delete_agents_and_sessions SKIPPED (This ...) [ 3%] tests/integration/agents/test_persistence.py::test_get_agent_turns_and_steps SKIPPED (This t...) [ 7%] tests/integration/agents/test_openai_responses.py::test_responses_store[openai_client-txt=openai/gpt-4o-tools0-True] instantiating llama_stack_client WARNING 2025-10-02 13:14:33,472 root:258 uncategorized: Unknown logging category: testing. Falling back to default 'root' level: 20 WARNING 2025-10-02 13:14:33,477 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20 WARNING 2025-10-02 13:14:33,960 root:258 uncategorized: Unknown logging category: tokenizer_utils. Falling back to default 'root' level: 20 WARNING 2025-10-02 13:14:33,962 root:258 uncategorized: Unknown logging category: models::llama. Falling back to default 'root' level: 20 WARNING 2025-10-02 13:14:33,963 root:258 uncategorized: Unknown logging category: models::llama. Falling back to default 'root' level: 20 WARNING 2025-10-02 13:14:33,968 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20 WARNING 2025-10-02 13:14:33,974 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20 WARNING 2025-10-02 13:14:33,978 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20 WARNING 2025-10-02 13:14:35,350 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20 WARNING 2025-10-02 13:14:35,366 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20 WARNING 2025-10-02 13:14:35,489 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20 WARNING 2025-10-02 13:14:35,490 root:258 uncategorized: Unknown logging category: inference_store. Falling back to default 'root' level: 20 WARNING 2025-10-02 13:14:35,697 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20 WARNING 2025-10-02 13:14:35,918 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20 INFO 2025-10-02 13:14:35,945 llama_stack.providers.utils.inference.inference_store:74 inference_store: Write queue disabled for SQLite to avoid concurrency issues WARNING 2025-10-02 13:14:36,172 root:258 uncategorized: Unknown logging category: files. Falling back to default 'root' level: 20 WARNING 2025-10-02 13:14:36,218 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20 WARNING 2025-10-02 13:14:36,219 root:258 uncategorized: Unknown logging category: vector_io. Falling back to default 'root' level: 20 WARNING 2025-10-02 13:14:36,231 root:258 uncategorized: Unknown logging category: vector_io. Falling back to default 'root' level: 20 WARNING 2025-10-02 13:14:36,255 root:258 uncategorized: Unknown logging category: tool_runtime. Falling back to default 'root' level: 20 WARNING 2025-10-02 13:14:36,486 root:258 uncategorized: Unknown logging category: responses_store. Falling back to default 'root' level: 20 WARNING 2025-10-02 13:14:36,503 root:258 uncategorized: Unknown logging category: openai::responses. Falling back to default 'root' level: 20 INFO 2025-10-02 13:14:36,524 llama_stack.providers.utils.responses.responses_store:80 responses_store: Write queue disabled for SQLite to avoid concurrency issues WARNING 2025-10-02 13:14:36,528 root:258 uncategorized: Unknown logging category: providers::utils. Falling back to default 'root' level: 20 WARNING 2025-10-02 13:14:36,703 root:258 uncategorized: Unknown logging category: uncategorized. Falling back to default 'root' level: 20 ``` ## Test Plan |
||
|
7e48cc48bc
|
refactor(agents): migrate to OpenAI chat completions API (#3323)
Some checks failed
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 3s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.12) (push) Failing after 1s
Test Llama Stack Build / build-single-provider (push) Failing after 2s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 3s
Vector IO Integration Tests / test-matrix (push) Failing after 4s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 4s
API Conformance Tests / check-schema-compatibility (push) Successful in 8s
Test External API and Providers / test-external (venv) (push) Failing after 4s
Unit Tests / unit-tests (3.12) (push) Failing after 4s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 17s
Python Package Build Test / build (3.13) (push) Failing after 14s
Test Llama Stack Build / generate-matrix (push) Successful in 18s
Unit Tests / unit-tests (3.13) (push) Failing after 14s
Test Llama Stack Build / build (push) Failing after 4s
UI Tests / ui-tests (22) (push) Successful in 44s
Pre-commit / pre-commit (push) Successful in 1m16s
|
||
|
5adcf0e0cb
|
chore: Remove debug logging from telemetry adapter (#3643)
# What does this PR do? <!-- Provide a short summary of what this PR does and why. Link to relevant issues if applicable. --> <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> Spammy ## Test Plan <!-- Describe the tests you ran to verify your changes with result summaries. *Provide clear instructions so the plan can be easily re-executed.* --> n/a |
||
|
f7c5ef4ec0
|
chore: remove /v1/inference/completion and implementations (#3622)
# What does this PR do? the /inference/completion route is gone. this removes the implementations. ## Test Plan ci |
||
|
ea15f2a270
|
chore: use openai_chat_completion for llm as a judge scoring (#3635)
# What does this PR do? update llm as a judge to use openai_chat_completion, instead of deprecated chat_completion ## Test Plan ci |
||
|
ca47d90926
|
fix: Ensure that tool calls with no arguments get handled correctly (#3560)
# What does this PR do? When a model decides to use an MCP tool call that requires no arguments, it sets the `arguments` field to `None`. This causes the user to see a `400 bad requst error` due to validation errors down the stack because this field gets removed when being parsed by an openai compatible inference provider like vLLM This PR ensures that, as soon as the tool call args are accumulated while streaming, we check to ensure no tool call function arguments are set to None - if they are we replace them with "{}" <!-- If resolving an issue, uncomment and update the line below --> Closes #3456 ## Test Plan Added new unit test to verify that any tool calls with function arguments set to `None` get handled correctly --------- Signed-off-by: Jaideep Rao <jrao@redhat.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com> |