Commit graph

9 commits

Author SHA1 Message Date
Charlie Doern
a078f089d9
fix: rename llama_stack_api dir (#4155)
Some checks failed
Integration Tests (Replay) / generate-matrix (push) Successful in 3s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 0s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 0s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Test Llama Stack Build / generate-matrix (push) Successful in 5s
Python Package Build Test / build (3.12) (push) Failing after 4s
API Conformance Tests / check-schema-compatibility (push) Successful in 12s
Test llama stack list-deps / generate-matrix (push) Successful in 29s
Test Llama Stack Build / build-single-provider (push) Successful in 33s
Test llama stack list-deps / list-deps-from-config (push) Successful in 32s
UI Tests / ui-tests (22) (push) Successful in 39s
Test Llama Stack Build / build (push) Successful in 39s
Test llama stack list-deps / show-single-provider (push) Successful in 46s
Python Package Build Test / build (3.13) (push) Failing after 44s
Test External API and Providers / test-external (venv) (push) Failing after 44s
Vector IO Integration Tests / test-matrix (push) Failing after 56s
Test llama stack list-deps / list-deps (push) Failing after 47s
Unit Tests / unit-tests (3.12) (push) Failing after 1m42s
Unit Tests / unit-tests (3.13) (push) Failing after 1m55s
Test Llama Stack Build / build-ubi9-container-distribution (push) Successful in 2m0s
Test Llama Stack Build / build-custom-container-distribution (push) Successful in 2m2s
Integration Tests (Replay) / Integration Tests (, , , client=, ) (push) Failing after 2m42s
Pre-commit / pre-commit (push) Successful in 5m17s
# What does this PR do?

the directory structure was src/llama-stack-api/llama_stack_api

instead it should just be src/llama_stack_api to match the other
packages.

update the structure and pyproject/linting config

---------

Signed-off-by: Charlie Doern <cdoern@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-11-13 15:04:36 -08:00
Charlie Doern
840ad75fe9
feat: split API and provider specs into separate llama-stack-api pkg (#3895)
# What does this PR do?

Extract API definitions and provider specifications into a standalone
llama-stack-api package that can be published to PyPI independently of
the main llama-stack server.


see: https://github.com/llamastack/llama-stack/pull/2978 and
https://github.com/llamastack/llama-stack/pull/2978#issuecomment-3145115942

Motivation

External providers currently import from llama-stack, which overrides
the installed version and causes dependency conflicts. This separation
allows external providers to:

- Install only the type definitions they need without server
dependencies
- Avoid version conflicts with the installed llama-stack package
- Be versioned and released independently

This enables us to re-enable external provider module tests that were
previously blocked by these import conflicts.

Changes

- Created llama-stack-api package with minimal dependencies (pydantic,
jsonschema)
- Moved APIs, providers datatypes, strong_typing, and schema_utils
- Updated all imports from llama_stack.* to llama_stack_api.*
- Configured local editable install for development workflow
- Updated linting and type-checking configuration for both packages

Next Steps

- Publish llama-stack-api to PyPI
- Update external provider dependencies
- Re-enable external provider module tests


Pre-cursor PRs to this one:

- #4093 
- #3954 
- #4064 

These PRs moved key pieces _out_ of the Api pkg, limiting the scope of
change here.


relates to #3237 

## Test Plan

Package builds successfully and can be imported independently. All
pre-commit hooks pass with expected exclusions maintained.

---------

Signed-off-by: Charlie Doern <cdoern@redhat.com>
2025-11-13 11:51:17 -08:00
Francisco Arceo
eb3f9ac278
feat: allow returning embeddings and metadata from /vector_stores/ methods; disallow changing Provider ID (#4046)
# What does this PR do?

- Updates `/vector_stores/{vector_store_id}/files/{file_id}/content` to
allow returning `embeddings` and `metadata` using the `extra_query`
    -  Updates the UI accordingly to display them.

- Update UI to support CRUD operations in the Vector Stores section and
adds a new modal exposing the functionality.

- Updates Vector Store update to fail if a user tries to update Provider
ID (which doesn't make sense to allow)

```python
In  [1]: client.vector_stores.files.content(
    vector_store_id=vector_store.id, 
    file_id=file.id, 
    extra_query={"include_embeddings": True, "include_metadata": True}
)
Out [1]: FileContentResponse(attributes={}, content=[Content(text='This is a test document to check if embeddings are generated properly.\n', type='text', embedding=[0.33760684728622437, ...,], chunk_metadata={'chunk_id': '62a63ae0-c202-f060-1b86-0a688995b8d3', 'document_id': 'file-27291dbc679642ac94ffac6d2810c339', 'source': None, 'created_timestamp': 1762053437, 'updated_timestamp': 1762053437, 'chunk_window': '0-13', 'chunk_tokenizer': 'DEFAULT_TIKTOKEN_TOKENIZER', 'chunk_embedding_model': 'sentence-transformers/nomic
-ai/nomic-embed-text-v1.5', 'chunk_embedding_dimension': 768, 'content_token_count': 13, 'metadata_token_count': 9}, metadata={'filename': 'test-embedding.txt', 'chunk_id': '62a63ae0-c202-f060-1b86-0a688995b8d3', 'document_id': 'file-27291dbc679642ac94ffac6d2810c339', 'token_count': 13, 'metadata_token_count': 9})], file_id='file-27291dbc679642ac94ffac6d2810c339', filename='test-embedding.txt')
```

Screenshots of UI are displayed below:

### List Vector Store with Added "Create New Vector Store"
<img width="1912" height="491" alt="Screenshot 2025-11-06 at 10 47
25 PM"
src="https://github.com/user-attachments/assets/a3a3ddd9-758d-4005-ac9c-5047f03916f3"
/>

### Create New Vector Store
<img width="1918" height="1048" alt="Screenshot 2025-11-06 at 10 47
49 PM"
src="https://github.com/user-attachments/assets/b4dc0d31-696f-4e68-b109-27915090f158"
/>

### Edit Vector Store
<img width="1916" height="1355" alt="Screenshot 2025-11-06 at 10 48
32 PM"
src="https://github.com/user-attachments/assets/ec879c63-4cf7-489f-bb1e-57ccc7931414"
/>


### Vector Store Files Contents page (with Embeddings)
<img width="1914" height="849" alt="Screenshot 2025-11-06 at 11 54
32 PM"
src="https://github.com/user-attachments/assets/3095520d-0e90-41f7-83bd-652f6c3fbf27"
/>

### Vector Store Files Contents Details page (with Embeddings)
<img width="1916" height="1221" alt="Screenshot 2025-11-06 at 11 55
00 PM"
src="https://github.com/user-attachments/assets/e71dbdc5-5b49-472b-a43a-5785f58d196c"
/>

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->

## Test Plan
Tests added for Middleware extension and Provider failures.

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-11-12 09:59:48 -08:00
Ashwin Bharambe
7b90e0e9c8
test: suppress expected error logs in SSE test (#3886)
Our unit test outputs are filled with all kinds of obscene logs. This
makes it really hard to spot real issues quickly. The problem is that
these logs are necessary to output at the given logging level when the
server is operating normally. It's just that we don't want to see some
of them (especially the noisy ones) during tests.

This PR begins the cleanup. We pytest's caplog fixture to for
suppression.
2025-10-22 14:34:32 -07:00
Ashwin Bharambe
2665f00102
chore(rename): move llama_stack.distribution to llama_stack.core (#2975)
We would like to rename the term `template` to `distribution`. To
prepare for that, this is a precursor.

cc @leseb
2025-07-30 23:30:53 -07:00
Matthew Farrellee
30b2e6a495
chore: default to pytest asyncio-mode=auto (#2730)
# What does this PR do?

previously, developers who ran `./scripts/unit-tests.sh` would get
`asyncio-mode=auto`, which meant `@pytest.mark.asyncio` and
`@pytest_asyncio.fixture` were redundent. developers who ran `pytest`
directly would get pytest's default (strict mode), would run into errors
leading them to add `@pytest.mark.asyncio` / `@pytest_asyncio.fixture`
to their code.

with this change -
- `asyncio_mode=auto` is included in `pyproject.toml` making behavior
consistent for all invocations of pytest
- removes all redundant `@pytest_asyncio.fixture` and
`@pytest.mark.asyncio`
 - for good measure, requires `pytest>=8.4` and `pytest-asyncio>=1.0`

## Test Plan

- `./scripts/unit-tests.sh`
- `uv run pytest tests/unit`
2025-07-11 13:00:24 -07:00
Rohan Awhad
436c7aa751
feat: Add url field to PaginatedResponse and populate it using route … (#2419)
Some checks failed
Integration Tests / test-matrix (library, 3.11, inference) (push) Failing after 7s
Integration Tests / test-matrix (library, 3.10, inspect) (push) Failing after 8s
Integration Tests / test-matrix (library, 3.10, post_training) (push) Failing after 9s
Integration Tests / test-matrix (library, 3.10, scoring) (push) Failing after 7s
Integration Tests / test-matrix (library, 3.11, agents) (push) Failing after 8s
Integration Tests / test-matrix (library, 3.10, tool_runtime) (push) Failing after 8s
Integration Tests / test-matrix (library, 3.10, providers) (push) Failing after 8s
Integration Tests / test-matrix (library, 3.10, vector_io) (push) Failing after 9s
Integration Tests / test-matrix (library, 3.11, post_training) (push) Failing after 10s
Integration Tests / test-matrix (library, 3.11, providers) (push) Failing after 7s
Integration Tests / test-matrix (library, 3.11, inspect) (push) Failing after 7s
Integration Tests / test-matrix (library, 3.11, scoring) (push) Failing after 9s
Integration Tests / test-matrix (library, 3.11, tool_runtime) (push) Failing after 8s
Integration Tests / test-matrix (library, 3.11, vector_io) (push) Failing after 10s
Integration Tests / test-matrix (library, 3.12, agents) (push) Failing after 9s
Integration Tests / test-matrix (library, 3.12, inspect) (push) Failing after 7s
Integration Tests / test-matrix (library, 3.12, datasets) (push) Failing after 12s
Integration Tests / test-matrix (library, 3.12, post_training) (push) Failing after 7s
Integration Tests / test-matrix (library, 3.12, inference) (push) Failing after 12s
Integration Tests / test-matrix (library, 3.12, scoring) (push) Failing after 8s
Integration Tests / test-matrix (library, 3.12, providers) (push) Failing after 9s
Integration Tests / test-matrix (library, 3.12, vector_io) (push) Failing after 10s
Test External Providers / test-external-providers (venv) (push) Failing after 10s
Integration Tests / test-matrix (library, 3.12, tool_runtime) (push) Failing after 15s
Unit Tests / unit-tests (3.11) (push) Failing after 10s
Unit Tests / unit-tests (3.13) (push) Failing after 9s
Update ReadTheDocs / update-readthedocs (push) Failing after 50s
Unit Tests / unit-tests (3.12) (push) Failing after 58s
Unit Tests / unit-tests (3.10) (push) Failing after 1m0s
Pre-commit / pre-commit (push) Successful in 2m10s
…path


# What does this PR do?

Closes #1847 

Changes:
- llama_stack/apis/common/responses.py: adds optional `url` field to
PaginatedResponse
- llama_stack/distribution/server/server.py: automatically populate the
URL field with route path



## Test Plan

- Built and ran llama stack server using the following cmds:  
  ```bash  
  export INFERENCE_MODEL=llama3.1:8b  
  llama stack build --run --template ollama --image-type container
  llama stack run llama_stack/templates/ollama/run.yaml
  ```
- Ran `curl` to test if we are seeing the `url` param in response:
  ```bash
  curl -X 'GET' \
    'http://localhost:8321/v1/agents' \
    -H 'accept: application/json'
  ```
- Expected and Received Output:
`{"data":[],"has_more":false,"url":"/v1/agents"}`

---------

Co-authored-by: Rohan Awhad <rawhad@redhat.com>
2025-06-16 11:19:48 +02:00
Ben Browning
0b6cd45950
fix: Additional streaming error handling (#2007)
# What does this PR do?

This expands the `test_sse` test suite and fixes some edge cases with
bugs in our SSE error handling to ensure streaming clients always get a
proper error response.

First, we handle the case where a client disconnects before we actually
start streaming the response back. Previously we only handled the case
where a client disconnected as we were streaming the response, but there
was an edge case where a client disconnecting before we streamed any
response back did not trigger our logic to cleanly handle that
disconnect.

Second, we handle the case where an error is thrown from the server
before the actual async generator gets created from the provider. This
happens in scenarios like the newly merged OpenAI API input validation,
where we eagerly raise validation errors before returning the async
generator object that streams the responses back.

## Test Plan

Tested via:

```
python -m pytest -s -v tests/unit/server/test_sse.py
```

Both test cases failed before, and passed afterwards. The test cases
were written based on me experimenting with actual clients that would do
bad things like randomly disconnect or send invalid input in streaming
mode and I hit these two cases, where things were misbehaving in our
error handling.

Signed-off-by: Ben Browning <bbrownin@redhat.com>
2025-04-24 17:01:45 -07:00
Ben Browning
dc46725f56
fix: properly handle streaming client disconnects (#2000)
# What does this PR do?

Previously, when a streaming client would disconnect before we were
finished streaming the entire response, an error like the below would
get raised from the `sse_generator` function in
`llama_stack/distribution/server/server.py`:

```
AttributeError: 'coroutine' object has no attribute 'aclose'. Did you mean: 'close'?
```

This was because we were calling `aclose` on a coroutine instead of the
awaited value from that coroutine. This change fixes that, so that we
save off the awaited value and then can call `aclose` on it if we
encounter an `asyncio.CancelledError`, like we see when a client
disconnects before we're finished streaming.

The other changes in here are to add a simple set of tests for the happy
path of our SSE streaming and this client disconnect path.

That unfortunately requires adding one more dependency into our unit
test section of pyproject.toml since `server.py` requires loading some
of the telemetry code for me to test this functionality.

## Test Plan

I wrote the tests in `tests/unit/server/test_sse.py` first, verified the
client disconnected test failed before my change, and that it passed
afterwards.

```
python -m pytest -s -v tests/unit/server/test_sse.py
```

Signed-off-by: Ben Browning <bbrownin@redhat.com>
2025-04-23 15:44:28 +02:00