Commit graph

9 commits

Author SHA1 Message Date
Francisco Javier Arceo
6620b625f1 adding logo and favicon
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>

chore: Enable keyword search for Milvus inline (#3073)

With https://github.com/milvus-io/milvus-lite/pull/294 - Milvus Lite
supports keyword search using BM25. While introducing keyword search we
had explicitly disabled it for inline milvus. This PR removes the need
for the check, and enables `inline::milvus` for tests.

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->

Run llama stack with `inline::milvus` enabled:

```
pytest tests/integration/vector_io/test_openai_vector_stores.py::test_openai_vector_store_search_modes --stack-config=http://localhost:8321 --embedding-model=all-MiniLM-L6-v2 -v
```

```
INFO     2025-08-07 17:06:20,932 tests.integration.conftest:64 tests: Setting DISABLE_CODE_SANDBOX=1 for macOS
=========================================================================================== test session starts ============================================================================================
platform darwin -- Python 3.12.11, pytest-7.4.4, pluggy-1.5.0 -- /Users/vnarsing/miniconda3/envs/stack-client/bin/python
cachedir: .pytest_cache
metadata: {'Python': '3.12.11', 'Platform': 'macOS-14.7.6-arm64-arm-64bit', 'Packages': {'pytest': '7.4.4', 'pluggy': '1.5.0'}, 'Plugins': {'asyncio': '0.23.8', 'cov': '6.0.0', 'timeout': '2.2.0', 'socket': '0.7.0', 'html': '3.1.1', 'langsmith': '0.3.39', 'anyio': '4.8.0', 'metadata': '3.0.0'}}
rootdir: /Users/vnarsing/go/src/github/meta-llama/llama-stack
configfile: pyproject.toml
plugins: asyncio-0.23.8, cov-6.0.0, timeout-2.2.0, socket-0.7.0, html-3.1.1, langsmith-0.3.39, anyio-4.8.0, metadata-3.0.0
asyncio: mode=Mode.AUTO
collected 3 items

tests/integration/vector_io/test_openai_vector_stores.py::test_openai_vector_store_search_modes[None-None-all-MiniLM-L6-v2-None-384-vector] PASSED                                                   [ 33%]
tests/integration/vector_io/test_openai_vector_stores.py::test_openai_vector_store_search_modes[None-None-all-MiniLM-L6-v2-None-384-keyword] PASSED                                                  [ 66%]
tests/integration/vector_io/test_openai_vector_stores.py::test_openai_vector_store_search_modes[None-None-all-MiniLM-L6-v2-None-384-hybrid] PASSED                                                   [100%]

============================================================================================ 3 passed in 4.75s =============================================================================================
```

Signed-off-by: Varsha Prasad Narsing <varshaprasad96@gmail.com>
Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com>

chore: Fixup main pre commit (#3204)

build: Bump version to 0.2.18

chore: Faster npm pre-commit (#3206)

Adds npm to pre-commit.yml installation and caches ui
Removes node installation during pre-commit.

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->

<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>

chiecking in for tonight, wip moving to agents api

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>

remove log

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>

updated

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>

fix: disable ui-prettier & ui-eslint (#3207)

chore(pre-commit): add pre-commit hook to enforce llama_stack logger usage (#3061)

This PR adds a step in pre-commit to enforce using `llama_stack` logger.

Currently, various parts of the code base uses different loggers. As a
custom `llama_stack` logger exist and used in the codebase, it is better
to standardize its utilization.

Signed-off-by: Mustafa Elbehery <melbeher@redhat.com>
Co-authored-by: Matthew Farrellee <matt@cs.wisc.edu>

fix: fix ```openai_embeddings``` for asymmetric embedding NIMs (#3205)

NVIDIA asymmetric embedding models (e.g.,
`nvidia/llama-3.2-nv-embedqa-1b-v2`) require an `input_type` parameter
not present in the standard OpenAI embeddings API. This PR adds the
`input_type="query"` as default and updates the documentation to suggest
using the `embedding` API for passage embeddings.

<!-- If resolving an issue, uncomment and update the line below -->
Resolves #2892

```
pytest -s -v tests/integration/inference/test_openai_embeddings.py   --stack-config="inference=nvidia"   --embedding-model="nvidia/llama-3.2-nv-embedqa-1b-v2"   --env NVIDIA_API_KEY={nvidia_api_key}   --env NVIDIA_BASE_URL="https://integrate.api.nvidia.com"
```

cleaning up

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>

updating session manager to cache messages locally

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>

fix linter

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>

more cleanup

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-08-21 16:06:30 -04:00
Matthew Farrellee
914c7be288
feat: add batches API with OpenAI compatibility (with inference replay) (#3162)
Add complete batches API implementation with protocol, providers, and
tests:

Core Infrastructure:
- Add batches API protocol using OpenAI Batch types directly
- Add Api.batches enum value and protocol mapping in resolver
- Add OpenAI "batch" file purpose support
- Include proper error handling (ConflictError, ResourceNotFoundError)

Reference Provider:
- Add ReferenceBatchesImpl with full CRUD operations (create, retrieve,
cancel, list)
- Implement background batch processing with configurable concurrency
- Add SQLite KVStore backend for persistence
- Support /v1/chat/completions endpoint with request validation

Comprehensive Test Suite:
- Add unit tests for provider implementation with validation
- Add integration tests for end-to-end batch processing workflows
- Add error handling tests for validation, malformed inputs, and edge
cases

Configuration:
- Add max_concurrent_batches and max_concurrent_requests_per_batch
options
- Add provider documentation with sample configurations

Test with -

```
$ uv run llama stack build --image-type venv --providers inference=YOU_PICK,files=inline::localfs,batches=inline::reference --run &
$ LLAMA_STACK_CONFIG=http://localhost:8321 uv run pytest tests/unit/providers/batches tests/integration/batches --text-model YOU_PICK
```

addresses #3066

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-08-15 15:34:15 -07:00
Ashwin Bharambe
ee7631b6cf
Revert "feat: add batches API with OpenAI compatibility" (#3149)
Reverts llamastack/llama-stack#3088

The PR broke integration tests.
2025-08-14 10:08:54 -07:00
Matthew Farrellee
de692162af
feat: add batches API with OpenAI compatibility (#3088)
Some checks failed
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Integration Tests (Replay) / discover-tests (push) Successful in 12s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 15s
Python Package Build Test / build (3.12) (push) Failing after 16s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 25s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 23s
Python Package Build Test / build (3.13) (push) Failing after 17s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 29s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 21s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 25s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 28s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 29s
Unit Tests / unit-tests (3.12) (push) Failing after 20s
Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 12s
Test External API and Providers / test-external (venv) (push) Failing after 22s
Unit Tests / unit-tests (3.13) (push) Failing after 18s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 23s
Vector IO Integration Tests / test-matrix (3.12, remote::qdrant) (push) Failing after 24s
Vector IO Integration Tests / test-matrix (3.12, remote::weaviate) (push) Failing after 27s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 24s
Vector IO Integration Tests / test-matrix (3.13, remote::weaviate) (push) Failing after 23s
Vector IO Integration Tests / test-matrix (3.13, remote::qdrant) (push) Failing after 24s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 25s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 27s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 24s
Update ReadTheDocs / update-readthedocs (push) Failing after 38s
Pre-commit / pre-commit (push) Successful in 1m53s
Add complete batches API implementation with protocol, providers, and
tests:

Core Infrastructure:
- Add batches API protocol using OpenAI Batch types directly
- Add Api.batches enum value and protocol mapping in resolver
- Add OpenAI "batch" file purpose support
- Include proper error handling (ConflictError, ResourceNotFoundError)

Reference Provider:
- Add ReferenceBatchesImpl with full CRUD operations (create, retrieve,
cancel, list)
- Implement background batch processing with configurable concurrency
- Add SQLite KVStore backend for persistence
- Support /v1/chat/completions endpoint with request validation

Comprehensive Test Suite:
- Add unit tests for provider implementation with validation
- Add integration tests for end-to-end batch processing workflows
- Add error handling tests for validation, malformed inputs, and edge
cases

Configuration:
- Add max_concurrent_batches and max_concurrent_requests_per_batch
options
- Add provider documentation with sample configurations

Test with -

```
$ uv run llama stack build --image-type venv --providers inference=YOU_PICK,files=inline::localfs,batches=inline::reference --run &
$ LLAMA_STACK_CONFIG=http://localhost:8321 uv run pytest tests/unit/providers/batches tests/integration/batches --text-model YOU_PICK
```

addresses #3066
2025-08-14 09:42:02 -04:00
ehhuang
2f51273215
fix: huge speed boost (#3132)
# What does this PR do?
make llama stack fast again


## Test Plan
2025-08-13 09:51:35 -07:00
IAN MILLER
0950168f26
refactor: replace hardcoded status codes by httpx.codes (#3131)
# What does this PR do?
<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->
The purpose of this PR is to eliminate hardcoded status codes in
server's responses and replace it by `httpx.codes` functionality for
better consistency across the whole project and improvement in code
readability.

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->
Run `./scripts/unit-tests.sh`
2025-08-13 08:43:41 -07:00
IAN MILLER
c9b78602d3
refactor: modify DELETE API endpoints by returning HTTP 204 No Content + empty body instead of 200 OK + response body with null (#3112)
# What does this PR do?
<!-- Provide a short summary of what this PR does and why. Link to
relevant issues if applicable. -->
The purpose of this PR is to make the behavior DELETE API endpoints be
consistent with standard RESTful conventions and eliminate confusion for
API consumers.

Old Behavior
```
HTTP Status: 200 OK
Response Body: null
```

Eg. `curl -X DELETE http://localhost:8321/v1/shields/test-shield`
`null% `
`INFO 2025-08-12 16:11:57,932 console_span_processor:65 telemetry:
15:11:57.929 [INFO] ::1:59805 - "DELETE /v1/shields/test-shield
HTTP/1.1" 200 `

Updated Behavior
```
HTTP Status: 204 No Content
Response Body: empty (no body)
```

Eg.  `curl -X DELETE http://localhost:8321/v1/shields/test-shield`
`INFO 2025-08-12 16:18:16,645 console_span_processor:62 telemetry:
15:18:16.637 [INFO] ::1:60283 - "DELETE /v1/shields/test-shield
HTTP/1.1" 204 `

<!-- If resolving an issue, uncomment and update the line below -->
<!-- Closes #[issue-number] -->
Closes #3090 

## Test Plan
<!-- Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.* -->
Run `./scripts/unit-tests.sh`
2025-08-13 07:56:26 -07:00
Ashwin Bharambe
cc87995e2b
chore: rename templates to distributions (#3035)
As the title says. Distributions is in, Templates is out.

`llama stack build --template` --> `llama stack build --distro`. For
backward compatibility, the previous option is kept but results in a
warning.

Updated `server.py` to remove the "config_or_template" backward
compatibility since it has been a couple releases since that change.
2025-08-04 11:34:17 -07:00
Ashwin Bharambe
2665f00102
chore(rename): move llama_stack.distribution to llama_stack.core (#2975)
We would like to rename the term `template` to `distribution`. To
prepare for that, this is a precursor.

cc @leseb
2025-07-30 23:30:53 -07:00
Renamed from llama_stack/distribution/server/server.py (Browse further)