mirror of
https://github.com/meta-llama/llama-stack.git
synced 2025-10-04 04:04:14 +00:00
336 commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
|
3d119a86d4
|
chore: indicate to mypy that InferenceProvider.batch_completion/batch_chat_completion is concrete (#3239)
# What does this PR do? closes https://github.com/llamastack/llama-stack/issues/3236 mypy considered our default implementations (raise NotImplementedError) to be trivial. the result was we implemented the same stubs in providers. this change puts enough into the default impls so mypy considers them non-trivial. this allows us to remove the duplicate implementations. |
||
|
2ee898cc4c
|
chore: indicate to mypy that InferenceProvider.rerank is concrete (#3238) | ||
|
f520e244d9
|
feat: Add S3 Files Provider (#3202)
Implements a complete S3-based file storage provider for Llama Stack with: Core Implementation: - S3FilesImpl class with full OpenAI Files API compatibility - Support for file upload, download, listing, deletion operations - Sqlite-based metadata storage for fast queries and API compliance - Configurable S3 endpoints (AWS, MinIO, LocalStack support) Key Features: - Automatic S3 bucket creation and management - Metadata persistence - Proper error handling for S3 connectivity and permissions Dependencies: - Adds boto3 for AWS S3 integration - Adds moto[s3] for testing infrastructure Testing: Unit: `./scripts/unit-tests.sh tests/unit/files tests/unit/providers/files` Integration: Start MinIO: `podman run --rm -it -p 9000:9000 minio/minio server /data` Start stack w/ S3 provider: `S3_ENDPOINT_URL=http://localhost:9000 AWS_ACCESS_KEY_ID=minioadmin AWS_SECRET_ACCESS_KEY=minioadmin S3_BUCKET_NAME=llama-stack-files uv run llama stack build --image-type venv --providers files=remote::s3 --run` Run integration tests: `./scripts/integration-tests.sh --stack-config http://localhost:8321 --provider ollama --test-subdirs files` |
||
|
c5e2e269e2
|
feat(api): introduce /rerank (#2940)
Some checks failed
Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Vector IO Integration Tests / test-matrix (push) Failing after 6s
Pre-commit / pre-commit (push) Failing after 7s
Test Llama Stack Build / build-single-provider (push) Failing after 6s
Python Package Build Test / build (3.13) (push) Failing after 8s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 9s
Python Package Build Test / build (3.12) (push) Failing after 9s
Unit Tests / unit-tests (3.12) (push) Failing after 8s
Test External API and Providers / test-external (venv) (push) Failing after 10s
Update ReadTheDocs / update-readthedocs (push) Failing after 11s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 14s
Unit Tests / unit-tests (3.13) (push) Failing after 12s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 19s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 19s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 21s
Test Llama Stack Build / generate-matrix (push) Failing after 21s
Test Llama Stack Build / build (push) Has been skipped
UI Tests / ui-tests (22) (push) Failing after 21s
# What does this PR do? Context: https://github.com/meta-llama/llama-stack/issues/2937 The API design is inspired by existing offerings, but not exactly the same: * `top_n` as the parameter to control number of results, instead of `top_k`, since `n` is conventional to control number * `truncation` bool instead of `max_token_per_doc`, since we should just handle the truncation automatically depending on model capability, instead of user setting the context length manually. * `data` field in the response, to be consistent with other OpenAI APIs (though they don't have a rerank API). Also, it is one less name to learn in the API. ## Test Plan Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com> |
||
|
c3b2b06974
|
refactor(logging): rename llama_stack logger categories (#3065)
# What does this PR do? <!-- Provide a short summary of what this PR does and why. Link to relevant issues if applicable. --> This PR renames categories of llama_stack loggers. This PR aligns logging categories as per the package name, as well as reviews from initial https://github.com/meta-llama/llama-stack/pull/2868. This is a follow up to #3061. <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> Replaces https://github.com/meta-llama/llama-stack/pull/2868 Part of https://github.com/meta-llama/llama-stack/issues/2865 cc @leseb @rhuss Signed-off-by: Mustafa Elbehery <melbeher@redhat.com> |
||
|
deffaa9e4e
|
fix: fix the error type in embedding test case (#3197)
# What does this PR do? Currently the embedding integration test cases fail due to a misalignment in the error type. This PR fixes the embedding integration test by fixing the error type. ## Test Plan ``` pytest -s -v tests/integration/inference/test_embedding.py --stack-config="inference=nvidia" --embedding-model="nvidia/llama-3.2-nv-embedqa-1b-v2" --env NVIDIA_API_KEY={nvidia_api_key} --env NVIDIA_BASE_URL="https://integrate.api.nvidia.com" ``` |
||
|
b72169ca47
|
docs: update the docs for NVIDIA Inference provider (#3227)
# What does this PR do? - Documentation update and fix for the NVIDIA Inference provider. - Update the `run_moderation` for safety API with a `NotImplementedError` placeholder. Otherwise initialization NVIDIA inference client will raise an error. ## Test Plan N/A |
||
|
55e9959f62
|
fix: fix ``openai_embeddings `` for asymmetric embedding NIMs (#3205)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Test Llama Stack Build / generate-matrix (push) Successful in 5s
Python Package Build Test / build (3.13) (push) Failing after 3s
Test Llama Stack Build / build-single-provider (push) Failing after 9s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 12s
Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 14s
Unit Tests / unit-tests (3.13) (push) Failing after 11s
Unit Tests / unit-tests (3.12) (push) Failing after 13s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 16s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 19s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 20s
Vector IO Integration Tests / test-matrix (push) Failing after 19s
Test External API and Providers / test-external (venv) (push) Failing after 18s
Python Package Build Test / build (3.12) (push) Failing after 49s
Test Llama Stack Build / build (push) Failing after 54s
UI Tests / ui-tests (22) (push) Failing after 1m26s
Pre-commit / pre-commit (push) Successful in 2m24s
# What does this PR do? NVIDIA asymmetric embedding models (e.g., `nvidia/llama-3.2-nv-embedqa-1b-v2`) require an `input_type` parameter not present in the standard OpenAI embeddings API. This PR adds the `input_type="query"` as default and updates the documentation to suggest using the `embedding` API for passage embeddings. <!-- If resolving an issue, uncomment and update the line below --> Resolves #2892 ## Test Plan ``` pytest -s -v tests/integration/inference/test_openai_embeddings.py --stack-config="inference=nvidia" --embedding-model="nvidia/llama-3.2-nv-embedqa-1b-v2" --env NVIDIA_API_KEY={nvidia_api_key} --env NVIDIA_BASE_URL="https://integrate.api.nvidia.com" ``` |
||
|
3f8df167f3
|
chore(pre-commit): add pre-commit hook to enforce llama_stack logger usage (#3061)
# What does this PR do? This PR adds a step in pre-commit to enforce using `llama_stack` logger. Currently, various parts of the code base uses different loggers. As a custom `llama_stack` logger exist and used in the codebase, it is better to standardize its utilization. Signed-off-by: Mustafa Elbehery <melbeher@redhat.com> Co-authored-by: Matthew Farrellee <matt@cs.wisc.edu> |
||
|
8cc4925f7d
|
chore: Enable keyword search for Milvus inline (#3073)
# What does this PR do? With https://github.com/milvus-io/milvus-lite/pull/294 - Milvus Lite supports keyword search using BM25. While introducing keyword search we had explicitly disabled it for inline milvus. This PR removes the need for the check, and enables `inline::milvus` for tests. <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> ## Test Plan Run llama stack with `inline::milvus` enabled: ``` pytest tests/integration/vector_io/test_openai_vector_stores.py::test_openai_vector_store_search_modes --stack-config=http://localhost:8321 --embedding-model=all-MiniLM-L6-v2 -v ``` ``` INFO 2025-08-07 17:06:20,932 tests.integration.conftest:64 tests: Setting DISABLE_CODE_SANDBOX=1 for macOS =========================================================================================== test session starts ============================================================================================ platform darwin -- Python 3.12.11, pytest-7.4.4, pluggy-1.5.0 -- /Users/vnarsing/miniconda3/envs/stack-client/bin/python cachedir: .pytest_cache metadata: {'Python': '3.12.11', 'Platform': 'macOS-14.7.6-arm64-arm-64bit', 'Packages': {'pytest': '7.4.4', 'pluggy': '1.5.0'}, 'Plugins': {'asyncio': '0.23.8', 'cov': '6.0.0', 'timeout': '2.2.0', 'socket': '0.7.0', 'html': '3.1.1', 'langsmith': '0.3.39', 'anyio': '4.8.0', 'metadata': '3.0.0'}} rootdir: /Users/vnarsing/go/src/github/meta-llama/llama-stack configfile: pyproject.toml plugins: asyncio-0.23.8, cov-6.0.0, timeout-2.2.0, socket-0.7.0, html-3.1.1, langsmith-0.3.39, anyio-4.8.0, metadata-3.0.0 asyncio: mode=Mode.AUTO collected 3 items tests/integration/vector_io/test_openai_vector_stores.py::test_openai_vector_store_search_modes[None-None-all-MiniLM-L6-v2-None-384-vector] PASSED [ 33%] tests/integration/vector_io/test_openai_vector_stores.py::test_openai_vector_store_search_modes[None-None-all-MiniLM-L6-v2-None-384-keyword] PASSED [ 66%] tests/integration/vector_io/test_openai_vector_stores.py::test_openai_vector_store_search_modes[None-None-all-MiniLM-L6-v2-None-384-hybrid] PASSED [100%] ============================================================================================ 3 passed in 4.75s ============================================================================================= ``` Signed-off-by: Varsha Prasad Narsing <varshaprasad96@gmail.com> Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com> |
||
|
fffdab4f5c
|
fix: Dell distribution missing kvstore (#3113)
Some checks failed
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 7s
Integration Tests (Replay) / discover-tests (push) Successful in 9s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 11s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 18s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 20s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 18s
Vector IO Integration Tests / test-matrix (3.12, remote::weaviate) (push) Failing after 17s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 16s
Test Llama Stack Build / generate-matrix (push) Successful in 6s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 14s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 12s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 27s
Test Llama Stack Build / build-single-provider (push) Failing after 6s
Vector IO Integration Tests / test-matrix (3.12, remote::qdrant) (push) Failing after 26s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 24s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 29s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 15s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 9s
Python Package Build Test / build (3.13) (push) Failing after 6s
Vector IO Integration Tests / test-matrix (3.13, remote::weaviate) (push) Failing after 14s
Python Package Build Test / build (3.12) (push) Failing after 9s
Vector IO Integration Tests / test-matrix (3.13, remote::qdrant) (push) Failing after 16s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 10s
Test External API and Providers / test-external (venv) (push) Failing after 11s
Unit Tests / unit-tests (3.12) (push) Failing after 13s
Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 11s
Test Llama Stack Build / build (push) Failing after 8s
Unit Tests / unit-tests (3.13) (push) Failing after 37s
Pre-commit / pre-commit (push) Successful in 1m44s
# What does this PR do? - Added kvstore config to ChromaDB provider config for Dell distribution similar to [starter config](https://github.com/meta-llama/llama-stack/blob/main/llama_stack/distributions/starter/run.yaml#L110-L112) - Fixed [error](https://github.com/huggingface/huggingface_hub/blob/main/src/huggingface_hub/inference/_generated/_async_client.py#L3424-L3425) getting endpoint information by adding `hf-inference` as the provider to the `AsyncInferenceClient` (TGI client). ## Test Plan ``` export INFERENCE_PORT=8181 export DEH_URL=http://0.0.0.0:$INFERENCE_PORT export INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct export CHROMADB_HOST=localhost export CHROMADB_PORT=8000 export CHROMA_URL=http://$CHROMADB_HOST:$CHROMADB_PORT export CUDA_VISIBLE_DEVICES=0 export LLAMA_STACK_PORT=8321 export HF_TOKEN=[redacted] # TGI Server docker run --rm -it \ --pull always \ --network host \ -v $HOME/.cache/huggingface:/data \ -e HF_TOKEN=$HF_TOKEN \ -e PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True \ -p $INFERENCE_PORT:$INFERENCE_PORT \ --gpus all \ ghcr.io/huggingface/text-generation-inference:latest \ --dtype float16 \ --usage-stats off \ --sharded false \ --cuda-memory-fraction 0.8 \ --model-id meta-llama/Llama-3.2-3B-Instruct \ --port $INFERENCE_PORT \ --hostname 0.0.0.0 # Chrome DB docker run --rm -it \ --name chromadb \ --net=host -p 8000:8000 \ -v ~/chroma:/chroma/chroma \ -e IS_PERSISTENT=TRUE \ -e ANONYMIZED_TELEMETRY=FALSE \ chromadb/chroma:latest # Llama Stack llama stack run dell \ --port $LLAMA_STACK_PORT \ --env INFERENCE_MODEL=$INFERENCE_MODEL \ --env DEH_URL=$DEH_URL \ --env CHROMA_URL=$CHROMA_URL ``` --------- Co-authored-by: Connor Hack <connorhack@fb.com> Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com> |
||
|
3d90117891
|
chore(tests): fix responses and vector_io tests (#3119)
Some fixes to MCP tests. And a bunch of fixes for Vector providers. I also enabled a bunch of Vector IO tests to be used with `LlamaStackLibraryClient` ## Test Plan Run Responses tests with llama stack library client: ``` pytest -s -v tests/integration/non_ci/responses/ --stack-config=server:starter \ --text-model openai/gpt-4o \ --embedding-model=sentence-transformers/all-MiniLM-L6-v2 \ -k "client_with_models" ``` Do the same with `-k openai_client` The rest should be taken care of by CI. |
||
|
19123ca957
|
refactor: standardize InferenceRouter model handling (#2965)
Some checks failed
Integration Tests (Replay) / discover-tests (push) Successful in 3s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.12) (push) Failing after 12s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 15s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 19s
Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 15s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 19s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 21s
Python Package Build Test / build (3.13) (push) Failing after 16s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 23s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 29s
Test External API and Providers / test-external (venv) (push) Failing after 20s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 25s
Unit Tests / unit-tests (3.12) (push) Failing after 23s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 17s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 27s
Vector IO Integration Tests / test-matrix (3.13, remote::qdrant) (push) Failing after 21s
Unit Tests / unit-tests (3.13) (push) Failing after 27s
Vector IO Integration Tests / test-matrix (3.13, remote::weaviate) (push) Failing after 23s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 29s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 22s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 25s
Vector IO Integration Tests / test-matrix (3.12, remote::weaviate) (push) Failing after 22s
Vector IO Integration Tests / test-matrix (3.12, remote::qdrant) (push) Failing after 17s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 24s
Pre-commit / pre-commit (push) Successful in 1m19s
|
||
|
a4bad6c0b4
|
feat: Add Google Vertex AI inference provider support (#2841)
Some checks failed
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 10s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 12s
Python Package Build Test / build (3.13) (push) Failing after 4s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 10s
Test Llama Stack Build / generate-matrix (push) Successful in 8s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 13s
Test External API and Providers / test-external (venv) (push) Failing after 11s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 17s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 10s
Test Llama Stack Build / build-single-provider (push) Failing after 16s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 8s
Unit Tests / unit-tests (3.12) (push) Failing after 10s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 26s
Vector IO Integration Tests / test-matrix (3.12, remote::qdrant) (push) Failing after 15s
Update ReadTheDocs / update-readthedocs (push) Failing after 9s
Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 7s
Vector IO Integration Tests / test-matrix (3.13, remote::weaviate) (push) Failing after 11s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 23s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 16s
Vector IO Integration Tests / test-matrix (3.12, remote::weaviate) (push) Failing after 18s
Test Llama Stack Build / build (push) Failing after 8s
Vector IO Integration Tests / test-matrix (3.13, remote::qdrant) (push) Failing after 17s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 16s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 8s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 21s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 47s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 49s
Unit Tests / unit-tests (3.13) (push) Failing after 39s
Pre-commit / pre-commit (push) Successful in 1m37s
# What does this PR do? - Add new Vertex AI remote inference provider with litellm integration - Support for Gemini models through Google Cloud Vertex AI platform - Uses Google Cloud Application Default Credentials (ADC) for authentication - Added VertexAI models: gemini-2.5-flash, gemini-2.5-pro, gemini-2.0-flash. - Updated provider registry to include vertexai provider - Updated starter template to support Vertex AI configuration - Added comprehensive documentation and sample configuration <!-- If resolving an issue, uncomment and update the line below --> relates to https://github.com/meta-llama/llama-stack/issues/2747 ## Test Plan <!-- Describe the tests you ran to verify your changes with result summaries. *Provide clear instructions so the plan can be easily re-executed.* --> Signed-off-by: Eran Cohen <eranco@redhat.com> Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com> |
||
|
1677d6bffd
|
feat: Flash-Lite 2.0 and 2.5 models added to Gemini inference provider (#3058)
Some checks failed
Integration Tests (Replay) / discover-tests (push) Successful in 4s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Vector IO Integration Tests / test-matrix (3.12, remote::qdrant) (push) Failing after 11s
Python Package Build Test / build (3.12) (push) Failing after 8s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 15s
Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 7s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 13s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 12s
Python Package Build Test / build (3.13) (push) Failing after 10s
Unit Tests / unit-tests (3.12) (push) Failing after 9s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 16s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 15s
Vector IO Integration Tests / test-matrix (3.12, remote::weaviate) (push) Failing after 15s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 13s
Vector IO Integration Tests / test-matrix (3.13, remote::weaviate) (push) Failing after 11s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 15s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 13s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 19s
Test External API and Providers / test-external (venv) (push) Failing after 13s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 13s
Vector IO Integration Tests / test-matrix (3.13, remote::qdrant) (push) Failing after 14s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 17s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 59s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 1m1s
Unit Tests / unit-tests (3.13) (push) Failing after 59s
Pre-commit / pre-commit (push) Successful in 1m41s
PR adds Flash-Lite 2.0 and 2.5 models to the Gemini inference provider Closes #3046 ## Test Plan I was not able to locate any existing test for this provider, so I performed manual testing. But the change is really trivial and straightforward. |
||
|
9e78f2da96
|
docs: fix the docs for NVIDIA Inference Provider (#3055)
Some checks failed
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 15s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 20s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 21s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 15s
Test Llama Stack Build / build-single-provider (push) Failing after 11s
Test Llama Stack Build / generate-matrix (push) Successful in 14s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 17s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 17s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 20s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 26s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 16s
Test External API and Providers / test-external (venv) (push) Failing after 11s
Unit Tests / unit-tests (3.12) (push) Failing after 12s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 21s
Vector IO Integration Tests / test-matrix (3.12, remote::weaviate) (push) Failing after 18s
Vector IO Integration Tests / test-matrix (3.13, remote::qdrant) (push) Failing after 20s
Python Package Build Test / build (3.12) (push) Failing after 23s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 25s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 18s
Unit Tests / unit-tests (3.13) (push) Failing after 9s
Update ReadTheDocs / update-readthedocs (push) Failing after 9s
Python Package Build Test / build (3.13) (push) Failing after 21s
Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 10s
Vector IO Integration Tests / test-matrix (3.13, remote::weaviate) (push) Failing after 17s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 51s
Vector IO Integration Tests / test-matrix (3.12, remote::qdrant) (push) Failing after 58s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 56s
Pre-commit / pre-commit (push) Successful in 1m40s
Test Llama Stack Build / build (push) Failing after 14s
# What does this PR do? Fix the NVIDIA inference docs by updating API methods, model IDs, and embedding example. ## Test Plan N/A |
||
|
e3928e6a29
|
feat: Implement hybrid search in Milvus (#2644)
Some checks failed
Integration Tests (Replay) / discover-tests (push) Successful in 5s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 6s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 10s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Python Package Build Test / build (3.13) (push) Failing after 6s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 9s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 10s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 15s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 16s
Python Package Build Test / build (3.12) (push) Failing after 10s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 21s
Vector IO Integration Tests / test-matrix (3.13, remote::weaviate) (push) Failing after 7s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 15s
Unit Tests / unit-tests (3.13) (push) Failing after 10s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 15s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 12s
Vector IO Integration Tests / test-matrix (3.13, remote::qdrant) (push) Failing after 12s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 8s
Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 8s
Unit Tests / unit-tests (3.12) (push) Failing after 19s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 12s
Vector IO Integration Tests / test-matrix (3.12, remote::weaviate) (push) Failing after 11s
Vector IO Integration Tests / test-matrix (3.12, remote::qdrant) (push) Failing after 11s
Test External API and Providers / test-external (venv) (push) Failing after 21s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 19s
Pre-commit / pre-commit (push) Successful in 57s
# What does this PR do? This PR implements hybrid search for Milvus DB based on the inbuilt milvus support. To test: ``` pytest tests/unit/providers/vector_io/remote/test_milvus.py -v -s --tb=long --disable-warnings --asyncio-mode=auto ``` Signed-off-by: Varsha Prasad Narsing <varshaprasad96@gmail.com> |
||
|
7f834339ba
|
chore(misc): make tests and starter faster (#3042)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 9s
Python Package Build Test / build (3.12) (push) Failing after 4s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 12s
Test Llama Stack Build / generate-matrix (push) Successful in 11s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 12s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 14s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 22s
Test External API and Providers / test-external (venv) (push) Failing after 14s
Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 12s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 15s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 22s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 14s
Unit Tests / unit-tests (3.13) (push) Failing after 14s
Test Llama Stack Build / build-single-provider (push) Failing after 13s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 18s
Unit Tests / unit-tests (3.12) (push) Failing after 16s
Vector IO Integration Tests / test-matrix (3.12, remote::qdrant) (push) Failing after 18s
Vector IO Integration Tests / test-matrix (3.13, remote::weaviate) (push) Failing after 10s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 11s
Vector IO Integration Tests / test-matrix (3.12, remote::weaviate) (push) Failing after 16s
Vector IO Integration Tests / test-matrix (3.13, remote::qdrant) (push) Failing after 18s
Test Llama Stack Build / build (push) Failing after 12s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 18s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 20s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 16s
Python Package Build Test / build (3.13) (push) Failing after 53s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 59s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 1m1s
Update ReadTheDocs / update-readthedocs (push) Failing after 1m6s
Pre-commit / pre-commit (push) Successful in 1m53s
A bunch of miscellaneous cleanup focusing on tests, but ended up speeding up starter distro substantially. - Pulled llama stack client init for tests into `pytest_sessionstart` so it does not clobber output - Profiling of that told me where we were doing lots of heavy imports for starter, so lazied them - starter now starts 20seconds+ faster on my Mac - A few other smallish refactors for `compat_client` |
||
|
e12524af85
|
feat: create unregister shield API endpoint in Llama Stack (#2853)
Some checks failed
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 10s
Integration Tests (Replay) / discover-tests (push) Successful in 13s
Python Package Build Test / build (3.12) (push) Failing after 1s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 24s
Test External API and Providers / test-external (venv) (push) Failing after 12s
Unit Tests / unit-tests (3.13) (push) Failing after 10s
Update ReadTheDocs / update-readthedocs (push) Failing after 9s
Python Package Build Test / build (3.13) (push) Failing after 15s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 27s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 29s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 27s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 25s
Vector IO Integration Tests / test-matrix (3.12, remote::weaviate) (push) Failing after 22s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 25s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 21s
Unit Tests / unit-tests (3.12) (push) Failing after 19s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 35s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 39s
Vector IO Integration Tests / test-matrix (3.13, remote::weaviate) (push) Failing after 23s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 35s
Vector IO Integration Tests / test-matrix (3.13, remote::qdrant) (push) Failing after 35s
Vector IO Integration Tests / test-matrix (3.12, remote::qdrant) (push) Failing after 1m2s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 1m4s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 1m2s
Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 7s
Pre-commit / pre-commit (push) Successful in 2m21s
# What does this PR do? <!-- Provide a short summary of what this PR does and why. Link to relevant issues if applicable. --> Extend the Shields Protocol and implement the capability to unregister previously registered shields and CLI for shields management. <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> Closes #2581 ## Test Plan <!-- Describe the tests you ran to verify your changes with result summaries. *Provide clear instructions so the plan can be easily re-executed.* --> First of, test API for shields 1. Install and start Ollama: `ollama serve` 2. Pull Llama Guard Model in Ollama: `ollama pull llama-guard3:8b` 3. Configure env variables: ``` export ENABLE_OLLAMA=ollama export OLLAMA_URL=http://localhost:11434 ``` 4. Build Llama Stack distro: `llama stack build --template starter --image-type venv ` 5. Start Llama Stack server: `llama stack run starter --port 8321` 6. Check if Ollama model is available: `curl -X GET http://localhost:8321/v1/models | jq '.data[] | select(.provider_id=="ollama")'` 7. Register a new Shield using Ollama provider: ``` curl -X POST http://localhost:8321/v1/shields \ -H "Content-Type: application/json" \ -d '{ "shield_id": "test-shield", "provider_id": "llama-guard", "provider_shield_id": "ollama/llama-guard3:8b", "params": {} }' ``` `{"identifier":"test-shield","provider_resource_id":"ollama/llama-guard3:8b","provider_id":"llama-guard","type":"shield","owner":{"principal":"","attributes":{}},"params":{}}% ` 8. Check if shield was registered: `curl -X GET http://localhost:8321/v1/shields/test-shield` `{"identifier":"test-shield","provider_resource_id":"ollama/llama-guard3:8b","provider_id":"llama-guard","type":"shield","owner":{"principal":"","attributes":{}},"params":{}}% ` 9. Run shield: ``` curl -X POST http://localhost:8321/v1/safety/run-shield \ -H "Content-Type: application/json" \ -d '{ "shield_id": "test-shield", "messages": [ { "role": "user", "content": "How can I hack into someone computer?" } ], "params": {} }' ``` `{"violation":{"violation_level":"error","user_message":"I can't answer that. Can I help with something else?","metadata":{"violation_type":"S2"}}}% ` 10. Unregister shield: `curl -X DELETE http://localhost:8321/v1/shields/test-shield` `null% ` 11. Verify shield was deleted: `curl -X GET http://localhost:8321/v1/shields/test-shield` `{"detail":"Invalid value: Shield 'test-shield' not found"}%` All tests passed ✅ ``` ========================================================================== 430 passed, 194 warnings in 19.54s ========================================================================== /Users/iamiller/GitHub/llama-stack/.venv/lib/python3.12/site-packages/litellm/llms/custom_httpx/async_client_cleanup.py:78: RuntimeWarning: coroutine 'close_litellm_async_clients' was never awaited loop.close() RuntimeWarning: Enable tracemalloc to get the object allocation traceback Wrote HTML report to htmlcov-3.12/index.html ``` |
||
|
cc87995e2b
|
chore: rename templates to distributions (#3035)
As the title says. Distributions is in, Templates is out. `llama stack build --template` --> `llama stack build --distro`. For backward compatibility, the previous option is kept but results in a warning. Updated `server.py` to remove the "config_or_template" backward compatibility since it has been a couple releases since that change. |
||
|
3c2aee610d
|
refactor: Remove double filtering based on score threshold (#3019)
# What does this PR do? Remove score_threshold based check from `OpenAIVectorStoreMixin` Closes: https://github.com/meta-llama/llama-stack/issues/3018 <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> ## Test Plan <!-- Describe the tests you ran to verify your changes with result summaries. *Provide clear instructions so the plan can be easily re-executed.* --> |
||
|
a749d5f4a4
|
refactor: remove Conda support from Llama Stack (#2969)
# What does this PR do? <!-- Provide a short summary of what this PR does and why. Link to relevant issues if applicable. --> This PR is responsible for removal of Conda support in Llama Stack <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> Closes #2539 ## Test Plan <!-- Describe the tests you ran to verify your changes with result summaries. *Provide clear instructions so the plan can be easily re-executed.* --> |
||
|
140ee7d337
|
fix: sambanova inference provider (#2996)
Some checks failed
Integration Tests (Replay) / discover-tests (push) Successful in 3s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
Vector IO Integration Tests / test-matrix (3.12, remote::qdrant) (push) Failing after 10s
Integration Tests (Replay) / run-replay-mode-tests (push) Failing after 5s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 14s
Python Package Build Test / build (3.13) (push) Failing after 8s
Unit Tests / unit-tests (3.12) (push) Failing after 8s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 15s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 15s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 12s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 12s
Python Package Build Test / build (3.12) (push) Failing after 13s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 19s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 10s
Vector IO Integration Tests / test-matrix (3.12, remote::weaviate) (push) Failing after 17s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 20s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 10s
Test External API and Providers / test-external (venv) (push) Failing after 13s
Vector IO Integration Tests / test-matrix (3.13, remote::weaviate) (push) Failing after 10s
Unit Tests / unit-tests (3.13) (push) Failing after 13s
Vector IO Integration Tests / test-matrix (3.13, remote::qdrant) (push) Failing after 15s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 18s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 18s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 46s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 49s
Pre-commit / pre-commit (push) Successful in 1m29s
# What does this PR do? closes #2995 update SambaNovaInferenceAdapter to efficiently use LiteLLMOpenAIMixin ## Test Plan ``` $ uv run pytest -s -v tests/integration/inference --stack-config inference=sambanova --text-model sambanova/Meta-Llama-3.1-8B-Instruct ... ======================== 10 passed, 84 skipped, 3 xfailed, 51 warnings in 8.14s ======================== ``` |
||
|
1f0766308d
|
feat: Add openAI compatible APIs to Qdrant (#2465)
Some checks failed
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 15s
Test Llama Stack Build / generate-matrix (push) Successful in 9s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 15s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 19s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 13s
Test Llama Stack Build / build-single-provider (push) Failing after 13s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 15s
Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 22s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 14s
Integration Tests (Replay) / discover-tests (push) Successful in 24s
Vector IO Integration Tests / test-matrix (3.13, remote::qdrant) (push) Failing after 16s
Vector IO Integration Tests / test-matrix (3.12, remote::weaviate) (push) Failing after 17s
Vector IO Integration Tests / test-matrix (3.13, remote::weaviate) (push) Failing after 15s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 17s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 18s
Update ReadTheDocs / update-readthedocs (push) Failing after 12s
Unit Tests / unit-tests (3.12) (push) Failing after 11s
Vector IO Integration Tests / test-matrix (3.12, remote::qdrant) (push) Failing after 16s
Python Package Build Test / build (3.12) (push) Failing after 20s
Python Package Build Test / build (3.13) (push) Failing after 18s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 18s
Test External API and Providers / test-external (venv) (push) Failing after 18s
Unit Tests / unit-tests (3.13) (push) Failing after 19s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 42s
Integration Tests (Replay) / run-replay-mode-tests (push) Failing after 22s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 1m12s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 1m15s
Test Llama Stack Build / build (push) Failing after 32s
Pre-commit / pre-commit (push) Successful in 2m39s
# What does this PR do? Adds support to Vector store Open AI APIs in Qdrant. <!-- If resolving an issue, uncomment and update the line below --> Closes #2463 ## Test Plan <!-- Describe the tests you ran to verify your changes with result summaries. *Provide clear instructions so the plan can be easily re-executed.* --> Signed-off-by: Varsha Prasad Narsing <varshaprasad96@gmail.com> Co-authored-by: ehhuang <ehhuang@users.noreply.github.com> Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com> |
||
|
33cca26154
|
chore: Enabling Integration tests for Weaviate (#2882)
# What does this PR do? This PR (1) enables the files API for Weaviate and (2) enables integration tests for Weaviate, which adds a docker container to the github action. This PR also handles a couple of edge cases for in creating the collection and ensuring the tests all pass. ## Test Plan CI enabled --------- Signed-off-by: Francisco Javier Arceo <farceo@redhat.com> |
||
|
2665f00102
|
chore(rename): move llama_stack.distribution to llama_stack.core (#2975)
We would like to rename the term `template` to `distribution`. To prepare for that, this is a precursor. cc @leseb |
||
|
cd5c6a2fcd
|
chore: standardize vector store not found error (#2968)
# What does this PR do? 1. Creates a new `VectorStoreNotFoundError` class 2. Implements the new class where appropriate Relates to #2379 Signed-off-by: Nathan Weinberg <nweinber@redhat.com> |
||
|
60bb5e307e
|
feat(openai): add configurable base_url support with OPENAI_BASE_URL env var (#2919)
# What does this PR do? - Add base_url field to OpenAIConfig with default "https://api.openai.com/v1" - Update sample_run_config to support OPENAI_BASE_URL environment variable - Modify get_base_url() to return configured base_url instead of hardcoded value - Add comprehensive test suite covering: - Default base URL behavior - Custom base URL from config - Environment variable override - Config precedence over environment variables - Client initialization with configured URL - Model availability checks using configured URL This enables users to configure custom OpenAI-compatible API endpoints via environment variables or configuration files. Closes #2910 ## Test Plan run unit tests |
||
|
3c40c8e583
|
fix: litellm_provider_name for llama-api (#2934)
litellm uses "meta_llama" for the provider name, see https://docs.litellm.ai/docs/providers/meta_llama ad https://github.com/BerriAI/litellm/blob/main/litellm/__init__.py#L833 |
||
|
9583f468f8
|
feat(starter)!: simplify starter distro; litellm model registry changes (#2916) | ||
|
52201612de
|
feat: implement chunk deletion for vector stores (#2701)
Add support for deleting individual chunks from vector stores - Add abstract remove_chunk() method to EmbeddingIndex base class - Implement chunk deletion for Faiss provider, SQLite Vec, Milvus, PGVector - Placeholder implementations with NotImplementedError for Chroma/Qdrant/Weaviate - Integrate chunk deletion into OpenAI vector store file deletion flow - removed xfail from test_openai_vector_store_delete_file_removes_from_vector_store Closes: #2477 --------- Signed-off-by: Derek Higgins <derekh@redhat.com> Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com> |
||
|
9e77be1f72
|
chore: Fix chroma unit tests (#2896)
# What does this PR do? Enable Chroma inline unit tests and fix integration tests. <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> ## Test Plan <!-- Describe the tests you ran to verify your changes with result summaries. *Provide clear instructions so the plan can be easily re-executed.* --> --------- Signed-off-by: Francisco Javier Arceo <farceo@redhat.com> |
||
|
1463b79218
|
feat(registry): make the Stack query providers for model listing (#2862)
This flips #2823 and #2805 by making the Stack periodically query the providers for models rather than the providers going behind the back and calling "register" on to the registry themselves. This also adds support for model listing for all other providers via `ModelRegistryHelper`. Once this is done, we do not need to manually list or register models via `run.yaml` and it will remove both noise and annoyance (setting `INFERENCE_MODEL` environment variables, for example) from the new user experience. In addition, it adds a configuration variable `allowed_models` which can be used to optionally restrict the set of models exposed from a provider. |
||
|
e33a50480d
|
fix: starter template and litellm backward compat conflict for openai (#2885)
# What does this PR do? openai/models.py has backward compat entries for litellm model names. the starter template includes these in the list of registered models. the inclusion results in duplicate model registrations. the backward compat is no longer necessary. ## Test Plan ci |
||
|
cd8715d327
|
chore: Added openai compatible vector io endpoints for chromadb (#2489)
Some checks failed
Integration Tests / discover-tests (push) Successful in 3s
Coverage Badge / unit-tests (push) Failing after 6s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 4s
Test Llama Stack Build / generate-matrix (push) Successful in 3s
Python Package Build Test / build (3.13) (push) Failing after 2s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 10s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 11s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 16s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 12s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 16s
Python Package Build Test / build (3.12) (push) Failing after 12s
Test External Providers / test-external-providers (venv) (push) Failing after 12s
Update ReadTheDocs / update-readthedocs (push) Failing after 10s
Test Llama Stack Build / build-single-provider (push) Failing after 15s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 23s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 20s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 21s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 20s
Unit Tests / unit-tests (3.13) (push) Failing after 14s
Test Llama Stack Build / build (push) Failing after 9s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 18s
Unit Tests / unit-tests (3.12) (push) Failing after 14s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 19s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 18s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 51s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 49s
Integration Tests / test-matrix (push) Failing after 53s
Pre-commit / pre-commit (push) Successful in 1m42s
# What does this PR do? This PR implements the openai compatible endpoints for chromadb Closes #2462 ## Test Plan Ran ollama llama stack server and ran the command `pytest -sv --stack-config=http://localhost:8321 tests/integration/vector_io/test_openai_vector_stores.py --embedding-model all-MiniLM-L6-v2` 8 failed, 27 passed, 8 skipped, 1 xfailed The failed ones are regarding files api --------- Signed-off-by: Francisco Javier Arceo <farceo@redhat.com> Co-authored-by: sarthakdeshpande <sarthak.deshpande@engati.com> Co-authored-by: Francisco Javier Arceo <farceo@redhat.com> Co-authored-by: Francisco Arceo <arceofrancisco@gmail.com> |
||
|
2aba2c1236
|
chore: Moving vector store and vector store files helper methods to openai_vector_store_mixin (#2863)
# What does this PR do? Moving vector store and vector store files helper methods to `openai_vector_store_mixin.py` <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> ## Test Plan The tests are already supported in the CI and tests the inline providers and current integration tests. Note that the `vector_index` fixture will be test `milvus_vec_adapter`, `faiss_vec_adapter`, and `sqlite_vec_adapter` in `tests/unit/providers/vector_io/test_vector_io_openai_vector_stores.py`. Additionally, the integration tests in `integration-vector-io-tests.yml` runs `tests/integration/vector_io` tests for the following providers: ```python vector-io-provider: ["inline::faiss", "inline::sqlite-vec", "inline::milvus", "remote::chromadb", "remote::pgvector"] ``` Signed-off-by: Francisco Javier Arceo <farceo@redhat.com> |
||
|
e1ed152779
|
chore: create OpenAIMixin for inference providers with an OpenAI-compat API that need to implement openai_* methods (#2835)
Some checks failed
Coverage Badge / unit-tests (push) Failing after 3s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 6s
Python Package Build Test / build (3.12) (push) Failing after 3s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 6s
Integration Tests / discover-tests (push) Successful in 7s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 6s
Python Package Build Test / build (3.13) (push) Failing after 2s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 6s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 5s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 8s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 9s
Unit Tests / unit-tests (3.12) (push) Failing after 8s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 11s
Test External Providers / test-external-providers (venv) (push) Failing after 8s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 12s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 9s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 17s
Unit Tests / unit-tests (3.13) (push) Failing after 12s
Update ReadTheDocs / update-readthedocs (push) Failing after 11s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 16s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 18s
Integration Tests / test-matrix (push) Failing after 18s
Pre-commit / pre-commit (push) Successful in 1m14s
# What does this PR do? add an `OpenAIMixin` for use by inference providers who remote endpoints support an OpenAI compatible API. use is demonstrated by refactoring - OpenAIInferenceAdapter - NVIDIAInferenceAdapter (adds embedding support) - LlamaCompatInferenceAdapter ## Test Plan existing unit and integration tests |
||
|
8e1a2b4703
|
chore: remove *_openai_compat providers (#2849)
# What does this PR do? These are no longer needed as llama-stack-evals can run against OAI endpoints directly. ## Test Plan |
||
|
199f859eec
|
feat(vllm): periodically refresh models (#2823)
Just like #2805 but for vLLM. We also make VLLM_URL env variable optional (not required) -- if not specified, the provider silently sits idle and yells eventually if someone tries to call a completion on it. This is done so as to allow this provider to be present in the `starter` distribution. ## Test Plan Set up vLLM, copy the starter template and set `{ refresh_models: true, refresh_models_interval: 10 }` for the vllm provider and then run: ``` ENABLE_VLLM=vllm VLLM_URL=http://localhost:8000/v1 \ uv run llama stack run --image-type venv /tmp/starter.yaml ``` Verify that `llama-stack-client models list` brings up the model correctly from vLLM. |
||
|
68a2dfbad7
|
feat(ollama): periodically refresh models (#2805)
For self-hosted providers like Ollama (or vLLM), the backing server is running a set of models. That server should be treated as the source of truth and the Stack registry should just be a cache for those models. Of course, in production environments, you may not want this (because you know what model you are running statically) hence there's a config boolean to control this behavior. _This is part of a series of PRs aimed at removing the requirement of needing to set `INFERENCE_MODEL` env variables for running Llama Stack server._ ## Test Plan Copy and modify the starter.yaml template / config and enable `refresh_models: true, refresh_models_interval: 10` for the ollama provider. Then, run: ``` LLAMA_STACK_LOGGING=all=debug \ ENABLE_OLLAMA=ollama uv run llama stack run --image-type venv /tmp/starter.yaml ``` See a gargantuan amount of logs, but verify that the provider is periodically refreshing models. Stop and prune a model from ollama server, restart the server. Verify that the model goes away when I call `uv run llama-stack-client models list` |
||
|
477bcd4d09
|
feat: allow dynamic model registration for nvidia inference provider (#2726)
# What does this PR do? let's users register models available at https://integrate.api.nvidia.com/v1/models that isn't already in llama_stack/providers/remote/inference/nvidia/models.py ## Test Plan 1. run the nvidia distro 2. register a model from https://integrate.api.nvidia.com/v1/models that isn't already know, as of this writing nvidia/llama-3.1-nemotron-ultra-253b-v1 is a good example 3. perform inference w/ the model |
||
|
b57db11bed
|
feat: create dynamic model registration for OpenAI and Llama compat remote inference providers (#2745)
Some checks failed
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 5s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 6s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 4s
Python Package Build Test / build (3.13) (push) Failing after 2s
Test Llama Stack Build / generate-matrix (push) Successful in 6s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 7s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 9s
Update ReadTheDocs / update-readthedocs (push) Failing after 3s
Test Llama Stack Build / build-single-provider (push) Failing after 7s
Integration Tests / discover-tests (push) Successful in 13s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 13s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 12s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 17s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 15s
Integration Tests / test-matrix (push) Failing after 5s
Unit Tests / unit-tests (3.12) (push) Failing after 12s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 19s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 19s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 22s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 17s
Test External Providers / test-external-providers (venv) (push) Failing after 17s
Test Llama Stack Build / build (push) Failing after 14s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 35s
Python Package Build Test / build (3.12) (push) Failing after 51s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 57s
Unit Tests / unit-tests (3.13) (push) Failing after 53s
Pre-commit / pre-commit (push) Successful in 1m42s
# What does this PR do? <!-- Provide a short summary of what this PR does and why. Link to relevant issues if applicable. --> The purpose of this task is to create a solution that can automatically detect when new models are added, deprecated, or removed by OpenAI and Llama API providers, and automatically update the list of supported models in LLamaStack. This feature is vitally important in order to avoid missing new models and editing the entries manually hence I created automation allowing users to dynamically register: - any models from OpenAI provider available at [https://api.openai.com/v1/models](https://api.openai.com/v1/models) that are not in [https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/remote/inference/openai/models.py](https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/remote/inference/openai/models.py) - any models from Llama API provider available at [https://api.llama.com/v1/models](https://api.llama.com/v1/models) that are not in [https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/remote/inference/llama_openai_compat/models.py](https://github.com/meta-llama/llama-stack/blob/main/llama_stack/providers/remote/inference/llama_openai_compat/models.py) <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> Closes #2504 this PR is dependant on #2710 ## Test Plan <!-- Describe the tests you ran to verify your changes with result summaries. *Provide clear instructions so the plan can be easily re-executed.* --> 1. Create venv at root llamastack directory: `uv venv .venv --python 3.12 --seed` 2. Activate venv: `source .venv/bin/activate` 3. `uv pip install -e .` 4. Create OpenAI distro modifying run.yaml 5. Build distro: `llama stack build --template starter --image-type venv` 6. Then run LlamaStack, but before navigate to templates/starter folder: `llama stack run run.yaml --image-type venv OPENAI_API_KEY=<YOUR_KEY> ENABLE_OPENAI=openai` 7. Then try to register dummy llm that doesn't exist in OpenAI provider: ` llama-stack-client models register ianm/ianllm --provider-model-id=ianllm --provider-id=openai ` You should receive this output - combined list of static config + fetched available models from OpenAI: <img width="1380" height="474" alt="Screenshot 2025-07-14 at 12 48 50" src="https://github.com/user-attachments/assets/d26aad18-6b15-49ee-9c49-b01b2d33f883" /> 8. Then register real llm from OpenAI: llama-stack-client models register openai/gpt-4-turbo-preview --provider-model-id=gpt-4-turbo-preview --provider-id=openai <img width="1253" height="613" alt="Screenshot 2025-07-14 at 13 43 02" src="https://github.com/user-attachments/assets/60a5c9b1-3468-4eb9-9e92-cd7d21de3ca0" /> <img width="1288" height="655" alt="Screenshot 2025-07-14 at 13 43 11" src="https://github.com/user-attachments/assets/c1e48871-0e24-4bd9-a0b8-8c95552a51ee" /> We correctly fetched all available models from OpenAI As for Llama API, as a non-US person I don't have access to Llama API Key but I joined wait list. The implementation for Llama is the same as for OpenAI since Llama is openai compatible. So, the response from GET endpoint has the same structure as OpenAI https://llama.developer.meta.com/docs/api/models |
||
|
a3e249807b
|
chore: remove vision model URL workarounds and simplify client creation (#2775)
The vision models are now available at the standard URL, so the workaround code has been removed. This also simplifies the codebase by eliminating the need for per-model client caching. - Remove special URL handling for meta/llama-3.2-11b/90b-vision-instruct models - Convert _get_client method to _client property for cleaner API - Remove unnecessary lru_cache decorator and functools import - Simplify client creation logic to use single base URL for all models |
||
|
e1755d1ed2
|
chore: Adding OpenAI Vector Stores Files API compatibility for PGVector (#2755)
# What does this PR do? Adding OpenAI Vector Stores Files API compatibility for PGVector <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> ## Test Plan Updated CI to include PGVector --------- Signed-off-by: Francisco Javier Arceo <farceo@redhat.com> |
||
|
31b088978a
|
fix: Fix /vector-stores/create API when vector store with duplicate name (#2617)
# What does this PR do? Resolves https://github.com/meta-llama/llama-stack/issues/2735 Currently, if you test against OpenAI's Vector Stores API the `client.vector_stores.search` call fails with an invalid vector_db during routing (see the script referenced in the clickable item under the Test Plan section). This PR ensures that `client.vector_stores.search()` is compatible with OpenAI's Vector Stores API. Two biggest changes: 1. The `name`, which was previously used as the `vector_db_id`, has been changed to be consistent with OpenAI's `vs_{uuid}` format. 2. The vector store ID has to be referenced by the ID, the name is not reliable as every `client.vector_stores.create` results in a new vector store. NOTE: I believe this is a breaking change for end users as they'll need to update their VectorDB identifiers. ## Test Plan Unit tests: ```bash ./scripts/unit-tests.sh tests/unit/providers/vector_io/ -v ``` Integration tests: ```bash ENABLE_MILVUS=milvus llama stack run /Users/farceo/dev/llama-stack/llama_stack/templates/starter/run.yaml --image-type venv LLAMA_STACK_CONFIG=http://localhost:8321 pytest -sv tests/integration/vector_io/test_openai_vector_stores.py --embedding-model=all-MiniLM-L6-v2 -vv ``` Unit tests and test script below 👇 <details> <summary>Click here for script used to test OpenAI and Llama Stack Vector Store implementation</summary> ```python import json import argparse from openai import OpenAI, pagination import logging from colorama import Fore, Style, init import traceback import os # Initialize colorama for color support in terminal init(autoreset=True) # Setup basic logging logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s') DEMO_VECTOR_STORE_NAME = "Support FAQ FJA" global DEMO_VECTOR_STORE_ID global DEMO_VECTOR_STORE_ID2 def colored_print(color, text): """Prints text to the console with the specified color.""" print(f"{color}{text}{Style.RESET_ALL}") def log_and_print(color, message, level=logging.INFO): """Logs a message and prints it to the console with the specified color.""" logging.log(level, message) colored_print(color, message) def run_tests(client, prefix="openai"): """ Runs all tests using the provided OpenAI client and saves the output to JSON files with the given prefix. """ # Create the directory if it doesn't exist os.makedirs('openai_testing', exist_ok=True) # Default values in case tests fail global DEMO_VECTOR_STORE_ID, DEMO_VECTOR_STORE_ID2 DEMO_VECTOR_STORE_ID = None DEMO_VECTOR_STORE_ID2 = None def test_idempotent_vector_store_creation(): """ Test that creating a vector store with the same name is idempotent. """ log_and_print(Fore.BLUE, "Starting vector store creation test...") try: vector_store = client.vector_stores.create( name=DEMO_VECTOR_STORE_NAME, ) # Attempt to create the same vector store again vector_store2 = client.vector_stores.create( name=DEMO_VECTOR_STORE_NAME, ) # Check instead of assert if vector_store2.id != vector_store.id: log_and_print(Fore.YELLOW, f"FAILED IDEMPOTENCY: the same VectorStore name for {prefix.upper()} does not return the same ID", level=logging.WARNING) else: log_and_print(Fore.GREEN, f"PASSED IDEMPOTENCY: f{vector_store2.id} == {vector_store.id} the same VectorStore name for {prefix.upper()} returns the same ID") vector_store_data = vector_store.to_dict() log_and_print(Fore.WHITE, f"vector_stores.create = {json.dumps(vector_store_data, indent=2)}") with open(f'openai_testing/{prefix}_vector_store_create.json', 'w') as f: json.dump(vector_store_data, f, indent=2) global DEMO_VECTOR_STORE_ID, DEMO_VECTOR_STORE_ID2 DEMO_VECTOR_STORE_ID = vector_store.id DEMO_VECTOR_STORE_ID2 = vector_store2.id return DEMO_VECTOR_STORE_ID, DEMO_VECTOR_STORE_ID2 except Exception as e: log_and_print(Fore.RED, f"Idempotent vector store creation test failed: {e}", level=logging.ERROR) logging.error(traceback.format_exc()) # Create a fallback vector store ID if needed if 'vector_store' in locals() and vector_store: DEMO_VECTOR_STORE_ID = vector_store.id return DEMO_VECTOR_STORE_ID, DEMO_VECTOR_STORE_ID2 def test_vector_store_list(): """ Test listing vector stores. """ log_and_print(Fore.BLUE, "Starting vector store list test...") try: vector_stores = client.vector_stores.list() # Check instead of assert if not isinstance(vector_stores, pagination.SyncCursorPage): log_and_print(Fore.YELLOW, f"FAILED: Expected a list of vector stores, got {type(vector_stores)}", level=logging.WARNING) else: log_and_print(Fore.GREEN, "Vector store list test passed!") vector_stores_data = vector_stores.to_dict() log_and_print(Fore.WHITE, f"vector_stores.list = {json.dumps(vector_stores_data, indent=2)}") with open(f'openai_testing/{prefix}_vector_store_list.json', 'w') as f: json.dump(vector_stores_data, f, indent=2) except Exception as e: log_and_print(Fore.RED, f"Vector store list test failed: {e}", level=logging.ERROR) logging.error(traceback.format_exc()) def test_retrieve_vector_store(): """ Test retrieving a specific vector store. """ log_and_print(Fore.BLUE, "Starting retrieve vector store test...") if not DEMO_VECTOR_STORE_ID: log_and_print(Fore.YELLOW, "Skipping retrieve vector store test - no vector store ID available", level=logging.WARNING) return try: vector_store = client.vector_stores.retrieve( vector_store_id=DEMO_VECTOR_STORE_ID, ) # Check instead of assert if vector_store.id != DEMO_VECTOR_STORE_ID: log_and_print(Fore.YELLOW, "FAILED: Retrieved vector store ID does not match", level=logging.WARNING) else: log_and_print(Fore.GREEN, "Retrieve vector store test passed!") vector_store_data = vector_store.to_dict() log_and_print(Fore.WHITE, f"vector_stores.retrieve = {json.dumps(vector_store_data, indent=2)}") with open(f'openai_testing/{prefix}_vector_store_retrieve.json', 'w') as f: json.dump(vector_store_data, f, indent=2) except Exception as e: log_and_print(Fore.RED, f"Retrieve vector store test failed: {e}", level=logging.ERROR) logging.error(traceback.format_exc()) def test_modify_vector_store(): """ Test modifying a vector store. """ log_and_print(Fore.BLUE, "Starting modify vector store test...") if not DEMO_VECTOR_STORE_ID: log_and_print(Fore.YELLOW, "Skipping modify vector store test - no vector store ID available", level=logging.WARNING) return try: updated_vector_store = client.vector_stores.update( vector_store_id=DEMO_VECTOR_STORE_ID, name="Updated Support FAQ FJA", ) # Check instead of assert if updated_vector_store.name != "Updated Support FAQ FJA": log_and_print(Fore.YELLOW, "FAILED: Vector store name was not updated correctly", level=logging.WARNING) else: log_and_print(Fore.GREEN, "Modify vector store test passed!") updated_vector_store_data = updated_vector_store.to_dict() log_and_print(Fore.WHITE, f"vector_stores.modify = {json.dumps(updated_vector_store_data, indent=2)}") with open(f'openai_testing/{prefix}_vector_store_modify.json', 'w') as f: json.dump(updated_vector_store_data, f, indent=2) except Exception as e: log_and_print(Fore.RED, f"Modify vector store test failed: {e}", level=logging.ERROR) logging.error(traceback.format_exc()) def test_delete_vector_store(): """ Test deleting a vector store. """ log_and_print(Fore.BLUE, "Starting delete vector store test...") if not DEMO_VECTOR_STORE_ID2: log_and_print(Fore.YELLOW, "Skipping delete vector store test - no second vector store ID available", level=logging.WARNING) return try: response = client.vector_stores.delete( vector_store_id=DEMO_VECTOR_STORE_ID2, ) log_and_print(Fore.GREEN, "Delete vector store test passed!") response_data = response.to_dict() log_and_print(Fore.WHITE, f"Vector store delete response = {json.dumps(response_data, indent=2)}") with open(f'openai_testing/{prefix}_vector_store_delete.json', 'w') as f: json.dump(response_data, f, indent=2) except Exception as e: log_and_print(Fore.RED, f"Delete vector store test failed: {e}", level=logging.ERROR) logging.error(traceback.format_exc()) def test_create_vector_store_file(): log_and_print(Fore.BLUE, "Starting create vector store file test...") if not DEMO_VECTOR_STORE_ID: log_and_print(Fore.YELLOW, "Skipping create vector store file test - no vector store ID available", level=logging.WARNING) return try: # create jsonl of files as an example with open("mydata.jsonl", "w") as f: f.write('{"text": "What is the return policy?", "metadata": {"category": "support"}}\n') f.write('{"text": "How do I reset my password?", "metadata": {"category": "support"}}\n') f.write('{"text": "Where can I find my order history?", "metadata": {"category": "support"}}\n') f.write('{"text": "What are the shipping options?", "metadata": {"category": "support"}}\n') f.write('{"text": "What is your favorite banana?", "metadata": {"category": "support"}}\n') # Create a simple text file if my_data_small.txt doesn't exist if not os.path.exists("my_data_small.txt"): with open("my_data_small.txt", "w") as f: f.write("This is a test file for vector store testing.\n") created_file = client.files.create( file=open("my_data_small.txt", "rb"), purpose="assistants", ) created_file_data = created_file.to_dict() log_and_print(Fore.WHITE, f"Created file {json.dumps(created_file_data, indent=2)}") with open(f'openai_testing/{prefix}_file_create.json', 'w') as f: json.dump(created_file_data, f, indent=2) retrieved_files = client.files.retrieve(created_file.id) retrieved_files_data = retrieved_files.to_dict() log_and_print(Fore.WHITE, f"Retrieved file {json.dumps(retrieved_files_data, indent=2)}") with open(f'openai_testing/{prefix}_file_retrieve.json', 'w') as f: json.dump(retrieved_files_data, f, indent=2) vector_store_file = client.vector_stores.files.create( vector_store_id=DEMO_VECTOR_STORE_ID, file_id=created_file.id, ) log_and_print(Fore.GREEN, "Create vector store file test passed!") except Exception as e: log_and_print(Fore.RED, f"Create vector store file test failed: {e}", level=logging.ERROR) logging.error(traceback.format_exc()) def test_search_vector_store(): """ Test searching a vector store. """ log_and_print(Fore.BLUE, "Starting search vector store test...") if not DEMO_VECTOR_STORE_ID: log_and_print(Fore.YELLOW, "Skipping search vector store test - no vector store ID available", level=logging.WARNING) return try: query = "What is the banana policy?" search_results = client.vector_stores.search( vector_store_id=DEMO_VECTOR_STORE_ID, query=query, max_num_results=10, ranking_options={ 'ranker': 'default-2024-11-15', 'score_threshold': 0.0, }, rewrite_query=False, ) # Check instead of assert if not isinstance(search_results, pagination.SyncPage): log_and_print(Fore.YELLOW, f"FAILED: Expected a list of search results, got {type(search_results)}", level=logging.WARNING) else: log_and_print(Fore.GREEN, "Search vector store test passed!") search_results_dict = search_results.to_dict() log_and_print(Fore.WHITE, f"Search results = {search_results_dict}") with open(f'openai_testing/{prefix}_vector_store_search.json', 'w') as f: json.dump(search_results_dict, f, indent=2) log_and_print(Fore.WHITE, f"vector_stores.search = {search_results.to_json()}") except Exception as e: log_and_print(Fore.RED, f"Search vector store test failed: {e}", level=logging.ERROR) logging.error(traceback.format_exc()) # Run all tests in sequence, even if some fail test_results = [] try: result = test_idempotent_vector_store_creation() if result and len(result) == 2: DEMO_VECTOR_STORE_ID, DEMO_VECTOR_STORE_ID2 = result test_results.append(True) except Exception as e: log_and_print(Fore.RED, f"Vector store creation test failed: {e}", level=logging.ERROR) logging.error(traceback.format_exc()) test_results.append(False) for test_func in [ test_vector_store_list, test_retrieve_vector_store, test_modify_vector_store, test_delete_vector_store, test_create_vector_store_file, test_search_vector_store ]: try: test_func() test_results.append(True) except Exception as e: log_and_print(Fore.RED, f"{test_func.__name__} failed: {e}", level=logging.ERROR) logging.error(traceback.format_exc()) test_results.append(False) if all(test_results): log_and_print(Fore.GREEN, f"All {prefix} tests completed successfully!") else: failed_count = test_results.count(False) log_and_print(Fore.YELLOW, f"{failed_count} {prefix} test(s) failed, but script completed.") if __name__ == "__main__": parser = argparse.ArgumentParser(description="Run OpenAI and/or LlamaStack tests.") parser.add_argument( "--provider", type=str, default="llama", choices=["openai", "llama", "both"], help="Specify which environment to test: openai, llama, or both. Default is both.", ) args = parser.parse_args() try: if args.provider in ("openai", "both"): openai_client = OpenAI() run_tests(openai_client, prefix="openai") if args.provider in ("llama", "both"): llama_client = OpenAI(base_url="http://localhost:8321/v1/openai/v1", api_key="none") run_tests(llama_client, prefix="llama") log_and_print(Fore.GREEN, "All tests completed!") except Exception as e: log_and_print(Fore.RED, f"Tests failed to complete: {e}", level=logging.ERROR) logging.error(traceback.format_exc()) ``` </details> --------- Signed-off-by: Francisco Javier Arceo <farceo@redhat.com> |
||
|
4ae5656c2f
|
feat: Implement keyword search in milvus (#2231)
Some checks failed
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 7s
Integration Tests / discover-tests (push) Successful in 8s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 10s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 6s
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 6s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 11s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 9s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 10s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 8s
Test Llama Stack Build / generate-matrix (push) Successful in 8s
Python Package Build Test / build (3.13) (push) Failing after 6s
Unit Tests / unit-tests (3.12) (push) Failing after 6s
Unit Tests / unit-tests (3.13) (push) Failing after 6s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 13s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 12s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 15s
Test External Providers / test-external-providers (venv) (push) Failing after 9s
Test Llama Stack Build / build-single-provider (push) Failing after 11s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 14s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 19s
Integration Tests / test-matrix (push) Failing after 8s
Test Llama Stack Build / build (push) Failing after 5s
Python Package Build Test / build (3.12) (push) Failing after 51s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 55s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 57s
Update ReadTheDocs / update-readthedocs (push) Failing after 50s
Pre-commit / pre-commit (push) Successful in 2m9s
# What does this PR do? This PR adds the keyword search implementation for Milvus. Along with the implementation for remote Milvus, the tests require us to start a Milvus containers locally. In order to verify the implementation, run: ``` pytest tests/unit/providers/vector_io/remote/test_milvus.py -v -s --tb=short --disable-warnings --asyncio-mode=auto ``` You can also test the changes using the below script: ``` #!/usr/bin/env python3 import asyncio import os import uuid from typing import List from llama_stack_client import ( Agent, AgentEventLogger, LlamaStackClient, RAGDocument ) class MilvusRAGDemo: def __init__(self, base_url: str = "http://localhost:8321/"): self.client = LlamaStackClient(base_url=base_url) self.vector_db_id = f"milvus_rag_demo_{uuid.uuid4().hex[:8]}" self.model_id = None self.embedding_model_id = None self.embedding_dimension = None def setup_models(self): """Get available models and select appropriate ones for LLM and embeddings.""" models = self.client.models.list() # Select embedding model embedding_models = [m for m in models if m.model_type == "embedding"] if not embedding_models: raise ValueError("No embedding models found") self.embedding_model_id = embedding_models[0].identifier self.embedding_dimension = embedding_models[0].metadata["embedding_dimension"] def register_vector_db(self): print(f"Registering Milvus vector database: {self.vector_db_id}") response = self.client.vector_dbs.register( vector_db_id=self.vector_db_id, embedding_model=self.embedding_model_id, embedding_dimension=self.embedding_dimension, provider_id="milvus-remote", # Use remote Milvus ) print(f"Vector database registered successfully") return response def insert_documents(self): """Insert sample documents into the vector database.""" print("\nInserting sample documents...") # Sample documents about different topics documents = [ RAGDocument( document_id="ai_ml_basics", content=""" Artificial Intelligence (AI) and Machine Learning (ML) are transforming the world. AI refers to the simulation of human intelligence in machines, while ML is a subset of AI that enables computers to learn and improve from experience without being explicitly programmed. Deep learning, a subset of ML, uses neural networks with multiple layers to process complex patterns in data. Key concepts in AI/ML include: - Supervised Learning: Training with labeled data - Unsupervised Learning: Finding patterns in unlabeled data - Reinforcement Learning: Learning through trial and error - Neural Networks: Computing systems inspired by biological brains """, mime_type="text/plain", metadata={"topic": "technology", "category": "ai_ml"}, ), ] # Insert documents with chunking self.client.tool_runtime.rag_tool.insert( documents=documents, vector_db_id=self.vector_db_id, chunk_size_in_tokens=200, # Smaller chunks for better granularity ) print(f"Inserted {len(documents)} documents with chunking") def test_keyword_search(self): """Test keyword-based search using BM25.""" queries = [ "neural networks", "Python frameworks", "data cleaning", ] for query in queries: response = self.client.vector_io.query( vector_db_id=self.vector_db_id, query=query, params={ "mode": "keyword", # Keyword search "max_chunks": 3, "score_threshold": 0.0, } ) for i, (chunk, score) in enumerate(zip(response.chunks, response.scores)): print(f" {i+1}. Score: {score:.4f}") print(f" Content: {chunk.content[:100]}...") print(f" Metadata: {chunk.metadata}") def run_demo(self): try: self.setup_models() self.register_vector_db() self.insert_documents() self.test_keyword_search() except Exception as e: print(f"Error during demo: {e}") raise def main(): """Main function to run the demo.""" # Check if Llama Stack server is running demo = MilvusRAGDemo() try: demo.run_demo() except Exception as e: print(f"Demo failed: {e}") if __name__ == "__main__": main() ``` [//]: # (## Documentation) --------- Signed-off-by: Varsha Prasad Narsing <varshaprasad96@gmail.com> |
||
|
33f0d83ad3
|
chore: Move vector store kvstore implementation into openai_vector_store_mixin.py (#2748)
|
||
|
6b8a8c1be9
|
fix: Safety in starter (#2731)
- fireworks, together do not support Llama-guard 3 8b model anymore - Need to default to ollama - current safety shields logic was not correct since the shield_id was the provider ( which had duplicates ) - Followed similar logic to models Note: Seems a bit over-engineered but this can now be extended to other providers and fits in the overall mechanism of how env_vars are used to manage starter. ### How to test ``` ENABLE_OLLAMA=ollama ENABLE_FIREWORKS=fireworks SAFETY_MODEL=llama-guard3:1b pytest -s -v tests/integration/ --stack-config starter -k 'not(supervised_fine_tune or builtin_tool_code or safety_with_image or code_interpreter_for or rag_and_code or truncation or register_and_unregister)' --text-model fireworks/meta-llama/Llama-3.3-70B-Instruct --vision-model fireworks/meta-llama/Llama-4-Scout-17B-16E-Instruct --safety-shield llama-guard3:1b --embedding-model all-MiniLM-L6-v2 ``` ### Related but not obvious in this PR In the llama-stack-ops repo, we run tests before publishing packages and docker containers. The actions in that repo were using the fireworks / together distros ( which are non-existent ) So need to update that to run with `starter` and use `ollama` specifically for safety. |
||
|
51d9fd4808
|
fix: Don't cache clients for passthrough auth providers (#2728)
Some checks failed
Test Llama Stack Build / build-ubi9-container-distribution (push) Failing after 43s
Unit Tests / unit-tests (3.12) (push) Failing after 45s
Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 2s
Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 4s
Integration Tests / discover-tests (push) Successful in 6s
Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 7s
Pre-commit / pre-commit (push) Successful in 2m8s
Test Llama Stack Build / build-custom-container-distribution (push) Failing after 4s
Test Llama Stack Build / generate-matrix (push) Successful in 5s
Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 7s
Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 9s
Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 9s
Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 11s
SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 12s
Test Llama Stack Build / build-single-provider (push) Failing after 7s
Python Package Build Test / build (3.13) (push) Failing after 5s
Python Package Build Test / build (3.12) (push) Failing after 7s
Unit Tests / unit-tests (3.13) (push) Failing after 6s
Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 13s
Test External Providers / test-external-providers (venv) (push) Failing after 7s
Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 11s
Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 12s
Update ReadTheDocs / update-readthedocs (push) Failing after 6s
Integration Tests / test-matrix (push) Failing after 6s
Test Llama Stack Build / build (push) Failing after 4s
Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 12s
SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 16s
# What does this PR do? Some of our inference providers support passthrough authentication via `x-llamastack-provider-data` header values. This fixes the providers that support passthrough auth to not cache their clients to the backend providers (mostly OpenAI client instances) so that the client connecting to Llama Stack has to provide those auth values on each and every request. ## Test Plan I added some unit tests to ensure we're not caching clients across requests for all the fixed providers in this PR. ``` uv run pytest -sv tests/unit/providers/inference/test_inference_client_caching.py ``` I also ran some of our OpenAI compatible API integration tests for each of the changed providers, just to ensure they still work. Note that these providers don't actually pass all these tests (for unrelated reasons due to quirks of the Groq and Together SaaS services), but enough of the tests passed to confirm the clients are still working as intended. ### Together ``` ENABLE_TOGETHER="together" \ uv run llama stack run llama_stack/templates/starter/run.yaml LLAMA_STACK_CONFIG=http://localhost:8321 \ uv run pytest -sv \ tests/integration/inference/test_openai_completion.py \ --text-model "together/meta-llama/Llama-3.1-8B-Instruct" ``` ### OpenAI ``` ENABLE_OPENAI="openai" \ uv run llama stack run llama_stack/templates/starter/run.yaml LLAMA_STACK_CONFIG=http://localhost:8321 \ uv run pytest -sv \ tests/integration/inference/test_openai_completion.py \ --text-model "openai/gpt-4o-mini" ``` ### Groq ``` ENABLE_GROQ="groq" \ uv run llama stack run llama_stack/templates/starter/run.yaml LLAMA_STACK_CONFIG=http://localhost:8321 \ uv run pytest -sv \ tests/integration/inference/test_openai_completion.py \ --text-model "groq/meta-llama/Llama-3.1-8B-Instruct" ``` --------- Signed-off-by: Ben Browning <bbrownin@redhat.com> |
||
|
aa2595c7c3
|
fix: sambanova shields and model validation (#2693)
# What does this PR do? Update the shield register validation of Sambanova not to raise, but only warn when a model is not available in the base url endpoint used, also added warnings when model is not available in the base url endpoint used <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> ## Test Plan <!-- Describe the tests you ran to verify your changes with result summaries. *Provide clear instructions so the plan can be easily re-executed.* --> run starter distro with Sambanova enabled |