llama-stack-mirror

mirror of https://github.com/meta-llama/llama-stack.git synced 2025-08-16 06:27:58 +00:00

Author	SHA1	Message	Date
slekkala1	25e0553eed	chore: Change moderations api response to Provider returned categories (#3098 ) # What does this PR do? To be compliant with model policies for LLAMA, just return the categories as is from provider, we will lose the OAI compat in moderations api response. <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> ## Test Plan `SAFETY_MODEL=llama-guard3:8b LLAMA_STACK_CONFIG=starter uv run pytest -v tests/integration/safety/test_safety.py --text-model=llama3.2:3b-instruct-fp16 --embedding-model=all-MiniLM-L6-v2 --safety-shield=ollama`	2025-08-13 09:47:35 -07:00
Mustafa Elbehery	4a13ef45e9	fix: Implement missing `run_moderation` method in `PromptGuardSafetyImpl` (#3101 ) # What does this PR do? <!-- Provide a short summary of what this PR does and why. Link to relevant issues if applicable. --> This PR addresses an issue where `PromptGuardSafetyImpl` was an incomplete implementation of an abstract class. The class was missing the required run_moderation method from its parent interface. Currently, running `pre-commit` locally fails with the error below. ``` llama_stack/providers/inline/safety/prompt_guard/__init__.py:15: error: Cannot instantiate abstract class "PromptGuardSafetyImpl" with abstract attribute "run_moderation" [abstract] Found 1 error in 1 file (checked 410 source files) ``` This PR fixes the issue as follows - Added the missing run_moderation method to PromptGuardSafetyImpl - Method raises NotImplementedError with appropriate message indicating this functionality is not implemented for PromptGuard - This allows the class to be properly instantiated while clearly indicating the limitation <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> Signed-off-by: Mustafa Elbehery <melbeher@redhat.com>	2025-08-12 08:32:52 -07:00
Mustafa Elbehery	b5b5f5b9ae	chore: add `mypy` prompt guard (#2678 ) # What does this PR do? <!-- Provide a short summary of what this PR does and why. Link to relevant issues if applicable. --> This PR adds static type coverage to `llama-stack` Part of https://github.com/meta-llama/llama-stack/issues/2647 <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> ## Test Plan <!-- Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed. --> Signed-off-by: Mustafa Elbehery <melbeher@redhat.com>	2025-08-11 08:40:40 -07:00
IAN MILLER	e12524af85	feat: create unregister shield API endpoint in Llama Stack (#2853 ) Some checks failed Integration Auth Tests / test-matrix (oauth2_token) (push) Failing after 10s Details Integration Tests (Replay) / discover-tests (push) Successful in 13s Details Python Package Build Test / build (3.12) (push) Failing after 1s Details Test External Providers Installed via Module / test-external-providers-from-module (venv) (push) Has been skipped Details Vector IO Integration Tests / test-matrix (3.12, inline::milvus) (push) Failing after 24s Details Test External API and Providers / test-external (venv) (push) Failing after 12s Details Unit Tests / unit-tests (3.13) (push) Failing after 10s Details Update ReadTheDocs / update-readthedocs (push) Failing after 9s Details Python Package Build Test / build (3.13) (push) Failing after 15s Details Vector IO Integration Tests / test-matrix (3.12, remote::chromadb) (push) Failing after 27s Details SqlStore Integration Tests / test-postgres (3.13) (push) Failing after 29s Details Vector IO Integration Tests / test-matrix (3.12, remote::pgvector) (push) Failing after 27s Details Vector IO Integration Tests / test-matrix (3.13, inline::milvus) (push) Failing after 25s Details Vector IO Integration Tests / test-matrix (3.12, remote::weaviate) (push) Failing after 22s Details Vector IO Integration Tests / test-matrix (3.13, inline::sqlite-vec) (push) Failing after 25s Details Vector IO Integration Tests / test-matrix (3.13, remote::pgvector) (push) Failing after 21s Details Unit Tests / unit-tests (3.12) (push) Failing after 19s Details Vector IO Integration Tests / test-matrix (3.12, inline::sqlite-vec) (push) Failing after 35s Details SqlStore Integration Tests / test-postgres (3.12) (push) Failing after 39s Details Vector IO Integration Tests / test-matrix (3.13, remote::weaviate) (push) Failing after 23s Details Vector IO Integration Tests / test-matrix (3.13, inline::faiss) (push) Failing after 35s Details Vector IO Integration Tests / test-matrix (3.13, remote::qdrant) (push) Failing after 35s Details Vector IO Integration Tests / test-matrix (3.12, remote::qdrant) (push) Failing after 1m2s Details Vector IO Integration Tests / test-matrix (3.12, inline::faiss) (push) Failing after 1m4s Details Vector IO Integration Tests / test-matrix (3.13, remote::chromadb) (push) Failing after 1m2s Details Integration Tests (Replay) / Integration Tests (, , , client=, vision=) (push) Failing after 7s Details Pre-commit / pre-commit (push) Successful in 2m21s Details # What does this PR do? <!-- Provide a short summary of what this PR does and why. Link to relevant issues if applicable. --> Extend the Shields Protocol and implement the capability to unregister previously registered shields and CLI for shields management. <!-- If resolving an issue, uncomment and update the line below --> <!-- Closes #[issue-number] --> Closes #2581 ## Test Plan <!-- Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed. --> First of, test API for shields 1. Install and start Ollama: `ollama serve` 2. Pull Llama Guard Model in Ollama: `ollama pull llama-guard3:8b` 3. Configure env variables: ``` export ENABLE_OLLAMA=ollama export OLLAMA_URL=http://localhost:11434 ``` 4. Build Llama Stack distro: `llama stack build --template starter --image-type venv ` 5. Start Llama Stack server: `llama stack run starter --port 8321` 6. Check if Ollama model is available: `curl -X GET http://localhost:8321/v1/models \| jq '.data[] \| select(.provider_id=="ollama")'` 7. Register a new Shield using Ollama provider: ``` curl -X POST http://localhost:8321/v1/shields \ -H "Content-Type: application/json" \ -d '{ "shield_id": "test-shield", "provider_id": "llama-guard", "provider_shield_id": "ollama/llama-guard3:8b", "params": {} }' ``` `{"identifier":"test-shield","provider_resource_id":"ollama/llama-guard3:8b","provider_id":"llama-guard","type":"shield","owner":{"principal":"","attributes":{}},"params":{}}% ` 8. Check if shield was registered: `curl -X GET http://localhost:8321/v1/shields/test-shield` `{"identifier":"test-shield","provider_resource_id":"ollama/llama-guard3:8b","provider_id":"llama-guard","type":"shield","owner":{"principal":"","attributes":{}},"params":{}}% ` 9. Run shield: ``` curl -X POST http://localhost:8321/v1/safety/run-shield \ -H "Content-Type: application/json" \ -d '{ "shield_id": "test-shield", "messages": [ { "role": "user", "content": "How can I hack into someone computer?" } ], "params": {} }' ``` `{"violation":{"violation_level":"error","user_message":"I can't answer that. Can I help with something else?","metadata":{"violation_type":"S2"}}}% ` 10. Unregister shield: `curl -X DELETE http://localhost:8321/v1/shields/test-shield` `null% ` 11. Verify shield was deleted: `curl -X GET http://localhost:8321/v1/shields/test-shield` `{"detail":"Invalid value: Shield 'test-shield' not found"}%` All tests passed ✅ ``` ========================================================================== 430 passed, 194 warnings in 19.54s ========================================================================== /Users/iamiller/GitHub/llama-stack/.venv/lib/python3.12/site-packages/litellm/llms/custom_httpx/async_client_cleanup.py:78: RuntimeWarning: coroutine 'close_litellm_async_clients' was never awaited loop.close() RuntimeWarning: Enable tracemalloc to get the object allocation traceback Wrote HTML report to htmlcov-3.12/index.html ```	2025-08-05 07:33:46 -07:00
Ashwin Bharambe	2665f00102	chore(rename): move llama_stack.distribution to llama_stack.core (#2975 ) We would like to rename the term `template` to `distribution`. To prepare for that, this is a precursor. cc @leseb	2025-07-30 23:30:53 -07:00
Michael Dawson	a654467552	feat: add cpu/cuda config for prompt guard (#2194 ) # What does this PR do? Previously prompt guard was hard coded to require cuda which prevented it from being used on an instance without a cuda support. This PR allows prompt guard to be configured to use either cpu or cuda. [//]: # (If resolving an issue, uncomment and update the line below) Closes [#2133](https://github.com/meta-llama/llama-stack/issues/2133) ## Test Plan (Edited after incorporating suggestion) 1) started stack configured with prompt guard as follows on a system without a GPU and validated prompt guard could be used through the APIs 2) validated on a system with a gpu (but without llama stack) that the python selecting between cpu and cuda support returned the right value when a cuda device was available. 3) ran the unit tests as per - https://github.com/meta-llama/llama-stack/blob/main/tests/unit/README.md [//]: # (## Documentation) --------- Signed-off-by: Michael Dawson <mdawson@devrus.com>	2025-05-28 12:23:15 -07:00
Ihar Hrachyshka	9e6561a1ec	chore: enable pyupgrade fixes (#1806 ) # What does this PR do? The goal of this PR is code base modernization. Schema reflection code needed a minor adjustment to handle UnionTypes and collections.abc.AsyncIterator. (Both are preferred for latest Python releases.) Note to reviewers: almost all changes here are automatically generated by pyupgrade. Some additional unused imports were cleaned up. The only change worth of note can be found under `docs/openapi_generator` and `llama_stack/strong_typing/schema.py` where reflection code was updated to deal with "newer" types. Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>	2025-05-01 14:23:50 -07:00
Ashwin Bharambe	d072b5fa0c	test: add unit test to ensure all config types are instantiable (#1601 )	2025-03-12 22:29:58 -07:00
Ihar Hrachyshka	c3d7d17bc4	chore: fix typing hints for get_provider_impl deps arguments (#1544 ) # What does this PR do? It's a dict that may contain different types, as per resolver:instantiate_provider implementation. (AFAIU it also never contains ProviderSpecs, but instances of provider implementations.) [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan mypy passing if enabled checks for these modules. (See #1543) [//]: # (## Documentation) Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>	2025-03-11 10:07:28 -07:00
Sébastien Han	e4a1579e63	build: format codebase imports using ruff linter (#1028 ) # What does this PR do? - Configured ruff linter to automatically fix import sorting issues. - Set --exit-non-zero-on-fix to ensure non-zero exit code when fixes are applied. - Enabled the 'I' selection to focus on import-related linting rules. - Ran the linter, and formatted all codebase imports accordingly. - Removed the black dep from the "dev" group since we use ruff Signed-off-by: Sébastien Han <seb@redhat.com> [//]: # (If resolving an issue, uncomment and update the line below) [//]: # (Closes #[issue-number]) ## Test Plan [Describe the tests you ran to verify your changes with result summaries. Provide clear instructions so the plan can be easily re-executed.] [//]: # (## Documentation) [//]: # (- [ ] Added a Changelog entry if the change is significant) Signed-off-by: Sébastien Han <seb@redhat.com>	2025-02-13 10:06:21 -08:00
Yuan Tang	34ab7a3b6c	Fix precommit check after moving to ruff (#927 ) Lint check in main branch is failing. This fixes the lint check after we moved to ruff in https://github.com/meta-llama/llama-stack/pull/921. We need to move to a `ruff.toml` file as well as fixing and ignoring some additional checks. Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>	2025-02-02 06:46:45 -08:00
Xi Yan	3c72c034e6	[remove import ] clean up import 's (#689 ) # What does this PR do? - as title, cleaning up `import `'s - upgrade tests to make them more robust to bad model outputs - remove import 's in llama_stack/apis/* (skip __init__ modules) <img width="465" alt="image" src="https://github.com/user-attachments/assets/d8339c13-3b40-4ba5-9c53-0d2329726ee2" /> - run `sh run_openapi_generator.sh`, no types gets affected ## Test Plan ### Providers Tests agents ``` pytest -v -s llama_stack/providers/tests/agents/test_agents.py -m "together" --safety-shield meta-llama/Llama-Guard-3-8B --inference-model meta-llama/Llama-3.1-405B-Instruct-FP8 ``` inference ```bash # meta-reference torchrun $CONDA_PREFIX/bin/pytest -v -s -k "meta_reference" --inference-model="meta-llama/Llama-3.1-8B-Instruct" ./llama_stack/providers/tests/inference/test_text_inference.py torchrun $CONDA_PREFIX/bin/pytest -v -s -k "meta_reference" --inference-model="meta-llama/Llama-3.2-11B-Vision-Instruct" ./llama_stack/providers/tests/inference/test_vision_inference.py # together pytest -v -s -k "together" --inference-model="meta-llama/Llama-3.1-8B-Instruct" ./llama_stack/providers/tests/inference/test_text_inference.py pytest -v -s -k "together" --inference-model="meta-llama/Llama-3.2-11B-Vision-Instruct" ./llama_stack/providers/tests/inference/test_vision_inference.py pytest ./llama_stack/providers/tests/inference/test_prompt_adapter.py ``` safety ``` pytest -v -s llama_stack/providers/tests/safety/test_safety.py -m together --safety-shield meta-llama/Llama-Guard-3-8B ``` memory ``` pytest -v -s llama_stack/providers/tests/memory/test_memory.py -m "sentence_transformers" --env EMBEDDING_DIMENSION=384 ``` scoring ``` pytest -v -s -m llm_as_judge_scoring_together_inference llama_stack/providers/tests/scoring/test_scoring.py --judge-model meta-llama/Llama-3.2-3B-Instruct pytest -v -s -m basic_scoring_together_inference llama_stack/providers/tests/scoring/test_scoring.py pytest -v -s -m braintrust_scoring_together_inference llama_stack/providers/tests/scoring/test_scoring.py ``` datasetio ``` pytest -v -s -m localfs llama_stack/providers/tests/datasetio/test_datasetio.py pytest -v -s -m huggingface llama_stack/providers/tests/datasetio/test_datasetio.py ``` eval ``` pytest -v -s -m meta_reference_eval_together_inference llama_stack/providers/tests/eval/test_eval.py pytest -v -s -m meta_reference_eval_together_inference_huggingface_datasetio llama_stack/providers/tests/eval/test_eval.py ``` ### Client-SDK Tests ``` LLAMA_STACK_BASE_URL=http://localhost:5000 pytest -v ./tests/client-sdk ``` ### llama-stack-apps ``` PORT=5000 LOCALHOST=localhost python -m examples.agents.hello $LOCALHOST $PORT python -m examples.agents.inflation $LOCALHOST $PORT python -m examples.agents.podcast_transcript $LOCALHOST $PORT python -m examples.agents.rag_as_attachments $LOCALHOST $PORT python -m examples.agents.rag_with_memory_bank $LOCALHOST $PORT python -m examples.safety.llama_guard_demo_mm $LOCALHOST $PORT python -m examples.agents.e2e_loop_with_custom_tools $LOCALHOST $PORT # Vision model python -m examples.interior_design_assistant.app python -m examples.agent_store.app $LOCALHOST $PORT ``` ### CLI ``` which llama llama model prompt-format -m Llama3.2-11B-Vision-Instruct llama model list llama stack list-apis llama stack list-providers inference llama stack build --template ollama --image-type conda ``` ### Distributions Tests ollama ``` llama stack build --template ollama --image-type conda ollama run llama3.2:1b-instruct-fp16 llama stack run ./llama_stack/templates/ollama/run.yaml --env INFERENCE_MODEL=meta-llama/Llama-3.2-1B-Instruct ``` fireworks ``` llama stack build --template fireworks --image-type conda llama stack run ./llama_stack/templates/fireworks/run.yaml ``` together ``` llama stack build --template together --image-type conda llama stack run ./llama_stack/templates/together/run.yaml ``` tgi ``` llama stack run ./llama_stack/templates/tgi/run.yaml --env TGI_URL=http://0.0.0.0:5009 --env INFERENCE_MODEL=meta-llama/Llama-3.1-8B-Instruct ``` ## Sources Please link relevant resources if necessary. ## Before submitting - [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case). - [ ] Ran pre-commit to handle lint / formatting issues. - [ ] Read the [contributor guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md), Pull Request section? - [ ] Updated relevant documentation. - [ ] Wrote necessary unit or integration tests.	2024-12-27 15:45:44 -08:00
Ashwin Bharambe	8de8eb03c8	Update the "InterleavedTextMedia" type (#635 ) ## What does this PR do? This is a long-pending change and particularly important to get done now. Specifically: - we cannot "localize" (aka download) any URLs from media attachments anywhere near our modeling code. it must be done within llama-stack. - `PIL.Image` is infesting all our APIs via `ImageMedia -> InterleavedTextMedia` and that cannot be right at all. Anything in the API surface must be "naturally serializable". We need a standard `{ type: "image", image_url: "<...>" }` which is more extensible - `UserMessage`, `SystemMessage`, etc. are moved completely to llama-stack from the llama-models repository. See https://github.com/meta-llama/llama-models/pull/244 for the corresponding PR in llama-models. ## Test Plan ```bash cd llama_stack/providers/tests pytest -s -v -k "fireworks or ollama or together" inference/test_vision_inference.py pytest -s -v -k "(fireworks or ollama or together) and llama_3b" inference/test_text_inference.py pytest -s -v -k chroma memory/test_memory.py \ --env EMBEDDING_DIMENSION=384 --env CHROMA_DB_PATH=/tmp/foobar pytest -s -v -k fireworks agents/test_agents.py \ --safety-shield=meta-llama/Llama-Guard-3-8B \ --inference-model=meta-llama/Llama-3.1-8B-Instruct ``` Updated the client sdk (see PR ...), installed the SDK in the same environment and then ran the SDK tests: ```bash cd tests/client-sdk LLAMA_STACK_CONFIG=together pytest -s -v agents/test_agents.py LLAMA_STACK_CONFIG=ollama pytest -s -v memory/test_memory.py # this one needed a bit of hacking in the run.yaml to ensure I could register the vision model correctly INFERENCE_MODEL=llama3.2-vision:latest LLAMA_STACK_CONFIG=ollama pytest -s -v inference/test_inference.py ```	2024-12-17 11:18:31 -08:00
Dinesh Yeduguru	6395dadc2b	use logging instead of prints (#499 ) # What does this PR do? This PR moves all print statements to use logging. Things changed: - Had to add `await start_trace("sse_generator")` to server.py to actually get tracing working. else was not seeing any logs - If no telemetry provider is provided in the run.yaml, we will write to stdout - by default, the logs are going to be in JSON, but we expose an option to configure to output in a human readable way.	2024-11-21 11:32:53 -08:00
Ashwin Bharambe	983d6ce2df	Remove the "ShieldType" concept (#430 ) # What does this PR do? This PR kills the notion of "ShieldType". The impetus for this is the realization: > Why is keyword llama-guard appearing so many times everywhere, sometimes with hyphens, sometimes with underscores? Now that we have a notion of "provider specific resource identifiers" and "user specific aliases" for those and the fact that this works with models ("Llama3.1-8B-Instruct" <> "fireworks/llama-3pv1-..."), we can follow the same rules for Shields. So each Safety provider can make up a notion of identifiers it has registered. This already happens with Bedrock correctly. We just generalize it for Llama Guard, Prompt Guard, etc. For Llama Guard, we further simplify by just adopting the underlying model name itself as the identifier! No confusion necessary. While doing this, I noticed a bug in our DistributionRegistry where we weren't scoping identifiers by type. Fixed. ## Feature/Issue validation/testing/test plan Ran (inference, safety, memory, agents) tests with ollama and fireworks providers.	2024-11-12 12:37:24 -08:00
Ashwin Bharambe	c1f7ba3aed	Split safety into (llama-guard, prompt-guard, code-scanner) (#400 ) Splits the meta-reference safety implementation into three distinct providers: - inline::llama-guard - inline::prompt-guard - inline::code-scanner Note that this PR is a backward incompatible change to the llama stack server. I have added deprecation_error field to ProviderSpec -- the server reads it and immediately barfs. This is used to direct the user with a specific message on what action to perform. An automagical "config upgrade" is a bit too much work to implement right now :/ (Note that we will be gradually prefixing all inline providers with inline:: -- I am only doing this for this set of new providers because otherwise existing configuration files will break even more badly.)	2024-11-11 09:29:18 -08:00

16 commits