# What does this PR do?
- Fixed type hinting and missing imports across multiple modules.
- Improved compatibility by using `TYPE_CHECKING` for conditional
imports.
- Updated `pyproject.toml` to enforce stricter linting.
Signed-off-by: Sébastien Han <seb@redhat.com>
Signed-off-by: Sébastien Han <seb@redhat.com>
# What does this PR do?
- Enable mypy to run in the CI on a subset of the repository
- Fix a few mypy errors
- Run mypy from pre-commit
Signed-off-by: Sébastien Han <seb@redhat.com>
[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])
## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]
[//]: # (## Documentation)
Signed-off-by: Sébastien Han <seb@redhat.com>
# What does this PR do?
Added necessary dependencies to ensure successful execution of unit
tests. Without these, the following command would fail due to missing
imports:
```
uv run pytest -v -k "ollama" \
--inference-model=llama3.2:3b-instruct-fp16
llama_stack/providers/tests/inference/test_model_registration.py
```
Signed-off-by: Sébastien Han <seb@redhat.com>
[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])
## Test Plan
Run:
```
ollama run llama3.2:3b-instruct-fp16 --keepalive 2m &
uv run pytest -v -k "ollama" --inference-model=llama3.2:3b-instruct-fp16 llama_stack/providers/tests/inference/test_model_registration.py
```
You can observe that some tests pass while others fail, but the test
runs successfully.
[//]: # (## Documentation)
[//]: # (- [ ] Added a Changelog entry if the change is significant)
Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
# What does this PR do?
Clarify when to update documentation, explain `uv sync --extra dev` and
OpenAPI generation, and specify where generated docs are stored.
Signed-off-by: Sébastien Han <seb@redhat.com>
Signed-off-by: Sébastien Han <seb@redhat.com>
llama-models should have extremely minimal cruft. Its sole purpose
should be didactic -- show the simplest implementation of the llama
models and document the prompt formats, etc.
This PR is the complement to
https://github.com/meta-llama/llama-models/pull/279
## Test Plan
Ensure all `llama` CLI `model` sub-commands work:
```bash
llama model list
llama model download --model-id ...
llama model prompt-format -m ...
```
Ran tests:
```bash
cd tests/client-sdk
LLAMA_STACK_CONFIG=fireworks pytest -s -v inference/
LLAMA_STACK_CONFIG=fireworks pytest -s -v vector_io/
LLAMA_STACK_CONFIG=fireworks pytest -s -v agents/
```
Create a fresh venv `uv venv && source .venv/bin/activate` and run
`llama stack build --template fireworks --image-type venv` followed by
`llama stack run together --image-type venv` <-- the server runs
Also checked that the OpenAPI generator can run and there is no change
in the generated files as a result.
```bash
cd docs/openapi_generator
sh run_openapi_generator.sh
```
# What does this PR do?
- Remove hardcoded configurations from pre-commit.
- Allow configuration to be set via pyproject.toml.
- Merge .ruff.toml settings into pyproject.toml.
- Ensure the linter and formatter use the defined configuration instead
of being overridden by pre-commit.
Signed-off-by: Sébastien Han <seb@redhat.com>
[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])
## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]
[//]: # (## Documentation)
Signed-off-by: Sébastien Han <seb@redhat.com>
# What does this PR do?
This commit enhances the signal handling mechanism in the server by
improving the `handle_signal` (previously handle_sigint) function. It
now properly retrieves the signal name, ensuring clearer logging when a
termination signal is received. Additionally, it cancels all running
tasks and waits for their completion before stopping the event loop,
allowing for a more graceful shutdown. Support for handling
SIGTERM has also been added alongside SIGINT.
Before the changes, handle_sigint used asyncio.run(run_shutdown()).
However, asyncio.run() is meant to start a new event loop, and calling
it inside an existing one (like when running Uvicorn) raises an error.
The fix replaces asyncio.run(run_shutdown()) with an async function
scheduled on the existing loop using loop.create_task(shutdown()). This
ensures that the shutdown coroutine runs within the current event loop
instead of trying to create a new one.
Furthermore, this commit updates the project dependencies. `fastapi` and
`uvicorn` have been added to the development dependencies in
`pyproject.toml` and `uv.lock`, ensuring that the necessary packages are
available for development and execution.
Closes: https://github.com/meta-llama/llama-stack/issues/1043
Signed-off-by: Sébastien Han <seb@redhat.com>
[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])
## Test Plan
Run a server and send SIGINT:
```
INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" python -m llama_stack.distribution.server.server --yaml-config ./llama_stack/templates/ollama/run.yaml
Using config file: llama_stack/templates/ollama/run.yaml
Run configuration:
apis:
- agents
- datasetio
- eval
- inference
- safety
- scoring
- telemetry
- tool_runtime
- vector_io
container_image: null
datasets: []
eval_tasks: []
image_name: ollama
metadata_store:
db_path: /Users/leseb/.llama/distributions/ollama/registry.db
namespace: null
type: sqlite
models:
- metadata: {}
model_id: meta-llama/Llama-3.2-3B-Instruct
model_type: !!python/object/apply:llama_stack.apis.models.models.ModelType
- llm
provider_id: ollama
provider_model_id: null
- metadata:
embedding_dimension: 384
model_id: all-MiniLM-L6-v2
model_type: !!python/object/apply:llama_stack.apis.models.models.ModelType
- embedding
provider_id: sentence-transformers
provider_model_id: null
providers:
agents:
- config:
persistence_store:
db_path: /Users/leseb/.llama/distributions/ollama/agents_store.db
namespace: null
type: sqlite
provider_id: meta-reference
provider_type: inline::meta-reference
datasetio:
- config: {}
provider_id: huggingface
provider_type: remote::huggingface
- config: {}
provider_id: localfs
provider_type: inline::localfs
eval:
- config: {}
provider_id: meta-reference
provider_type: inline::meta-reference
inference:
- config:
url: http://localhost:11434
provider_id: ollama
provider_type: remote::ollama
- config: {}
provider_id: sentence-transformers
provider_type: inline::sentence-transformers
safety:
- config: {}
provider_id: llama-guard
provider_type: inline::llama-guard
scoring:
- config: {}
provider_id: basic
provider_type: inline::basic
- config: {}
provider_id: llm-as-judge
provider_type: inline::llm-as-judge
- config:
openai_api_key: '********'
provider_id: braintrust
provider_type: inline::braintrust
telemetry:
- config:
service_name: llama-stack
sinks: console,sqlite
sqlite_db_path: /Users/leseb/.llama/distributions/ollama/trace_store.db
provider_id: meta-reference
provider_type: inline::meta-reference
tool_runtime:
- config:
api_key: '********'
max_results: 3
provider_id: brave-search
provider_type: remote::brave-search
- config:
api_key: '********'
max_results: 3
provider_id: tavily-search
provider_type: remote::tavily-search
- config: {}
provider_id: code-interpreter
provider_type: inline::code-interpreter
- config: {}
provider_id: rag-runtime
provider_type: inline::rag-runtime
vector_io:
- config:
kvstore:
db_path: /Users/leseb/.llama/distributions/ollama/faiss_store.db
namespace: null
type: sqlite
provider_id: faiss
provider_type: inline::faiss
scoring_fns: []
server:
port: 8321
tls_certfile: null
tls_keyfile: null
shields: []
tool_groups:
- args: null
mcp_endpoint: null
provider_id: tavily-search
toolgroup_id: builtin::websearch
- args: null
mcp_endpoint: null
provider_id: rag-runtime
toolgroup_id: builtin::rag
- args: null
mcp_endpoint: null
provider_id: code-interpreter
toolgroup_id: builtin::code_interpreter
vector_dbs: []
version: '2'
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:213: Resolved 31 providers
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inner-inference => ollama
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inner-inference => sentence-transformers
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: models => __routing_table__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inference => __autorouted__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inner-vector_io => faiss
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inner-safety => llama-guard
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: shields => __routing_table__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: safety => __autorouted__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: vector_dbs => __routing_table__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: vector_io => __autorouted__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inner-tool_runtime => brave-search
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inner-tool_runtime => tavily-search
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inner-tool_runtime => code-interpreter
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inner-tool_runtime => rag-runtime
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: tool_groups => __routing_table__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: tool_runtime => __autorouted__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: agents => meta-reference
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inner-datasetio => huggingface
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inner-datasetio => localfs
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: datasets => __routing_table__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: datasetio => __autorouted__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: telemetry => meta-reference
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inner-scoring => basic
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inner-scoring => llm-as-judge
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inner-scoring => braintrust
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: scoring_functions => __routing_table__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: scoring => __autorouted__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inner-eval => meta-reference
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: eval_tasks => __routing_table__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: eval => __autorouted__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inspect => __builtin__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:216:
INFO 2025-02-12 10:21:03,723 llama_stack.providers.remote.inference.ollama.ollama:148: checking connectivity to Ollama at `http://localhost:11434`...
INFO 2025-02-12 10:21:03,734 httpx:1740: HTTP Request: GET http://localhost:11434/api/ps "HTTP/1.1 200 OK"
INFO 2025-02-12 10:21:03,843 faiss.loader:148: Loading faiss.
INFO 2025-02-12 10:21:03,865 faiss.loader:150: Successfully loaded faiss.
INFO 2025-02-12 10:21:03,868 faiss:173: Failed to load GPU Faiss: name 'GpuIndexIVFFlat' is not defined. Will not load constructor refs for GPU indexes.
Warning: `bwrap` is not available. Code interpreter tool will not work correctly.
INFO 2025-02-12 10:21:04,315 datasets:54: PyTorch version 2.6.0 available.
INFO 2025-02-12 10:21:04,556 httpx:1740: HTTP Request: GET http://localhost:11434/api/ps "HTTP/1.1 200 OK"
INFO 2025-02-12 10:21:04,557 llama_stack.providers.utils.inference.embedding_mixin:42: Loading sentence transformer for all-MiniLM-L6-v2...
INFO 2025-02-12 10:21:07,202 sentence_transformers.SentenceTransformer:210: Use pytorch device_name: mps
INFO 2025-02-12 10:21:07,202 sentence_transformers.SentenceTransformer:218: Load pretrained SentenceTransformer: all-MiniLM-L6-v2
INFO 2025-02-12 10:21:09,500 llama_stack.distribution.stack:102: Models: all-MiniLM-L6-v2 served by sentence-transformers
INFO 2025-02-12 10:21:09,500 llama_stack.distribution.stack:102: Models: meta-llama/Llama-3.2-3B-Instruct served by ollama
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: basic::equality served by basic
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: basic::regex_parser_multiple_choice_answer served by basic
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: basic::subset_of served by basic
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: braintrust::answer-correctness served by braintrust
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: braintrust::answer-relevancy served by braintrust
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: braintrust::answer-similarity served by braintrust
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: braintrust::context-entity-recall served by braintrust
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: braintrust::context-precision served by braintrust
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: braintrust::context-recall served by braintrust
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: braintrust::context-relevancy served by braintrust
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: braintrust::factuality served by braintrust
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: braintrust::faithfulness served by braintrust
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: llm-as-judge::405b-simpleqa served by llm-as-judge
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: llm-as-judge::base served by llm-as-judge
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Tool_groups: builtin::code_interpreter served by code-interpreter
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Tool_groups: builtin::rag served by rag-runtime
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Tool_groups: builtin::websearch served by tavily-search
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:106:
Serving API eval
POST /v1/eval/tasks/{task_id}/evaluations
DELETE /v1/eval/tasks/{task_id}/jobs/{job_id}
GET /v1/eval/tasks/{task_id}/jobs/{job_id}/result
GET /v1/eval/tasks/{task_id}/jobs/{job_id}
POST /v1/eval/tasks/{task_id}/jobs
Serving API agents
POST /v1/agents
POST /v1/agents/{agent_id}/session
POST /v1/agents/{agent_id}/session/{session_id}/turn
DELETE /v1/agents/{agent_id}
DELETE /v1/agents/{agent_id}/session/{session_id}
GET /v1/agents/{agent_id}/session/{session_id}
GET /v1/agents/{agent_id}/session/{session_id}/turn/{turn_id}/step/{step_id}
GET /v1/agents/{agent_id}/session/{session_id}/turn/{turn_id}
Serving API scoring_functions
GET /v1/scoring-functions/{scoring_fn_id}
GET /v1/scoring-functions
POST /v1/scoring-functions
Serving API safety
POST /v1/safety/run-shield
Serving API inspect
GET /v1/health
GET /v1/inspect/providers
GET /v1/inspect/routes
GET /v1/version
Serving API tool_runtime
POST /v1/tool-runtime/invoke
GET /v1/tool-runtime/list-tools
POST /v1/tool-runtime/rag-tool/insert
POST /v1/tool-runtime/rag-tool/query
Serving API datasetio
POST /v1/datasetio/rows
GET /v1/datasetio/rows
Serving API shields
GET /v1/shields/{identifier}
GET /v1/shields
POST /v1/shields
Serving API eval_tasks
GET /v1/eval-tasks/{eval_task_id}
GET /v1/eval-tasks
POST /v1/eval-tasks
Serving API models
GET /v1/models/{model_id}
GET /v1/models
POST /v1/models
DELETE /v1/models/{model_id}
Serving API datasets
GET /v1/datasets/{dataset_id}
GET /v1/datasets
POST /v1/datasets
DELETE /v1/datasets/{dataset_id}
Serving API vector_io
POST /v1/vector-io/insert
POST /v1/vector-io/query
Serving API inference
POST /v1/inference/chat-completion
POST /v1/inference/completion
POST /v1/inference/embeddings
Serving API tool_groups
GET /v1/tools/{tool_name}
GET /v1/toolgroups/{toolgroup_id}
GET /v1/toolgroups
GET /v1/tools
POST /v1/toolgroups
DELETE /v1/toolgroups/{toolgroup_id}
Serving API vector_dbs
GET /v1/vector-dbs/{vector_db_id}
GET /v1/vector-dbs
POST /v1/vector-dbs
DELETE /v1/vector-dbs/{vector_db_id}
Serving API scoring
POST /v1/scoring/score
POST /v1/scoring/score-batch
Serving API telemetry
GET /v1/telemetry/traces/{trace_id}/spans/{span_id}
GET /v1/telemetry/spans/{span_id}/tree
GET /v1/telemetry/traces/{trace_id}
POST /v1/telemetry/events
GET /v1/telemetry/spans
GET /v1/telemetry/traces
POST /v1/telemetry/spans/export
Listening on ['::', '0.0.0.0']:5001
INFO: Started server process [65372]
INFO: Waiting for application startup.
INFO: ASGI 'lifespan' protocol appears unsupported.
INFO: Application startup complete.
INFO: Uvicorn running on http://['::', '0.0.0.0']:5001 (Press CTRL+C to quit)
^CINFO: Shutting down
INFO: Finished server process [65372]
Received signal SIGINT (2). Exiting gracefully...
INFO 2025-02-12 10:21:11,215 __main__:151: Shutting down ModelsRoutingTable
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down InferenceRouter
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down ShieldsRoutingTable
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down SafetyRouter
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down VectorDBsRoutingTable
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down VectorIORouter
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down ToolGroupsRoutingTable
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down ToolRuntimeRouter
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down MetaReferenceAgentsImpl
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down DatasetsRoutingTable
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down DatasetIORouter
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down TelemetryAdapter
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down ScoringFunctionsRoutingTable
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down ScoringRouter
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down EvalTasksRoutingTable
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down EvalRouter
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down DistributionInspectImpl
```
[//]: # (## Documentation)
[//]: # (- [ ] Added a Changelog entry if the change is significant)
Signed-off-by: Sébastien Han <seb@redhat.com>
# What does this PR do?
Switch to uv for dependency management and update CONTRIBUTING.md with
new setup instructions. Add missing dev dependencies to pyproject.toml
and apply minor formatting fixes.
Signed-off-by: Sébastien Han <seb@redhat.com>
- [ ] Addresses issue (#issue)
## Test Plan
Please describe:
- tests you ran to verify your changes with result summaries.
- provide instructions so it can be reproduced.
## Sources
Please link relevant resources if necessary.
## Before submitting
- [x] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
Signed-off-by: Sébastien Han <seb@redhat.com>
## What does this PR do?
See issue: #747 -- `uv` is just plain better. This PR does the bare
minimum of replacing `pip install` by `uv pip install` and ensuring `uv`
exists in the environment.
## Test Plan
First: create new conda, `uv pip install -e .` on `llama-stack` -- all
is good.
Next: run `llama stack build --template together` followed by `llama
stack run together` -- all good
Next: run `llama stack build --template together --image-name yoyo`
followed by `llama stack run together --image-name yoyo` -- all good
Next: fresh conda and `uv pip install -e .` and `llama stack build
--template together --image-type venv` -- all good.
Docker: `llama stack build --template together --image-type container`
works!