# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]
`start_venv.sh` lifecycle should be:
025f615868
>>
34e3faa4e8
>>
4684fd3f8d
Finally replaced by `start_stack.sh`
[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])
## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]
[//]: # (## Documentation)
---------
Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
# What does this PR do?
We want to bundle a bunch of (typically remote) providers in a distro
template and be able to configure them "on the fly" via environment
variables. So far, we have been able to do this with simple env var
replacements. However, sometimes you want to only conditionally enable
providers (because the relevant remote services may not be alive, or
relevant.) This was not possible until now.
To aid this, we add a simple (bash-like) env var replacement
enhancement: `${env.FOO+bar}` evaluates to `bar` if the variable is SET
and evaluates to empty string if it is not. On top of that, we update
our main resolver to ignore any provider whose ID is null.
This allows using the distro like this:
```bash
llama stack run dev --env CHROMADB_URL=http://localhost:6001 --env ENABLE_CHROMADB=1
```
when only Chroma is UP. This disables the other `pgvector` provider in
the run configuration.
## Test Plan
Hard code `chromadb` as the vector io provider inside
`test_vector_io.py` and run:
```bash
LLAMA_STACK_BASE_URL=http://localhost:8321 pytest -s -v tests/client-sdk/vector_io/ --embedding-model all-MiniLM-L6-v2
```
# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]
[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])
## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]
[//]: # (## Documentation)
Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
# Summary:
This led to extremely hard to debug messages.
Before:
llama_stack/distribution/library_client.py:275: in request
response = await self._call_non_streaming(
llama_stack/distribution/library_client.py:322: in _call_non_streaming
result = await matched_func(**body)
llama_stack/providers/utils/telemetry/trace_protocol.py:102: in
async_wrapper
result = await method(self, *args, **kwargs)
llama_stack/providers/inline/agents/meta_reference/agents.py:80: in
create_agent
value=agent_config.model_dump_json(),
E AttributeError: 'dict' object has no attribute 'model_dump_json'
After:
E ValueError: Failed to convert parameter {'model':
'meta-llama/Llama-3.1-8B-Instruct', 'instructions': 'You are a helpful
assistant', 'sampling_params': {'strategy': {'type': 'top_p',
'temperature': 0.0001, 'top_p': 0.9}}, 'toolgroups': [{'name':
'builtin::rag'}], 'input_shields': ['meta-llama/Llama-Guard-3-8B'],
'output_shields': ['meta-llama/Llama-Guard-3-8B'],
'enable_session_persistence': False} into <class
'llama_stack.apis.agents.agents.AgentConfig'>: 2 validation errors for
AgentConfig
E toolgroups.0.str
E Input should be a valid string [type=string_type, input_value={'name':
'builtin::rag'}, input_type=dict]
E For further information visit
https://errors.pydantic.dev/2.10/v/string_type
E toolgroups.0.AgentToolGroupWithArgs.args
E Field required [type=missing, input_value={'name': 'builtin::rag'},
input_type=dict]
E For further information visit
https://errors.pydantic.dev/2.10/v/missing
# Test Plan:
LLAMA_STACK_CONFIG=fireworks pytest -s -v tests/client-sdk/
--safety-shield meta-llama/Llama-Guard-3-8B
# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]
[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])
## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]
[//]: # (## Documentation)
Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]
[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])
## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]
[//]: # (## Documentation)
Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
# What does this PR do?
check conda env name using basepath in exec.py
The current logic for finding conda prefix does a `endswith` check with
just the conda env name, but this will cause us to match incorrect if
there is a different conda env which ends with same suffix. In my case,
i had stack and llama-stack as the two conda envs.
## Test Plan
llama stack run ~/.llama/distributions/fireworks/fireworks-run.yaml
# What does this PR do?
Tool format depends on the model. @ehhuang introduced a
`get_default_tool_prompt_format` function for this purpose. We should
use that instead of hacky model ID matching we had before.
Secondly, non llama models don't have this concept so testing with those
models should work as is.
[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])
## Test Plan
```bash
for distro in fireworks ollama; do
LLAMA_STACK_CONFIG=$distro \
pytest -s -v tests/client-sdk/inference/test_text_inference.py \
--inference-model=meta-llama/Llama-3.2-3B-Instruct \
--vision-inference-model=""
done
LLAMA_STACK_CONFIG=dev \
pytest -s -v tests/client-sdk/inference/test_text_inference.py \
--inference-model=openai/gpt-4o \
--vision-inference-model=""
```
[//]: # (## Documentation)
Summary:
Lets the model decide which tool it needs to call to respond to a query.
Test Plan:
```
LLAMA_STACK_CONFIG=fireworks pytest -s -v tests/client-sdk/ --safety-shield meta-llama/Llama-Guard-3-8B
```
Also evaluated on a small benchmark with 20 questions from HotpotQA.
With this PR and some prompting, the performance is 77% recall compared
to 50% currently.
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with
[ReviewStack](https://reviewstack.dev/meta-llama/llama-stack/pull/1015).
* #1268
* #1239
* __->__ #1015
# What does this PR do?
- Introduced logging in `StackRun` to replace print-based messages
- Improved error handling for config file loading and parsing
- Replaced `cprint` with `logger.error` for consistent error messaging
- Ensured logging is used in `server.py` for startup, shutdown, and
runtime messages
- Added missing exception handling for invalid providers
Signed-off-by: Sébastien Han <seb@redhat.com>
Signed-off-by: Sébastien Han <seb@redhat.com>
# What does this PR do?
currently, build_venv.sh expects a `distribution_type` as the first
argument but the only things ever passed are:
1. image name
2. pip dependencies
so distribution_type is never passed in meaning the script errors when
calling something like:
`llama stack build --image-type venv --template ollama --image-name
test`
before output:
```
llama stack build --image-type venv --template ollama --image-name venv-test
Usage: /Users/charliedoern/projects/Documents/llama-stack/llama_stack/distribution/build_venv.sh <distribution_type> <env_name> <pip_dependencies> [<special_pip_deps>]
Example: /Users/charliedoern/projects/Documents/llama-stack/llama_stack/distribution/build_venv.sh <distribution_type> mybuild ./my-stack-build.yaml 'numpy pandas scipy'
Failed to build target venv-test with return code 1
Run config path is empty
```
after:
```
llama stack build --image-type venv --template ollama --image-name venv-test
Environment 'venv-test' already exists, re-using it.
Using virtual environment venv-test
Using CPython 3.13.0 interpreter at: /opt/homebrew/opt/python@3.13/bin/python3.13
Creating virtual environment at: venv-test
Activate with: source venv-test/bin/activate
Using Python 3.13.0 environment at: venv-test
Resolved 55 packages in 640ms
Built fire==0.7.0
Prepared 54 packages in 1.14s
Installed 55 packages in 82ms
+ annotated-types==0.7.0
```
## Test Plan
ran locally with output above
Signed-off-by: Charlie Doern <cdoern@redhat.com>
# What does this PR do?
now that llama stack supports running in venv, conda, and container
modes and the 3 scripts overlap alot, combine these three into ons
`start_stack.sh` script
## Test Plan
tested this locally on venv, conda, and container
---------
Signed-off-by: Charlie Doern <cdoern@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
Co-authored-by: Yuan Tang <terrytangyuan@gmail.com>
Summary:
Currently we don't set the best tool_prompt_format according to model as
promisd.
Test Plan:
Added print around raw model input and inspected manually
---
[//]: # (BEGIN SAPLING FOOTER)
Stack created with [Sapling](https://sapling-scm.com). Best reviewed
with
[ReviewStack](https://reviewstack.dev/meta-llama/llama-stack/pull/1214).
* #1234
* __->__ #1214
# What does this PR do?
When building providers in a virtual environment or containers, special
pip dependencies may not always be provided (e.g., for Ollama). The
check should only fail if the required number of arguments is missing.
Currently, two arguments are mandatory:
1. Environment name
2. Pip dependencies
Additionally, return statements were replaced with sys.exit(1) in error
conditions to ensure immediate termination on critical failures. Error
handling in the stack build process was also improved to guarantee the
program exits with status 1 when facing configuration issues or build
failures.
Signed-off-by: Sébastien Han <seb@redhat.com>
[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])
## Test Plan
This command shouldn't fail:
```
llama stack build --template ollama --image-type venv
```
[//]: # (## Documentation)
Signed-off-by: Sébastien Han <seb@redhat.com>
# What does this PR do?
--run runs the stack that was just build using the same arguments during
the build process (image-name, type, etc)
This simplifies the workflow a lot and makes the UX better for most
local users trying to get started rather than having to match the flags
of the two commands (build and then run)
Also, moved `ImageType` to distribution.utils since there were circular
import errors with its old location
## Test Plan
tested locally using the following command:
`llama stack build --run --template ollama --image-type venv`
Signed-off-by: Charlie Doern <cdoern@redhat.com>
# What does this PR do?
Addresses issues where the container is unable to run as root. Gives
write access to required folders.
[//]: # (If resolving an issue, uncomment and update the line below)
(Closes #[1207])
## Test Plan
I built locally and ran `llama stack build --template remote-vllm
--image-type container` and validated I could see my changes in the
output:
```
#11 1.186 Installed 11 packages in 61ms
#11 1.186 + llama-models==0.1.3
#11 1.186 + llama-stack==0.1.3
#11 1.186 + llama-stack-client==0.1.3
#11 1.186 + markdown-it-py==3.0.0
#11 1.186 + mdurl==0.1.2
#11 1.186 + prompt-toolkit==3.0.50
#11 1.186 + pyaml==25.1.0
#11 1.186 + pygments==2.19.1
#11 1.186 + rich==13.9.4
#11 1.186 + tiktoken==0.9.0
#11 1.186 + wcwidth==0.2.13
#11 DONE 1.6s
#12 [ 9/10] RUN mkdir -p /.llama /.cache
#12 DONE 0.3s
#13 [10/10] RUN chmod -R g+rw /app /.llama /.cache
#13 DONE 0.3s
#14 exporting to image
#14 exporting layers
#14 exporting layers 3.7s done
#14 writing image sha256:11cc8bd954db6d036037bcaf471b173ddd5261ac4b1e72074cccf85d18aefb96 done
#14 naming to docker.io/library/distribution-remote-vllm:0.1.3 done
#14 DONE 3.7s
+ set +x
Success!
```
This is what the resulting image looks like:

Also tagged the image as `0.1.3-test` and [pushed to
quay](https://quay.io/repository/jland/distribution-remote-vllm?tab=tags)
(note there are a bunch of critical vulnerabilities we may want to look
into)
And for good measure I deployed the resulting image on my Openshift
environment using the default Security Context and validated that there
were no issue with it coming up.
My validation was all done with the `vllm-remote` distribution, but if I
am understanding everything correctly the other distributions are just
different run.yaml configs.
[//]: # (## Documentation)
Please let me know if there is anything else I need to do.
Co-authored-by: Jamie Land <hokie10@gmail.com>
See Issue #922
The change is slightly backwards incompatible but no callsite (in our
client codebases or stack-apps) every passes a depth-2
`List[List[InterleavedContentItem]]` (which is now disallowed.)
## Test Plan
```bash
$ cd llama_stack/providers/tests/inference
$ pytest -s -v -k fireworks test_embeddings.py \
--inference-model nomic-ai/nomic-embed-text-v1.5 --env EMBEDDING_DIMENSION=784
$ pytest -s -v -k together test_embeddings.py \
--inference-model togethercomputer/m2-bert-80M-8k-retrieval --env EMBEDDING_DIMENSION=784
$ pytest -s -v -k ollama test_embeddings.py \
--inference-model all-minilm:latest --env EMBEDDING_DIMENSION=784
```
Also ran `tests/client-sdk/inference/test_embeddings.py`
# What does this PR do?
- Fully deprecate eval/tasks
[//]: # (If resolving an issue, uncomment and update the line below)
Closes#1088
NOTE: this will be a breaking change. We have introduced the new API in
0.1.3 .
Notebook has been updated to use the new endpoints.
## Test Plan
```
pytest -v -s --nbval-lax ./docs/notebooks/Llama_Stack_Benchmark_Evals.ipynb
```
<img width="611" alt="image"
src="https://github.com/user-attachments/assets/79f6efe1-81ba-494e-bf36-1fc0c2b9bc6f"
/>
cc @SLR722 for awareness
[//]: # (## Documentation)
# What does this PR do?
I was encountering build issues when building my `ollama` environment
using `llama stack build`
```bash
llama stack build --template ollama --image-type venv
Traceback (most recent call last):
File "/Users/farceo/dev/llama-stack/.venv/bin/llama", line 10, in <module>
sys.exit(main())
^^^^^^
File "/Users/farceo/dev/llama-stack/llama_stack/cli/llama.py", line 46, in main
parser.run(args)
File "/Users/farceo/dev/llama-stack/llama_stack/cli/llama.py", line 40, in run
args.func(args)
File "/Users/farceo/dev/llama-stack/llama_stack/cli/stack/build.py", line 77, in _run_stack_build_command
return run_stack_build_command(args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/farceo/dev/llama-stack/llama_stack/cli/stack/_build.py", line 180, in run_stack_build_command
_run_stack_build_command_from_build_config(
File "/Users/farceo/dev/llama-stack/llama_stack/cli/stack/_build.py", line 272, in _run_stack_build_command_from_build_config
return_code = build_image(
^^^^^^^^^^^^
File "/Users/farceo/dev/llama-stack/llama_stack/distribution/build.py", line 137, in build_image
return_code = run_with_pty(args)
^^^^^^^^^^^^^^^^^^
File "/Users/farceo/dev/llama-stack/llama_stack/distribution/utils/exec.py", line 22, in run_with_pty
return _run_with_pty_unix(command)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/farceo/dev/llama-stack/llama_stack/distribution/utils/exec.py", line 53, in _run_with_pty_unix
process = subprocess.Popen(
^^^^^^^^^^^^^^^^^
File "/Users/farceo/.local/share/uv/python/cpython-3.11.6-macos-aarch64-none/lib/python3.11/subprocess.py", line 1026, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "/Users/farceo/.local/share/uv/python/cpython-3.11.6-macos-aarch64-none/lib/python3.11/subprocess.py", line 1950, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: '/Users/farceo/dev/llama-stack/llama_stack/distribution/build_venv.sh'
make: *** [build-ollama] Error 1
```
I also had to adjust the script when testing the `common.sh` file
because it returned:
```bash
> source llama_stack/distribution/common.sh
llama_stack/distribution/common.sh:6: command not found: ^M
llama_stack/distribution/common.sh:50: parse error near `\n'
```
On my branch, I ran:
```bash
sed -i '' 's/\r$//' llama_stack/distribution/common.sh
```
And then I was able to successfully build the environment.
[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])
## Test Plan
N/A
[//]: # (## Documentation)
N/A
---------
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
# What does this PR do?
- Closes#1142
- Root cause is due to having `Union[str, AgenToolGroupWithArgs]`
## Test Plan
- Test with script described in issue.
- Print out final converted pydantic object
<img width="1470" alt="image"
src="https://github.com/user-attachments/assets/15dc9cd0-f37a-4b91-905f-3fe4f59a08c6"
/>
[//]: # (## Documentation)
llama-models should have extremely minimal cruft. Its sole purpose
should be didactic -- show the simplest implementation of the llama
models and document the prompt formats, etc.
This PR is the complement to
https://github.com/meta-llama/llama-models/pull/279
## Test Plan
Ensure all `llama` CLI `model` sub-commands work:
```bash
llama model list
llama model download --model-id ...
llama model prompt-format -m ...
```
Ran tests:
```bash
cd tests/client-sdk
LLAMA_STACK_CONFIG=fireworks pytest -s -v inference/
LLAMA_STACK_CONFIG=fireworks pytest -s -v vector_io/
LLAMA_STACK_CONFIG=fireworks pytest -s -v agents/
```
Create a fresh venv `uv venv && source .venv/bin/activate` and run
`llama stack build --template fireworks --image-type venv` followed by
`llama stack run together --image-type venv` <-- the server runs
Also checked that the OpenAPI generator can run and there is no change
in the generated files as a result.
```bash
cd docs/openapi_generator
sh run_openapi_generator.sh
```
# What does this PR do?
- Update `/eval-tasks` to `/benchmarks`
- ⚠️ Remove differentiation between `app` v.s. `benchmark` eval task
config. Now we only have `BenchmarkConfig`. The overloaded `benchmark`
is confusing and do not add any value. Backward compatibility is being
kept as the "type" is not being used anywhere.
[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])
## Test Plan
- This change is backward compatible
- Run notebook test with
```
pytest -v -s --nbval-lax ./docs/getting_started.ipynb
pytest -v -s --nbval-lax ./docs/notebooks/Llama_Stack_Benchmark_Evals.ipynb
```
<img width="846" alt="image"
src="https://github.com/user-attachments/assets/d2fc06a7-593a-444f-bc1f-10ab9b0c843d"
/>
[//]: # (## Documentation)
[//]: # (- [ ] Added a Changelog entry if the change is significant)
---------
Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
Signed-off-by: Ben Browning <bbrownin@redhat.com>
Signed-off-by: Sébastien Han <seb@redhat.com>
Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
Co-authored-by: Ben Browning <ben324@gmail.com>
Co-authored-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Reid <61492567+reidliu41@users.noreply.github.com>
Co-authored-by: reidliu <reid201711@gmail.com>
Co-authored-by: Yuan Tang <terrytangyuan@gmail.com>
# What does this PR do?
- Configured ruff linter to automatically fix import sorting issues.
- Set --exit-non-zero-on-fix to ensure non-zero exit code when fixes are
applied.
- Enabled the 'I' selection to focus on import-related linting rules.
- Ran the linter, and formatted all codebase imports accordingly.
- Removed the black dep from the "dev" group since we use ruff
Signed-off-by: Sébastien Han <seb@redhat.com>
[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])
## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]
[//]: # (## Documentation)
[//]: # (- [ ] Added a Changelog entry if the change is significant)
Signed-off-by: Sébastien Han <seb@redhat.com>
# What does this PR do?
This commit enhances the signal handling mechanism in the server by
improving the `handle_signal` (previously handle_sigint) function. It
now properly retrieves the signal name, ensuring clearer logging when a
termination signal is received. Additionally, it cancels all running
tasks and waits for their completion before stopping the event loop,
allowing for a more graceful shutdown. Support for handling
SIGTERM has also been added alongside SIGINT.
Before the changes, handle_sigint used asyncio.run(run_shutdown()).
However, asyncio.run() is meant to start a new event loop, and calling
it inside an existing one (like when running Uvicorn) raises an error.
The fix replaces asyncio.run(run_shutdown()) with an async function
scheduled on the existing loop using loop.create_task(shutdown()). This
ensures that the shutdown coroutine runs within the current event loop
instead of trying to create a new one.
Furthermore, this commit updates the project dependencies. `fastapi` and
`uvicorn` have been added to the development dependencies in
`pyproject.toml` and `uv.lock`, ensuring that the necessary packages are
available for development and execution.
Closes: https://github.com/meta-llama/llama-stack/issues/1043
Signed-off-by: Sébastien Han <seb@redhat.com>
[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])
## Test Plan
Run a server and send SIGINT:
```
INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" python -m llama_stack.distribution.server.server --yaml-config ./llama_stack/templates/ollama/run.yaml
Using config file: llama_stack/templates/ollama/run.yaml
Run configuration:
apis:
- agents
- datasetio
- eval
- inference
- safety
- scoring
- telemetry
- tool_runtime
- vector_io
container_image: null
datasets: []
eval_tasks: []
image_name: ollama
metadata_store:
db_path: /Users/leseb/.llama/distributions/ollama/registry.db
namespace: null
type: sqlite
models:
- metadata: {}
model_id: meta-llama/Llama-3.2-3B-Instruct
model_type: !!python/object/apply:llama_stack.apis.models.models.ModelType
- llm
provider_id: ollama
provider_model_id: null
- metadata:
embedding_dimension: 384
model_id: all-MiniLM-L6-v2
model_type: !!python/object/apply:llama_stack.apis.models.models.ModelType
- embedding
provider_id: sentence-transformers
provider_model_id: null
providers:
agents:
- config:
persistence_store:
db_path: /Users/leseb/.llama/distributions/ollama/agents_store.db
namespace: null
type: sqlite
provider_id: meta-reference
provider_type: inline::meta-reference
datasetio:
- config: {}
provider_id: huggingface
provider_type: remote::huggingface
- config: {}
provider_id: localfs
provider_type: inline::localfs
eval:
- config: {}
provider_id: meta-reference
provider_type: inline::meta-reference
inference:
- config:
url: http://localhost:11434
provider_id: ollama
provider_type: remote::ollama
- config: {}
provider_id: sentence-transformers
provider_type: inline::sentence-transformers
safety:
- config: {}
provider_id: llama-guard
provider_type: inline::llama-guard
scoring:
- config: {}
provider_id: basic
provider_type: inline::basic
- config: {}
provider_id: llm-as-judge
provider_type: inline::llm-as-judge
- config:
openai_api_key: '********'
provider_id: braintrust
provider_type: inline::braintrust
telemetry:
- config:
service_name: llama-stack
sinks: console,sqlite
sqlite_db_path: /Users/leseb/.llama/distributions/ollama/trace_store.db
provider_id: meta-reference
provider_type: inline::meta-reference
tool_runtime:
- config:
api_key: '********'
max_results: 3
provider_id: brave-search
provider_type: remote::brave-search
- config:
api_key: '********'
max_results: 3
provider_id: tavily-search
provider_type: remote::tavily-search
- config: {}
provider_id: code-interpreter
provider_type: inline::code-interpreter
- config: {}
provider_id: rag-runtime
provider_type: inline::rag-runtime
vector_io:
- config:
kvstore:
db_path: /Users/leseb/.llama/distributions/ollama/faiss_store.db
namespace: null
type: sqlite
provider_id: faiss
provider_type: inline::faiss
scoring_fns: []
server:
port: 8321
tls_certfile: null
tls_keyfile: null
shields: []
tool_groups:
- args: null
mcp_endpoint: null
provider_id: tavily-search
toolgroup_id: builtin::websearch
- args: null
mcp_endpoint: null
provider_id: rag-runtime
toolgroup_id: builtin::rag
- args: null
mcp_endpoint: null
provider_id: code-interpreter
toolgroup_id: builtin::code_interpreter
vector_dbs: []
version: '2'
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:213: Resolved 31 providers
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inner-inference => ollama
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inner-inference => sentence-transformers
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: models => __routing_table__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inference => __autorouted__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inner-vector_io => faiss
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inner-safety => llama-guard
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: shields => __routing_table__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: safety => __autorouted__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: vector_dbs => __routing_table__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: vector_io => __autorouted__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inner-tool_runtime => brave-search
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inner-tool_runtime => tavily-search
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inner-tool_runtime => code-interpreter
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inner-tool_runtime => rag-runtime
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: tool_groups => __routing_table__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: tool_runtime => __autorouted__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: agents => meta-reference
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inner-datasetio => huggingface
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inner-datasetio => localfs
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: datasets => __routing_table__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: datasetio => __autorouted__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: telemetry => meta-reference
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inner-scoring => basic
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inner-scoring => llm-as-judge
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inner-scoring => braintrust
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: scoring_functions => __routing_table__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: scoring => __autorouted__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inner-eval => meta-reference
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: eval_tasks => __routing_table__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: eval => __autorouted__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:215: inspect => __builtin__
INFO 2025-02-12 10:21:03,540 llama_stack.distribution.resolver:216:
INFO 2025-02-12 10:21:03,723 llama_stack.providers.remote.inference.ollama.ollama:148: checking connectivity to Ollama at `http://localhost:11434`...
INFO 2025-02-12 10:21:03,734 httpx:1740: HTTP Request: GET http://localhost:11434/api/ps "HTTP/1.1 200 OK"
INFO 2025-02-12 10:21:03,843 faiss.loader:148: Loading faiss.
INFO 2025-02-12 10:21:03,865 faiss.loader:150: Successfully loaded faiss.
INFO 2025-02-12 10:21:03,868 faiss:173: Failed to load GPU Faiss: name 'GpuIndexIVFFlat' is not defined. Will not load constructor refs for GPU indexes.
Warning: `bwrap` is not available. Code interpreter tool will not work correctly.
INFO 2025-02-12 10:21:04,315 datasets:54: PyTorch version 2.6.0 available.
INFO 2025-02-12 10:21:04,556 httpx:1740: HTTP Request: GET http://localhost:11434/api/ps "HTTP/1.1 200 OK"
INFO 2025-02-12 10:21:04,557 llama_stack.providers.utils.inference.embedding_mixin:42: Loading sentence transformer for all-MiniLM-L6-v2...
INFO 2025-02-12 10:21:07,202 sentence_transformers.SentenceTransformer:210: Use pytorch device_name: mps
INFO 2025-02-12 10:21:07,202 sentence_transformers.SentenceTransformer:218: Load pretrained SentenceTransformer: all-MiniLM-L6-v2
INFO 2025-02-12 10:21:09,500 llama_stack.distribution.stack:102: Models: all-MiniLM-L6-v2 served by sentence-transformers
INFO 2025-02-12 10:21:09,500 llama_stack.distribution.stack:102: Models: meta-llama/Llama-3.2-3B-Instruct served by ollama
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: basic::equality served by basic
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: basic::regex_parser_multiple_choice_answer served by basic
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: basic::subset_of served by basic
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: braintrust::answer-correctness served by braintrust
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: braintrust::answer-relevancy served by braintrust
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: braintrust::answer-similarity served by braintrust
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: braintrust::context-entity-recall served by braintrust
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: braintrust::context-precision served by braintrust
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: braintrust::context-recall served by braintrust
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: braintrust::context-relevancy served by braintrust
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: braintrust::factuality served by braintrust
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: braintrust::faithfulness served by braintrust
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: llm-as-judge::405b-simpleqa served by llm-as-judge
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Scoring_fns: llm-as-judge::base served by llm-as-judge
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Tool_groups: builtin::code_interpreter served by code-interpreter
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Tool_groups: builtin::rag served by rag-runtime
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:102: Tool_groups: builtin::websearch served by tavily-search
INFO 2025-02-12 10:21:09,501 llama_stack.distribution.stack:106:
Serving API eval
POST /v1/eval/tasks/{task_id}/evaluations
DELETE /v1/eval/tasks/{task_id}/jobs/{job_id}
GET /v1/eval/tasks/{task_id}/jobs/{job_id}/result
GET /v1/eval/tasks/{task_id}/jobs/{job_id}
POST /v1/eval/tasks/{task_id}/jobs
Serving API agents
POST /v1/agents
POST /v1/agents/{agent_id}/session
POST /v1/agents/{agent_id}/session/{session_id}/turn
DELETE /v1/agents/{agent_id}
DELETE /v1/agents/{agent_id}/session/{session_id}
GET /v1/agents/{agent_id}/session/{session_id}
GET /v1/agents/{agent_id}/session/{session_id}/turn/{turn_id}/step/{step_id}
GET /v1/agents/{agent_id}/session/{session_id}/turn/{turn_id}
Serving API scoring_functions
GET /v1/scoring-functions/{scoring_fn_id}
GET /v1/scoring-functions
POST /v1/scoring-functions
Serving API safety
POST /v1/safety/run-shield
Serving API inspect
GET /v1/health
GET /v1/inspect/providers
GET /v1/inspect/routes
GET /v1/version
Serving API tool_runtime
POST /v1/tool-runtime/invoke
GET /v1/tool-runtime/list-tools
POST /v1/tool-runtime/rag-tool/insert
POST /v1/tool-runtime/rag-tool/query
Serving API datasetio
POST /v1/datasetio/rows
GET /v1/datasetio/rows
Serving API shields
GET /v1/shields/{identifier}
GET /v1/shields
POST /v1/shields
Serving API eval_tasks
GET /v1/eval-tasks/{eval_task_id}
GET /v1/eval-tasks
POST /v1/eval-tasks
Serving API models
GET /v1/models/{model_id}
GET /v1/models
POST /v1/models
DELETE /v1/models/{model_id}
Serving API datasets
GET /v1/datasets/{dataset_id}
GET /v1/datasets
POST /v1/datasets
DELETE /v1/datasets/{dataset_id}
Serving API vector_io
POST /v1/vector-io/insert
POST /v1/vector-io/query
Serving API inference
POST /v1/inference/chat-completion
POST /v1/inference/completion
POST /v1/inference/embeddings
Serving API tool_groups
GET /v1/tools/{tool_name}
GET /v1/toolgroups/{toolgroup_id}
GET /v1/toolgroups
GET /v1/tools
POST /v1/toolgroups
DELETE /v1/toolgroups/{toolgroup_id}
Serving API vector_dbs
GET /v1/vector-dbs/{vector_db_id}
GET /v1/vector-dbs
POST /v1/vector-dbs
DELETE /v1/vector-dbs/{vector_db_id}
Serving API scoring
POST /v1/scoring/score
POST /v1/scoring/score-batch
Serving API telemetry
GET /v1/telemetry/traces/{trace_id}/spans/{span_id}
GET /v1/telemetry/spans/{span_id}/tree
GET /v1/telemetry/traces/{trace_id}
POST /v1/telemetry/events
GET /v1/telemetry/spans
GET /v1/telemetry/traces
POST /v1/telemetry/spans/export
Listening on ['::', '0.0.0.0']:5001
INFO: Started server process [65372]
INFO: Waiting for application startup.
INFO: ASGI 'lifespan' protocol appears unsupported.
INFO: Application startup complete.
INFO: Uvicorn running on http://['::', '0.0.0.0']:5001 (Press CTRL+C to quit)
^CINFO: Shutting down
INFO: Finished server process [65372]
Received signal SIGINT (2). Exiting gracefully...
INFO 2025-02-12 10:21:11,215 __main__:151: Shutting down ModelsRoutingTable
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down InferenceRouter
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down ShieldsRoutingTable
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down SafetyRouter
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down VectorDBsRoutingTable
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down VectorIORouter
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down ToolGroupsRoutingTable
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down ToolRuntimeRouter
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down MetaReferenceAgentsImpl
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down DatasetsRoutingTable
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down DatasetIORouter
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down TelemetryAdapter
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down ScoringFunctionsRoutingTable
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down ScoringRouter
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down EvalTasksRoutingTable
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down EvalRouter
INFO 2025-02-12 10:21:11,216 __main__:151: Shutting down DistributionInspectImpl
```
[//]: # (## Documentation)
[//]: # (- [ ] Added a Changelog entry if the change is significant)
Signed-off-by: Sébastien Han <seb@redhat.com>
# What does this PR do?
add --image-type to `llama stack run`. Which takes conda, container or
venv also add start_venv.sh which start the stack using a venv
resolves#1007
## Test Plan
running locally:
`llama stack build --template ollama --image-type venv`
`llama stack run --image-type venv
~/.llama/distributions/ollama/ollama-run.yaml`
...
```
llama stack run --image-type venv ~/.llama/distributions/ollama/ollama-run.yaml
Using run configuration: /Users/charliedoern/.llama/distributions/ollama/ollama-run.yaml
+ python -m llama_stack.distribution.server.server --yaml-config /Users/charliedoern/.llama/distributions/ollama/ollama-run.yaml --port 8321
Using config file: /Users/charliedoern/.llama/distributions/ollama/ollama-run.yaml
Run configuration:
apis:
- agents
- datasetio
...
```
Signed-off-by: Charlie Doern <cdoern@redhat.com>
# What does this PR do?
currently this is the output when you run a distribution locally without
running `llama stack build`:
```
Traceback (most recent call last):
File "/Users/charliedoern/Documents/llama-sdk.py", line 25, in <module>
models = client.models.list()
^^^^^^^^^^^^^^^^^^^^
File "/Users/charliedoern/Documents/llama-stack-client-python/src/llama_stack_client/resources/models.py", line 107, in list
raise exc
File "/Users/charliedoern/Documents/llama-stack-client-python/src/llama_stack_client/resources/models.py", line 95, in list
return self._get(
^^^^^^^^^^
File "/Users/charliedoern/Documents/llama-stack-client-python/src/llama_stack_client/_base_client.py", line 1212, in get
return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/charliedoern/Documents/llama-stack/llama_stack/distribution/library_client.py", line 168, in request
return asyncio.run(self.async_client.request(*args, **kwargs))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Cellar/python@3.11/3.11.10/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/runners.py", line 190, in run
return runner.run(main)
^^^^^^^^^^^^^^^^
File "/opt/homebrew/Cellar/python@3.11/3.11.10/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/runners.py", line 118, in run
return self._loop.run_until_complete(task)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/Cellar/python@3.11/3.11.10/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/base_events.py", line 654, in run_until_complete
return future.result()
^^^^^^^^^^^^^^^
File "/Users/charliedoern/Documents/llama-stack/llama_stack/distribution/library_client.py", line 258, in request
if not self.endpoint_impls:
^^^^^^^^^^^^^^^^^^^
AttributeError: 'AsyncLlamaStackAsLibraryClient' object has no attribute 'endpoint_impls'
```
the intended exception is never raised, add an except for an
AttributeError so users can catch when they call things like
`models.list()` and so that a more useful error telling them that the
client is not properly initialized is printed.
## Test Plan
Please describe:
- I ran the script found here:
https://llama-stack.readthedocs.io/en/latest/getting_started/index.html#run-inference-with-python-sdk
locally with the changes in this PR and the exception was caught
successfully.
## Before submitting
- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
---------
Signed-off-by: Charlie Doern <cdoern@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
# What does this PR do?
Enables HTTPS option for Llama Stack.
While doing so, introduces a `ServerConfig` sub-structure to house all
server related configuration (port, ssl, etc.)
Also simplified the `start_container.sh` entrypoint to simply be
`python` instead of a complex bash command line.
## Test Plan
Conda:
Run:
```bash
$ llama stack build --template together
$ llama stack run --port 8322 # ensure server starts
$ llama-stack-client configure --endpoint http://localhost:8322
$ llama-stack-client models list
```
Create a self-signed SSL key / cert pair. Then, using a local checkout
of `llama-stack-client-python`, change
https://github.com/meta-llama/llama-stack-client-python/blob/main/src/llama_stack_client/_base_client.py#L759
to add `kwargs.setdefault("verify", False)` so SSL verification is
disabled. Then:
```bash
$ llama stack run --port 8322 --tls-keyfile <KEYFILE> --tls-certfile <CERTFILE>
$ llama-stack-client configure --endpoint https://localhost:8322 # notice the `https`
$ llama-stack-client models list
```
Also tested with containers (but of course one needs to make sure the
cert and key files are appropriately provided to the container.)