Commit graph

699 commits

Author SHA1 Message Date
Ashwin Bharambe
eddef0b2ae
chore: slight renaming of model alias stuff (#1181)
Quick test by running:
```
LLAMA_STACK_CONFIG=fireworks pytest -s -v tests/client-sdk
```
2025-02-20 11:48:46 -08:00
Ashwin Bharambe
2eda050aef Fix ollama fixture 2025-02-20 11:46:02 -08:00
Ashwin Bharambe
3d891fc9ba ModelAlias cleanup 2025-02-20 11:44:39 -08:00
Ashwin Bharambe
984a8039ad Kill unnecessary check on --safety-shield test param 2025-02-20 09:15:23 -08:00
Rashmi Pawar
996f27a308
fix: add logging import (#1174)
# What does this PR do?
Fixes logging import and the logger instance creation
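For reference, the usual pattern looks roughly like this (a minimal sketch; the actual module touched by this PR may differ):
```python
# Minimal sketch of the standard logging setup (illustrative, not the exact diff).
import logging

logger = logging.getLogger(__name__)

def do_work() -> None:
    logger.info("logger is usable because `logging` is imported at module level")
```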

cc: @dglogo
2025-02-20 11:26:47 -05:00
Ihar Hrachyshka
fb6a3efb1d
feat: Enable CPU training for torchtune (#1140)
# What does this PR do?

You are now able to run a training cycle on CPU. This is useful for
debugging and testing purposes.
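For illustration, a minimal device-selection sketch with a CPU fallback (assuming plain PyTorch; the actual torchtune recipe wiring may differ):
```python
# Illustrative only: pick CUDA when available, otherwise fall back to CPU.
import torch

def pick_device() -> torch.device:
    return torch.device("cuda" if torch.cuda.is_available() else "cpu")

print(f"Training on {pick_device()}")
```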

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan

On a Mac machine without CUDA devices:

```
17:00:24.417 [START] /v1/post-training/supervised-fine-tune
DEBUG 2025-02-18 12:00:24,419 torchtune.utils._logging:60: Setting manual seed to local seed 3268931494. Local seed is seed + rank = 3268931494 + 0
INFO 2025-02-18 12:00:24,463 torchtune.utils._logging:64: Identified model_type = Llama3_2. Ignoring output.weight in checkpoint in favor of the tok_embedding.weight tied weights.
INFO 2025-02-18 12:00:46,699 llama_stack.providers.inline.post_training.torchtune.recipes.lora_finetuning_single_device:182: Model is initialized with precision torch.bfloat16.
INFO 2025-02-18 12:00:46,784 llama_stack.providers.inline.post_training.torchtune.recipes.lora_finetuning_single_device:185: Tokenizer is initialized.
INFO 2025-02-18 12:00:46,786 llama_stack.providers.inline.post_training.torchtune.recipes.lora_finetuning_single_device:188: Optimizer is initialized.
INFO 2025-02-18 12:00:46,786 llama_stack.providers.inline.post_training.torchtune.recipes.lora_finetuning_single_device:192: Loss is initialized.
INFO 2025-02-18 12:00:48,997 llama_stack.providers.inline.post_training.torchtune.recipes.lora_finetuning_single_device:209: Dataset and Sampler are initialized.
INFO 2025-02-18 12:00:48,998 llama_stack.providers.inline.post_training.torchtune.recipes.lora_finetuning_single_device:227: Learning rate scheduler is initialized.
Writing logs to /Users/ihrachys/.llama/checkpoints/meta-llama/Llama-3.2-3B-Instruct-sft-0/log_1739898049.txt
1|1|Loss: 1.7414989471435547: 100% 1/1 [03:46<00:00, 226.21s/it]INFO 2025-02-18 12:04:35,227 llama_stack.providers.inline.post_training.torchtune.recipes.lora_finetuning_single_device:528: Starting checkpoint save...
INFO 2025-02-18 12:04:49,974 torchtune.utils._logging:121: Model checkpoint of size 6.43 GB saved to /Users/ihrachys/.llama/checkpoints/meta-llama/Llama-3.2-3B-Instruct-sft-0/consolidated.00.pth
INFO 2025-02-18 12:04:49,981 torchtune.utils._logging:132: Adapter checkpoint of size 0.00 GB saved to /Users/ihrachys/.llama/checkpoints/meta-llama/Llama-3.2-3B-Instruct-sft-0/adapter/adapter.pth
model_file_path /Users/ihrachys/.llama/checkpoints/meta-llama/Llama-3.2-3B-Instruct-sft-0
1|1|Loss: 1.7414989471435547: 100% 1/1 [04:01<00:00, 241.18s/it]
INFO:     ::1:64990 - "POST /v1/post-training/supervised-fine-tune HTTP/1.1" 200 OK
17:04:50.364 [END] /v1/post-training/supervised-fine-tune [StatusCode.OK] (265947.01ms)
 17:00:24.419 [DEBUG] Setting manual seed to local seed 3268931494. Local seed is seed + rank = 3268931494 + 0
 17:00:24.463 [INFO] Identified model_type = Llama3_2. Ignoring output.weight in checkpoint in favor of the tok_embedding.weight tied weights.
 17:00:46.700 [INFO] Model is initialized with precision torch.bfloat16.
 17:00:46.784 [INFO] Tokenizer is initialized.
 17:00:46.786 [INFO] Optimizer is initialized.
 17:00:46.786 [INFO] Loss is initialized.
 17:00:48.997 [INFO] Dataset and Sampler are initialized.
 17:00:48.998 [INFO] Learning rate scheduler is initialized.
 17:04:35.227 [INFO] Starting checkpoint save...
 17:04:49.974 [INFO] Model checkpoint of size 6.43 GB saved to /Users/ihrachys/.llama/checkpoints/meta-llama/Llama-3.2-3B-Instruct-sft-0/consolidated.00.pth
 17:04:49.981 [INFO] Adapter checkpoint of size 0.00 GB saved to /Users/ihrachys/.llama/checkpoints/meta-llama/Llama-3.2-3B-Instruct-sft-0/adapter/adapter.pth
```

[//]: # (## Documentation)

Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
2025-02-19 22:42:58 -08:00
Xi Yan
a324ceb9a9 precommit again 2025-02-19 22:40:45 -08:00
Sébastien Han
4694780d23
test: skip model registration for unsupported providers (#1030)
# What does this PR do?
- Updated `test_register_with_llama_model` to skip tests when using the
Ollama provider, as it does not support custom model names.
- Deleted `test_initialize_model_during_registering` since there is no
  "load_model" semantic that is exposed publicly on a provider.

These changes ensure that tests do not fail for providers with
incompatible behaviors.
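A minimal sketch of the skip pattern (the provider identifier and helper name are illustrative, not the actual test code):
```python
# Skip instead of failing when the provider cannot register custom model names.
import pytest

def skip_if_unsupported(provider_type: str) -> None:
    if provider_type == "remote::ollama":  # assumed provider identifier
        pytest.skip("Ollama does not support custom model names")
```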

Signed-off-by: Sébastien Han <seb@redhat.com>

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan

Run Ollama:

```
 uv run pytest -v -s -k "ollama" llama_stack/providers/tests/inference/test_model_registration.py
/Users/leseb/Documents/AI/llama-stack/.venv/lib/python3.13/site-packages/pytest_asyncio/plugin.py:207: PytestDeprecationWarning: The configuration option "asyncio_default_fixture_loop_scope" is unset.
The event loop scope for asynchronous fixtures will default to the fixture caching scope. Future versions of pytest-asyncio will default the loop scope for asynchronous fixtures to function scope. Set the default fixture loop scope explicitly in order to avoid unexpected behavior in the future. Valid fixture loop scopes are: "function", "class", "module", "package", "session"

  warnings.warn(PytestDeprecationWarning(_DEFAULT_FIXTURE_LOOP_SCOPE_UNSET))
========================================== test session starts ==========================================
platform darwin -- Python 3.13.1, pytest-8.3.4, pluggy-1.5.0 -- /Users/leseb/Documents/AI/llama-stack/.venv/bin/python3
cachedir: .pytest_cache
metadata: {'Python': '3.13.1', 'Platform': 'macOS-15.3-arm64-arm-64bit-Mach-O', 'Packages': {'pytest': '8.3.4', 'pluggy': '1.5.0'}, 'Plugins': {'html': '4.1.1', 'metadata': '3.1.1', 'asyncio': '0.25.3', 'anyio': '4.8.0', 'nbval': '0.11.0'}}
rootdir: /Users/leseb/Documents/AI/llama-stack
configfile: pyproject.toml
plugins: html-4.1.1, metadata-3.1.1, asyncio-0.25.3, anyio-4.8.0, nbval-0.11.0
asyncio: mode=Mode.STRICT, asyncio_default_fixture_loop_scope=None
collected 65 items / 60 deselected / 5 selected                                                         

llama_stack/providers/tests/inference/test_model_registration.py::TestModelRegistration::test_register_unsupported_model[-ollama] PASSED
llama_stack/providers/tests/inference/test_model_registration.py::TestModelRegistration::test_register_nonexistent_model[-ollama] PASSED
llama_stack/providers/tests/inference/test_model_registration.py::TestModelRegistration::test_register_with_llama_model[-ollama] SKIPPED
llama_stack/providers/tests/inference/test_model_registration.py::TestModelRegistration::test_register_with_invalid_llama_model[-ollama] PASSED

======================== 3 passed, 1 skipped, 60 deselected, 2 warnings in 0.22s ========================
```


[//]: # (## Documentation)
[//]: # (- [ ] Added a Changelog entry if the change is significant)

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-02-19 22:39:13 -08:00
Sixian Yi
531940aea9
script for running client sdk tests (#895)
# What does this PR do?
Create a script for running all client-sdk tests on the Async Library
client, with the option to generate a report.


## Test Plan

```
python llama_stack/scripts/run_client_sdk_tests.py --templates together fireworks --report
```



## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2025-02-19 22:38:06 -08:00
Xi Yan
a3d8c49459 precommit 2025-02-19 22:37:41 -08:00
Xi Yan
ce040ad111 precommit 2025-02-19 22:35:24 -08:00
Xi Yan
ca687d3e86 style: env var in build_venv 2025-02-19 22:32:59 -08:00
Shrinit Goyal
b74f25035c
Added support for mongoDB KV store (#543)
Added support for MongoDB as a KV store.
Validated with MongoDB: it is able to store agent data, session data, and
turn data.
<img width="1332" alt="image"
src="https://github.com/user-attachments/assets/867700a4-b9ee-4a3c-8278-f39074d39d56">
this is how run.yaml would look:
```
    config:
      persistence_store:
        type: mongodb
        namespace: null
        host: localhost
        port: 27017
        db: llamastack
        user: ""
        password: ""
        collection_name: llamastack_kvstore
```
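For illustration, a minimal sketch of what a MongoDB-backed KV store matching this config could look like (class and method names are assumptions, not the PR's actual code):
```python
# Hypothetical MongoDB-backed key-value store; mirrors the config fields above.
from pymongo import MongoClient

class MongoDBKVStore:
    def __init__(self, host: str, port: int, db: str, collection_name: str):
        self.collection = MongoClient(host, port)[db][collection_name]

    def set(self, key: str, value: str) -> None:
        self.collection.update_one({"key": key}, {"$set": {"value": value}}, upsert=True)

    def get(self, key: str) -> str | None:
        doc = self.collection.find_one({"key": key})
        return doc["value"] if doc else None
```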

---------

Co-authored-by: shrinitgoyal <shrinit.goyal@engati.com>
2025-02-19 22:30:50 -08:00
Yuan Tang
5966079770
fix: More robust handling of the arguments in tool call response in remote::vllm (#1169)
# What does this PR do?

This fixes the following issue on the server side when the tool call
response contains empty args. This happens when running
`examples.agents.e2e_loop_with_client_tools` but `get_ticker_data`
returns `[]`:

```
Traceback (most recent call last):
  File "/home/yutang/repos/llama-stack/llama_stack/distribution/server/server.py", line 208, in sse_generator
    async for item in event_gen:
  File "/home/yutang/repos/llama-stack/llama_stack/providers/inline/agents/meta_reference/agents.py", line 169, in _create_agent_turn_streaming
    async for event in agent.create_and_execute_turn(request):
  File "/home/yutang/repos/llama-stack/llama_stack/providers/inline/agents/meta_reference/agent_instance.py", line 189, in create_and_execute_turn
    async for chunk in self.run(
  File "/home/yutang/repos/llama-stack/llama_stack/providers/inline/agents/meta_reference/agent_instance.py", line 258, in run
    async for res in self._run(
  File "/home/yutang/repos/llama-stack/llama_stack/providers/inline/agents/meta_reference/agent_instance.py", line 499, in _run
    async for chunk in await self.inference_api.chat_completion(
  File "/home/yutang/repos/llama-stack/llama_stack/distribution/routers/routers.py", line 182, in <genexpr>
    return (chunk async for chunk in await provider.chat_completion(**params))
  File "/home/yutang/repos/llama-stack/llama_stack/providers/remote/inference/vllm/vllm.py", line 296, in _stream_chat_completion
    async for chunk in res:
  File "/home/yutang/repos/llama-stack/llama_stack/providers/remote/inference/vllm/vllm.py", line 162, in _process_vllm_chat_completion_stream_response
    arguments=json.loads(tool_call_buf.arguments),
  File "/home/yutang/.conda/envs/distribution-myenv/lib/python3.10/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/home/yutang/.conda/envs/distribution-myenv/lib/python3.10/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/home/yutang/.conda/envs/distribution-myenv/lib/python3.10/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
```
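The gist of the fix is to avoid feeding an empty argument buffer to `json.loads`; a rough sketch (the actual change in `vllm.py` may differ):
```python
# Treat an empty/blank argument buffer as "no arguments" instead of raising.
import json

def parse_tool_arguments(buffered_args: str) -> dict:
    if not buffered_args.strip():
        return {}
    return json.loads(buffered_args)
```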
## Test Plan

All existing tests in
`tests/client-sdk/inference/test_text_inference.py` passed.

[//]: # (## Documentation)

---------

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-02-19 22:27:02 -08:00
Sébastien Han
69eebaf5bf
build: add missing dev dependencies for unit tests (#1004)
# What does this PR do?
Added necessary dependencies to ensure successful execution of unit
tests. Without these, the following command would fail due to missing
imports:

```
uv run pytest -v -k "ollama" \
     --inference-model=llama3.2:3b-instruct-fp16
     llama_stack/providers/tests/inference/test_model_registration.py
```

Signed-off-by: Sébastien Han <seb@redhat.com>

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
Run:

```
ollama run llama3.2:3b-instruct-fp16 --keepalive 2m &
uv run pytest -v -k "ollama" --inference-model=llama3.2:3b-instruct-fp16 llama_stack/providers/tests/inference/test_model_registration.py

```

You can observe that some tests pass while others fail, but the test run
itself now completes successfully.

[//]: # (## Documentation)
[//]: # (- [ ] Added a Changelog entry if the change is significant)

Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-02-19 22:26:11 -08:00
Xi Yan
61f43b8677
fix: llama stack build use UV_SYSTEM_PYTHON to install dependencies to system environment (#1163)
# What does this PR do?
- resolves issue: #1159 
- Root cause: https://github.com/meta-llama/llama-stack/pull/980 forces
`build_venv.sh` to install into a venv environment, which does not work in the
Colab notebook environment

<img width="1004" alt="image"
src="https://github.com/user-attachments/assets/1f9be409-5313-4926-b078-74e141cf29eb"
/>

## This PR
Use `UV_SYSTEM_PYTHON` to make sure dependencies are installed in the
current system environment, which is what the Colab environment uses.
```
UV_SYSTEM_PYTHON=1 llama stack build --template together --image-type venv
```

## Test Plan
- Works in Colab environment
<img width="621" alt="image"
src="https://github.com/user-attachments/assets/ae93bc3d-e05a-44b9-bb21-fb88f29969b8"
/>
2025-02-19 22:21:16 -08:00
Francisco Arceo
2b752df79a
fix: Fixing some small issues with the build scripts (#1132)
# What does this PR do?
I was encountering build issues when building my `ollama` environment
using `llama stack build`

```bash
llama stack build --template ollama --image-type venv
Traceback (most recent call last):
  File "/Users/farceo/dev/llama-stack/.venv/bin/llama", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/Users/farceo/dev/llama-stack/llama_stack/cli/llama.py", line 46, in main
    parser.run(args)
  File "/Users/farceo/dev/llama-stack/llama_stack/cli/llama.py", line 40, in run
    args.func(args)
  File "/Users/farceo/dev/llama-stack/llama_stack/cli/stack/build.py", line 77, in _run_stack_build_command
    return run_stack_build_command(args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/farceo/dev/llama-stack/llama_stack/cli/stack/_build.py", line 180, in run_stack_build_command
    _run_stack_build_command_from_build_config(
  File "/Users/farceo/dev/llama-stack/llama_stack/cli/stack/_build.py", line 272, in _run_stack_build_command_from_build_config
    return_code = build_image(
                  ^^^^^^^^^^^^
  File "/Users/farceo/dev/llama-stack/llama_stack/distribution/build.py", line 137, in build_image
    return_code = run_with_pty(args)
                  ^^^^^^^^^^^^^^^^^^
  File "/Users/farceo/dev/llama-stack/llama_stack/distribution/utils/exec.py", line 22, in run_with_pty
    return _run_with_pty_unix(command)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/farceo/dev/llama-stack/llama_stack/distribution/utils/exec.py", line 53, in _run_with_pty_unix
    process = subprocess.Popen(
              ^^^^^^^^^^^^^^^^^
  File "/Users/farceo/.local/share/uv/python/cpython-3.11.6-macos-aarch64-none/lib/python3.11/subprocess.py", line 1026, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/Users/farceo/.local/share/uv/python/cpython-3.11.6-macos-aarch64-none/lib/python3.11/subprocess.py", line 1950, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: '/Users/farceo/dev/llama-stack/llama_stack/distribution/build_venv.sh'
make: *** [build-ollama] Error 1
```

I also had to adjust the script when testing the `common.sh` file
because it returned:

```bash
> source llama_stack/distribution/common.sh
llama_stack/distribution/common.sh:6: command not found: ^M
llama_stack/distribution/common.sh:50: parse error near `\n'
```
On my branch, I ran:
```bash
sed -i '' 's/\r$//' llama_stack/distribution/common.sh
```
And then I was able to successfully build the environment.

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
N/A

[//]: # (## Documentation)
N/A

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-02-19 22:20:49 -08:00
Reid
af377e844d
feat: add an option to list the downloaded models (#1127)
# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]

```
$ llama model list --help
usage: llama model list [-h] [--show-all] [--downloaded]

Show available llama models

options:
  -h, --help            show this help message and exit
  --show-all            Show all models (not just defaults)
  --downloaded          List the downloaded models

$ llama model list --downloaded
+-------------+----------+---------------------+
| Model       | Size     | Modified Time       |
+-------------+----------+---------------------+
| Llama3.2-1B | 2.31 GB  | 2025-02-16 13:38:04 |
+-------------+----------+---------------------+
| Llama3.1-8B | 14.97 GB | 2025-02-16 10:36:37 |
+-------------+----------+---------------------+
```
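For illustration, here is roughly how such a listing could be gathered (the checkpoint directory and formatting are assumptions, not the CLI's actual implementation):
```python
# Walk the local checkpoint directory and report model name, size, and mtime.
from datetime import datetime
from pathlib import Path

def list_downloaded(checkpoint_dir: Path = Path.home() / ".llama" / "checkpoints") -> None:
    if not checkpoint_dir.exists():
        return
    for model_dir in sorted(p for p in checkpoint_dir.iterdir() if p.is_dir()):
        size_gb = sum(f.stat().st_size for f in model_dir.rglob("*") if f.is_file()) / 1e9
        modified = datetime.fromtimestamp(model_dir.stat().st_mtime)
        print(f"{model_dir.name}\t{size_gb:.2f} GB\t{modified:%Y-%m-%d %H:%M:%S}")

list_downloaded()
```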

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)

---------

Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
2025-02-19 22:17:39 -08:00
Botao Chen
2b995c22eb
feat: inference passthrough provider (#1166)
##  What does this PR do?
In this PR, we implement a passthrough inference provider that works for
any endpoint that respects the Llama Stack inference API definition.
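Conceptually, the provider just forwards requests to a remote server that already speaks the same API; a very rough sketch (the URL path, auth header, and payload shape here are assumptions):
```python
# Forward a chat-completion request to a remote Llama-Stack-compatible endpoint.
import httpx

async def passthrough_chat_completion(base_url: str, api_key: str, payload: dict) -> dict:
    async with httpx.AsyncClient(base_url=base_url) as client:
        resp = await client.post(
            "/v1/inference/chat-completion",  # assumed route
            json=payload,
            headers={"Authorization": f"Bearer {api_key}"},
        )
        resp.raise_for_status()
        return resp.json()
```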

## Test Plan
Configured an endpoint that respects the Llama Stack inference API definition
and got the inference results back successfully.

<img width="1268" alt="Screenshot 2025-02-19 at 8 52 51 PM"
src="https://github.com/user-attachments/assets/447816e4-ea7a-4365-b90c-386dc7dcf4a1"
/>
2025-02-19 21:47:00 -08:00
Botao Chen
b751f7003d
feat: add aggregation_functions to llm_as_judge_405b_simpleqa (#1164)
As titled: let the scoring function llm_as_judge_405b_simpleqa output
aggregated_results.

We can leverage categorical_count to calculate the % of correctness as an
eval benchmark metric.
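For intuition, a toy example of what a categorical count over judge grades yields (names and grade values are illustrative):
```python
# Count judge grades per category and derive a % correct from them.
from collections import Counter

grades = ["A", "A", "B", "A", "C"]  # hypothetical per-row llm-as-judge grades
counts = Counter(grades)
print(counts, f"{100 * counts['A'] / len(grades):.0f}% graded A")
```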
2025-02-19 19:42:04 -08:00
Ihar Hrachyshka
c1f7d7f005
fix: miscellaneous job management improvements in torchtune (#1136)
- **refactor: simplify job status extraction a bit**
- **torchtune: save job status on schedule**
- **refactor: get rid of job_list in torchtune job management code**

# What does this PR do?

A failed job is now registered in the API, and one can consult its status.

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan

```
$ llama-stack-client post_training status --job-uuid test-jobe244b5b0-5053-4892-a4d9-d8fc8b116e73                                                      
JobStatusResponse(checkpoints=[], job_uuid='test-jobe244b5b0-5053-4892-a4d9-d8fc8b116e73', status='failed', completed_at=None, resources_allocated=None, scheduled_at=datetime.datetime(2025, 2, 18, 9, 4, 34, 3252), started_at=datetime.datetime(2025, 2, 18, 9, 4, 34, 10688))
```

[//]: # (## Documentation)

---------

Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
2025-02-19 19:09:37 -08:00
Francisco Arceo
7972daa72e
feat: Chunk sqlite-vec writes (#1094)
# What does this PR do?
1. This PR adds batch inserts into sqlite-vec as requested in
https://github.com/meta-llama/llama-stack/pull/1040
- Note: the inserts use a UUID generated from the hash of the document id
and chunk content (see the sketch after this list).
2. This PR also adds unit tests for sqlite-vec. In a follow up PR, I can
add similar tests to Faiss.
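A minimal sketch of that deterministic id derivation (the hash choice here is illustrative):
```python
# Derive a stable UUID from the document id plus the chunk's content.
import hashlib
import uuid

def generate_chunk_id(document_id: str, chunk_text: str) -> str:
    digest = hashlib.md5(f"{document_id}:{chunk_text}".encode()).hexdigest()
    return str(uuid.UUID(digest))
```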

## Test Plan
1. Integration tests:
```python
INFERENCE_MODEL=llama3.2:3b-instruct-fp16 LLAMA_STACK_CONFIG=ollama pytest -s -v tests/client-sdk/vector_io/test_vector_io.py
...
PASSED
tests/client-sdk/vector_io/test_vector_io.py::test_vector_db_retrieve[all-MiniLM-L6-v2-sqlite_vec] PASSED
tests/client-sdk/vector_io/test_vector_io.py::test_vector_db_list PASSED
tests/client-sdk/vector_io/test_vector_io.py::test_vector_db_register[all-MiniLM-L6-v2-faiss] PASSED
tests/client-sdk/vector_io/test_vector_io.py::test_vector_db_register[all-MiniLM-L6-v2-sqlite_vec] PASSED
tests/client-sdk/vector_io/test_vector_io.py::test_vector_db_unregister[faiss] PASSED
tests/client-sdk/vector_io/test_vector_io.py::test_vector_db_unregister[sqlite_vec] PASSED
```
2. Unit tests:
```python
pytest llama_stack/providers/tests/vector_io/test_sqlite_vec.py  -v -s --tb=short --disable-warnings --asyncio-mode=auto
...
llama_stack/providers/tests/vector_io/test_sqlite_vec.py::test_add_chunks PASSED
llama_stack/providers/tests/vector_io/test_sqlite_vec.py::test_query_chunks PASSED
llama_stack/providers/tests/vector_io/test_sqlite_vec.py::test_chunk_id_conflict PASSED
llama_stack/providers/tests/vector_io/test_sqlite_vec.py::test_register_vector_db PASSED
llama_stack/providers/tests/vector_io/test_sqlite_vec.py::test_unregister_vector_db PASSED
llama_stack/providers/tests/vector_io/test_sqlite_vec.py::test_generate_chunk_id PASSED
```
I also tested using the same example RAG script in
https://github.com/meta-llama/llama-stack/pull/1040 and received the
output.

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-02-19 19:07:46 -08:00
Ashwin Bharambe
cdcbeb005b
chore: remove llama_models.llama3.api imports from providers (#1107)
There should be a choke-point for llama3.api imports -- this is the
prompt adapter. Creating a ChatFormat() object on demand is inexpensive.
The underlying Tokenizer is a singleton anyway.
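A sketch of the on-demand construction this refers to (import paths as they existed in llama_models at the time; treat as illustrative):
```python
# Build a ChatFormat when needed; the Tokenizer behind it is a singleton.
from llama_models.llama3.api.chat_format import ChatFormat
from llama_models.llama3.api.tokenizer import Tokenizer

def get_chat_format() -> ChatFormat:
    return ChatFormat(Tokenizer.get_instance())
```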
2025-02-19 19:01:29 -08:00
Ben Browning
e9b8259cf9
fix: Get distro_codegen.py working with default deps and enabled in pre-commit hooks (#1123)
# What does this PR do?

Before this change, `distro_codegen.py` would only work if the user
manually installed multiple provider-specific dependencies (see #1122).
Now, users can run `distro_codegen.py` without any provider-specific
dependencies because we avoid importing the entire provider
implementations just to get the config needed to build the provider
template.

Concretely, this mostly means moving the
MODEL_ALIASES (and related variants) definitions to a new models.py
class within the provider implementation for those providers that
require additional dependencies. It also meant moving a couple of
imports from top-level imports to inside `get_adapter_impl` for some
providers, which follows the pattern used by multiple existing
providers.
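The import-deferral pattern looks roughly like this (the module and class names below are hypothetical, not any specific provider's code):
```python
# Keep heavy provider imports inside the factory so distro_codegen.py can
# import the package for its config class without those dependencies installed.
async def get_adapter_impl(config, _deps):
    from .acme_inference import AcmeInferenceAdapter  # hypothetical provider module

    impl = AcmeInferenceAdapter(config)
    await impl.initialize()
    return impl
```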

To ensure we don't regress and accidentally add new imports that cause
distro_codegen.py to fail, the stubbed-in pre-commit hook for
distro_codegen.py was uncommented and slightly tweaked to run via `uv
run python ...` to ensure it runs with only the project's default
dependencies and to run automatically instead of manually.

Lastly, this updates distro_codegen.py itself to keep track of paths it
might have changed and to only `git diff` those specific paths when
checking for changed files instead of doing a diff on the entire working
tree. The latter was overly broad and would require a user have no other
unstaged changes in their working tree, even if those unstaged changes
were unrelated to generated code. Now it only flags uncommitted changes
for paths distro_codegen.py actually writes to.

Our generated code was also out-of-date, presumably because of these
issues, so this commit also has some updates to the generated code
purely because it was out of sync, and the pre-commit hook now enforces
things to be updated.

(Closes #1122)

## Test Plan

I manually tested distro_codegen.py and the pre-commit hook to verify
those work as expected, flagging any uncommitted changes and catching any
imports that attempt to pull in provider-specific dependencies.

However, I do not have valid api keys to the impacted provider
implementations, and am unable to easily run the inference tests against
each changed provider. There are no functional changes to the provider
implementations here, but I'd appreciate a second set of eyes on the
changed import statements and moving of MODEL_ALIASES type code to a
separate models.py to ensure I didn't make any obvious errors.

---------

Signed-off-by: Ben Browning <bbrownin@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-02-19 18:39:20 -08:00
Ashwin Bharambe
034ece0011 Ensure that deprecations for fields follow through to OpenAPI 2025-02-19 13:54:04 -08:00
Ashwin Bharambe
31a5ba5268 Add title to the json schemas 2025-02-19 13:26:39 -08:00
ehhuang
8de7cf103b
feat: support tool_choice = {required, none, <function>} (#1059)
Summary:

titled


Test Plan:

Added tests and ran:

LLAMA_STACK_CONFIG=fireworks pytest -s -v tests/client-sdk/
--safety-shield meta-llama/Llama-Guard-3-8B
2025-02-18 23:25:15 -05:00
Xi Yan
37cf60b732
style: remove prints in codebase (#1146)
# What does this PR do?
- replace prints in codebase with logger
- update print_table to use rich Table (see the example below)
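A minimal example of the rich Table pattern (contents are illustrative, not the real `list-providers` output):
```python
# Render a small table with rich instead of hand-formatted print statements.
from rich.console import Console
from rich.table import Table

table = Table(title="Providers")
table.add_column("API")
table.add_column("Provider Type")
table.add_row("inference", "remote::ollama")
table.add_row("vector_io", "inline::faiss")
Console().print(table)
```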

## Test Plan
- library client script in
https://github.com/meta-llama/llama-stack/pull/1145

```
llama stack list-providers
```
<img width="1407" alt="image"
src="https://github.com/user-attachments/assets/906b4f54-9e42-4e55-8968-7e3aa45525b2"
/>


[//]: # (## Documentation)
2025-02-18 19:41:37 -08:00
Xi Yan
e8cb9e0adb
fix: direct client pydantic type casting (#1145)
# What does this PR do?
- Closes #1142 
- Root cause: the use of `Union[str, AgentToolGroupWithArgs]`

## Test Plan
- Tested with the script described in the issue.

- Printed out the final converted pydantic object
<img width="1470" alt="image"
src="https://github.com/user-attachments/assets/15dc9cd0-f37a-4b91-905f-3fe4f59a08c6"
/>


[//]: # (## Documentation)
2025-02-18 16:07:54 -08:00
Reid
4e76d312fa
fix: modify the model id title for model list (#1095)
# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]

Re-checked: based on the doc, the download model id is actually the model
descriptor (also without `meta-llama/`).


https://llama-stack.readthedocs.io/en/latest/references/llama_cli_reference/index.html
```
$ llama download --source huggingface --model-id  Llama-Guard-3-1B:int4 --hf-token xxx  # model descriptor
Fetching 8 files:   0%|                                                                                                                   | 0/8 [00:00<?, ?it/s]
LICENSE.txt: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████| 7.71k/7.71k [00:00<00:00, 10.5MB/s]

$ llama download --source huggingface --model-id  Llama-Guard-3-1B-INT4 --hf-token xxxx  # hugging face repo without meta-llama/
usage: llama download [-h] [--source {meta,huggingface}] [--model-id MODEL_ID] [--hf-token HF_TOKEN] [--meta-url META_URL] [--max-parallel MAX_PARALLEL]
                      [--ignore-patterns IGNORE_PATTERNS] [--manifest-file MANIFEST_FILE]
llama download: error: Model Llama-Guard-3-1B-INT4 not found <<<<---


$ llama download --source meta --model-id Llama-3.2-3B-Instruct-SpinQuant_INT4_EO8
usage: llama download [-h] [--source {meta,huggingface}] [--model-id MODEL_ID] [--hf-token HF_TOKEN] [--meta-url META_URL] [--max-parallel MAX_PARALLEL]
                      [--ignore-patterns IGNORE_PATTERNS] [--manifest-file MANIFEST_FILE]
llama download: error: Model Llama-3.2-3B-Instruct-SpinQuant_INT4_EO8 not found

$ llama download --source meta --model-id Llama3.2-3B-Instruct:int4-spinquant-eo8
Please provide the signed URL for model Llama3.2-3B-Instruct:int4-spinquant-eo8 you received via email after visiting https://www.llama.com/llama-downloads/ (e.g., https://llama3-1.llamameta.net/*?Policy...): ^CTraceback (most recent call last):

$ llama download --source meta --model-id meta-llama/Llama3.2-3B-Instruct:int4-spinquant-eo8
usage: llama download [-h] [--source {meta,huggingface}] [--model-id MODEL_ID] [--hf-token HF_TOKEN] [--meta-url META_URL]
                      [--max-parallel MAX_PARALLEL] [--ignore-patterns IGNORE_PATTERNS] [--manifest-file MANIFEST_FILE]
llama download: error: Model meta-llama/Llama3.2-3B-Instruct:int4-spinquant-eo8 not found
```

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)

Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
2025-02-18 10:26:41 -08:00
Reid
d9f5beb15a
style: update download help text (#1135)
# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]

Based on the code:

6b1773d530/llama_stack/cli/download.py (L454)
and the test, a comma can be used to specify multiple model ids, so update
the usage text.
```
$ llama model download --source meta --model-id Llama3.2-1B,Llama3.2-3B
Please provide the signed URL for model Llama3.2-1B you received via email after visiting https://www.llama.com/llama-downloads/ (e.g., https://llama3-1.llamameta.net/*?Policy...):

Downloading checklist.chk            ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━        100.0%  156/156 bytes   -  0:00:00
Downloading tokenizer.model          ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━        100.0%  2.2/2.2 MB      -  0:00:00
Downloading params.json              ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━        100.0%  220/220 bytes   -  0:00:00
Downloading consolidated.00.pth      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━        100.0%  2.5/2.5 GB      -  0:00:00

Successfully downloaded model to /Users/xx/.llama/checkpoints/Llama3.2-1B

[Optionally] To run MD5 checksums, use the following command: llama model verify-download --model-id Llama3.2-1B
Please provide the signed URL for model Llama3.2-3B you received via email after visiting https://www.llama.com/llama-downloads/ (e.g., https://llama3-1.llamameta.net/*?Policy...):
Downloading checklist.chk            ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━        100.0%  156/156 bytes   -  0:00:00
Downloading tokenizer.model          ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━        100.0%  2.2/2.2 MB      -  0:00:00
Downloading params.json              ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━        100.0%  220/220 bytes   -  0:00:00
Downloading consolidated.00.pth      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━        100.0%  6.4/6.4 GB      -  0:00:00

Successfully downloaded model to /Users/xx/.llama/checkpoints/Llama3.2-3B


$ llama model download --source huggingface --model-id Llama3.2-1B,Llama3.2-3B
original%2Fparams.json: 100%|██████████████████████████████████████████████████████████| 220/220 [00:00<00:00, 564kB/
Successfully downloaded model to /Users/xx/.llama/checkpoints/Llama3.2-1B
...
tokenizer.json: 100%|█████████████████████████████████████████████████████████████| 9.09M/9.09M [00:00<00:00, 9.18MB/s]
Successfully downloaded model to /Users/xxx/.llama/checkpoints/Llama3.2-3B


before:
$ llama model download --help
 --model-id MODEL_ID   See `llama model list` or `llama model list --show-all` for the list of available models

after:
$ llama model download --help
  --model-id MODEL_ID   See `llama model list` or `llama model list --show-all` for the list of available models. Specify multiple model IDs with commas, e.g. --model-id Llama3.2-1B,Llama3.2-3B
```

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)

Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
2025-02-18 10:24:31 -08:00
Reid
92aefec191
style: update verify-download help text (#1134)
# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]

Based on the code
6b1773d530/llama_stack/cli/download.py (L379)
and the test, `verify-download` should only be used for models downloaded from Meta.

```
test: no checklist.chk  file for hf download
$ llama model download --source meta --model-id Llama3.2-1B
Downloading checklist.chk            ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━        100.0%  156/156 bytes   -  0:00:00
Downloading tokenizer.model          ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━        100.0%  2.2/2.2 MB      -  0:00:00
Downloading params.json              ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━        100.0%  220/220 bytes   -  0:00:00
Downloading consolidated.00.pth      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━        100.0%  2.5/2.5 GB      -  0:00:00



before:
$ llama model verify-download --help
usage: llama model verify-download [-h] --model-id MODEL_ID

Verify the downloaded checkpoints' checksums

options:
  -h, --help           show this help message and exit
  --model-id MODEL_ID  Model ID to verify


after:
$ llama model verify-download --help
usage: llama model verify-download [-h] --model-id MODEL_ID

Verify the downloaded checkpoints' checksums for models downloaded from Meta

options:
  -h, --help           show this help message and exit
  --model-id MODEL_ID  Model ID to verify (only for models downloaded from Meta)
```

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)

Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
2025-02-18 10:15:26 -08:00
Reid
89d37687dd
chore: remove --no-list-templates option (#1121)
# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]

From the code and the usage, there seems to be no need for
`--no-list-templates`, and it also makes the help text confusing for users,
so remove it.
```
$ llama stack build --no-list-templates
> Enter a name for your Llama Stack (e.g. my-local-stack):

$ llama stack build
> Enter a name for your Llama Stack (e.g. my-local-stack):

before:
$ llama stack build --help
  --list-templates, --no-list-templates
                        Show the available templates for building a Llama Stack distribution (default: False)

after:
  --list-templates      Show the available templates for building a Llama Stack distribution
```

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)

Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
2025-02-18 10:13:46 -08:00
Yuan Tang
743f434860
fix: Ensure a tool call can be converted before adding to buffer (#1119)
# What does this PR do?

This fixes an issue when running the e2e agent example:
https://github.com/meta-llama/llama-stack-apps/blob/main/examples/agents/e2e_loop_with_client_tools.py

```
    |   File "/home/yutang/repos/llama-stack/llama_stack/providers/remote/inference/vllm/vllm.py", line 175, in _process_vllm_chat_completion_stream_response
    |     tool_call = convert_tool_call(choice.delta.tool_calls[0])
    |   File "/home/yutang/repos/llama-stack/llama_stack/providers/utils/inference/openai_compat.py", line 441, in convert_tool_call
    |     return ToolCall(
    |   File "/home/yutang/.conda/envs/distribution-myenv/lib/python3.10/site-packages/pydantic/main.py", line 214, in __init__
    |     validated_self = self.__pydantic_validator__.validate_python(data, self_instance=self)
    | pydantic_core._pydantic_core.ValidationError: 4 validation errors for ToolCall
    | call_id
    |   Input should be a valid string [type=string_type, input_value=None, input_type=NoneType]
    |     For further information visit https://errors.pydantic.dev/2.10/v/string_type
    | tool_name.enum[BuiltinTool]
    |   Input should be 'brave_search', 'wolfram_alpha', 'photogen' or 'code_interpreter' [type=enum, input_value=None, input_type=NoneType]
    |     For further information visit https://errors.pydantic.dev/2.10/v/enum
    | tool_name.str
    |   Input should be a valid string [type=string_type, input_value=None, input_type=NoneType]
    |     For further information visit https://errors.pydantic.dev/2.10/v/string_type
    | arguments
    |   Input should be a valid dictionary [type=dict_type, input_value=202, input_type=int]
    |     For further information visit https://errors.pydantic.dev/2.10/v/dict_type
```

This issue happened because not all arguments have been appended to the
tool call buffer yet. The current code assumes that we are ready to
convert the tool call whenever args can be converted to JSON
successfully. In this case, `json.loads("202")` would succeed but the
rest of the arguments have not been properly parsed yet.
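One way to express the stricter readiness check described above (illustrative; the actual fix may gate on more than this):
```python
# Only treat the buffered arguments as complete when they parse to a JSON
# object, not just any JSON value such as `202`.
import json

def arguments_look_complete(buffered_args: str) -> bool:
    try:
        return isinstance(json.loads(buffered_args), dict)
    except json.JSONDecodeError:
        return False
```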

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan

The e2e example worked successfully (although note that I ran the script
twice with each function call separately due to
https://github.com/meta-llama/llama-stack/issues/1120):
```
tool_execution> Tool:get_ticker_data Args:{'ticker_symbol': 'GOOG', 'start': '2023-01-01', 'end': '2023-12-31'}
tool_execution> Tool:get_ticker_data Response:"[{\"('Year', '')\":2023,\"('Close', 'GOOG')\":140.4254455566}]"

tool_execution> Tool:web_search Args:{'query': '42nd president of the United States'}
tool_execution> Tool:web_search Response:"{\"query\": \"42nd president of the United States\", \"top_k\": [{\"title\": \"William J. Clinton | whitehouse.gov\", \"url\": \"https://obamawhitehouse.archives.gov/1600/presidents/williamjclinton\", \"description\": \"<strong>Bill Clinton</strong> is an American politician from Arkansas who served as the 42nd President of the United States (1993-2001). He took office at the end of the Cold War, and was the first baby-boomer generation President.\", \"type\": \"search_result\"}, {\"title\": \"Bill Clinton - Wikipedia\", \"url\": \"https://en.wikipedia.org/wiki/Bill_Clinton\", \"description\": \"<strong>William Jefferson Clinton</strong> (n\\u00e9 Blythe; born August 19, 1946) is an American politician and lawyer who served as the 42nd president of the United States from 1993 to 2001. A member of the Democratic Party, he previously served as the attorney general of Arkansas from 1977 to 1979 and as the ...\", \"type\": \"search_result\"}, [{\"type\": \"video_result\", \"url\": \"https://www.youtube.com/watch?v=eR2z_1-v87Y\", \"title\": \"A Conversation with Bill Clinton, 42nd President of the United ...\", \"description\": \"William Jefferson Clinton, the first Democratic president in six decades to be elected twice, led the United States to the longest economic expansion in Amer...\"}, {\"type\": \"video_result\", \"url\": \"4484174096/\", \"title\": \"January 20, 1993, President Clinton was sworn in as the 42nd ...\", \"description\": \"WATCH: On January 20, 1993, President Bill Clinton was sworn in as the 42nd President of the United States. #InaugurationDay Video courtesy of the...\"}, {\"type\": \"video_result\", \"url\": \"https://www.youtube.com/watch?v=vI0HGQqEJh0\", \"title\": \"42nd President of the United States, Bill Clinton, shared thoughts ...\", \"description\": \"AboutPressCopyrightContact usCreatorsAdvertiseDevelopersTermsPrivacyPolicy & SafetyHow YouTube worksTest new features \\u00b7 \\u00a9 2024 Google LLC\"}, {\"type\": \"video_result\", \"url\": \"https://www.youtube.com/shorts/vI0HGQqEJh0\", \"title\": \"42nd President of the United States, Bill Clinton, shared ...\", \"description\": \"Enjoy the videos and music you love, upload original content, and share it all with friends, family, and the world on YouTube.\"}, {\"type\": \"video_result\", \"url\": \"https://www.youtube.com/watch?v=PHihhihVth0\", \"title\": \"Bill & Hillary Clinton returning to Little Rock for 20th ...\", \"description\": \"Enjoy the videos and music you love, upload original content, and share it all with friends, family, and the world on YouTube.\"}]]}"
```

All text inference tests passed.

[//]: # (## Documentation)

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-02-15 00:19:16 -05:00
ehhuang
ab2b46e528
feat: log start, complete time to Agent steps (#1116) 2025-02-14 17:48:06 -08:00
Reid
8dc1cac333
style: fix the capitalization issue (#1117)
# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]

```
before:
$ llama stack run --help
usage: llama stack run [-h] [--port PORT] [--image-name IMAGE_NAME] [--disable-ipv6] [--env KEY=VALUE]
                       [--tls-keyfile TLS_KEYFILE] [--tls-certfile TLS_CERTFILE] [--image-type {conda,container,venv}]
                       config

start <<<<<<---- the server for a Llama Stack Distribution. You should have already built (or downloaded) and configured the distribution.

After:
$ llama stack run --help
usage: llama stack run [-h] [--port PORT] [--image-name IMAGE_NAME] [--disable-ipv6] [--env KEY=VALUE]
                       [--tls-keyfile TLS_KEYFILE] [--tls-certfile TLS_CERTFILE] [--image-type {conda,container,venv}]
                       config

Start <<<<<<---- the server for a Llama Stack Distribution. You should have already built (or downloaded) and configured the distribution.
```

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)

Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
2025-02-14 17:16:26 -08:00
Reid
3d88b81ccf
fix: remove the empty line (#1097)
# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]

Remove the empty line from help
```
before:
$ llama model download --help
  --max-parallel MAX_PARALLEL
                        Maximum number of concurrent downloads
  --ignore-patterns IGNORE_PATTERNS
                      <<<<<<<<<empty line>>>>>>>>>>
                        For source=huggingface, files matching any of the patterns are not downloaded. Defaults to ignoring
                        safetensors files to avoid downloading duplicate weights.

after:
$ llama model download --help
  --max-parallel MAX_PARALLEL
                        Maximum number of concurrent downloads
  --ignore-patterns IGNORE_PATTERNS
                        For source=huggingface, files matching any of the patterns are not downloaded. Defaults to ignoring
                        safetensors files to avoid downloading duplicate weights.
```

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)

Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
2025-02-14 09:33:20 -08:00
Sébastien Han
369cc513cb
fix: improve stack build on venv (#980)
# What does this PR do?

Added a pre_run_checks function to ensure a smooth environment setup by
verifying prerequisites. It checks for an existing virtual environment,
ensures uv is installed, and deactivates any active environment if
necessary.

Run the full build inside a venv created by 'uv'.

Improved string handling in printf statements and added shellcheck
suppressions for expected word splitting in pip commands.

These enhancements improve robustness, prevent
conflicts, and ensure a seamless setup process.

Signed-off-by: Sébastien Han <seb@redhat.com>

- [ ] Addresses issue (#issue)


## Test Plan

Run the following command on either Linux or macOS:

```
llama stack build --template ollama --image-type venv --image-name foo
+ build_name=foo
+ env_name=llamastack-foo
+ pip_dependencies='datasets matplotlib autoevals transformers blobfile opentelemetry-sdk sentencepiece opentelemetry-exporter-otlp-proto-http ollama nltk redis pillow psycopg2-binary scikit-learn pandas faiss-cpu chromadb-client numpy chardet scipy aiohttp aiosqlite requests tqdm pypdf openai aiosqlite fastapi fire httpx uvicorn'
+ RED='\033[0;31m'
+ NC='\033[0m'
+ ENVNAME=
+++ readlink -f /Users/leseb/Documents/AI/llama-stack/llama_stack/distribution/build_venv.sh
++ dirname /Users/leseb/Documents/AI/llama-stack/llama_stack/distribution/build_venv.sh
+ SCRIPT_DIR=/Users/leseb/Documents/AI/llama-stack/llama_stack/distribution
+ source /Users/leseb/Documents/AI/llama-stack/llama_stack/distribution/common.sh
+ pre_run_checks llamastack-foo
+ local env_name=llamastack-foo
+ is_command_available uv
+ command -v uv
+ '[' -d llamastack-foo ']'
+ run llamastack-foo 'datasets matplotlib autoevals transformers blobfile opentelemetry-sdk sentencepiece opentelemetry-exporter-otlp-proto-http ollama nltk redis pillow psycopg2-binary scikit-learn pandas faiss-cpu chromadb-client numpy chardet scipy aiohttp aiosqlite requests tqdm pypdf openai aiosqlite fastapi fire httpx uvicorn' 'sentence-transformers --no-deps#torch torchvision --index-url https://download.pytorch.org/whl/cpu'
+ local env_name=llamastack-foo
+ local 'pip_dependencies=datasets matplotlib autoevals transformers blobfile opentelemetry-sdk sentencepiece opentelemetry-exporter-otlp-proto-http ollama nltk redis pillow psycopg2-binary scikit-learn pandas faiss-cpu chromadb-client numpy chardet scipy aiohttp aiosqlite requests tqdm pypdf openai aiosqlite fastapi fire httpx uvicorn'
+ local 'special_pip_deps=sentence-transformers --no-deps#torch torchvision --index-url https://download.pytorch.org/whl/cpu'
+ echo 'Creating new virtual environment llamastack-foo'
Creating new virtual environment llamastack-foo
+ uv venv llamastack-foo
Using CPython 3.13.1 interpreter at: /opt/homebrew/opt/python@3.13/bin/python3.13
Creating virtual environment at: llamastack-foo
Activate with: source llamastack-foo/bin/activate
+ source llamastack-foo/bin/activate
++ '[' -n x ']'
++ SCRIPT_PATH=llamastack-foo/bin/activate
++ '[' llamastack-foo/bin/activate = /Users/leseb/Documents/AI/llama-stack/llama_stack/distribution/build_venv.sh ']'
++ deactivate nondestructive
++ unset -f pydoc
++ '[' -z '' ']'
++ '[' -z '' ']'
++ hash -r
++ '[' -z '' ']'
++ unset VIRTUAL_ENV
++ unset VIRTUAL_ENV_PROMPT
++ '[' '!' nondestructive = nondestructive ']'
++ VIRTUAL_ENV=/Users/leseb/Documents/AI/llama-stack/llamastack-foo
++ '[' darwin24 = cygwin ']'
++ '[' darwin24 = msys ']'
++ export VIRTUAL_ENV
++ _OLD_VIRTUAL_PATH='/Users/leseb/Documents/AI/llama-stack/.venv/bin:/opt/homebrew/opt/protobuf@21/bin:/opt/homebrew/opt/gnu-sed/libexec/gnubin:/opt/homebrew/bin:/opt/homebrew/sbin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin:/usr/local/munki:/opt/podman/bin:/opt/homebrew/opt/protobuf@21/bin:/opt/homebrew/opt/gnu-sed/libexec/gnubin:/Users/leseb/.local/share/zinit/plugins/so-fancy---diff-so-fancy:/Users/leseb/.local/share/zinit/polaris/bin:/Users/leseb/.cargo/bin:/Users/leseb/Library/Application Support/Code/User/globalStorage/github.copilot-chat/debugCommand'
++ PATH='/Users/leseb/Documents/AI/llama-stack/llamastack-foo/bin:/Users/leseb/Documents/AI/llama-stack/.venv/bin:/opt/homebrew/opt/protobuf@21/bin:/opt/homebrew/opt/gnu-sed/libexec/gnubin:/opt/homebrew/bin:/opt/homebrew/sbin:/usr/local/bin:/System/Cryptexes/App/usr/bin:/usr/bin:/bin:/usr/sbin:/sbin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/local/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/bin:/var/run/com.apple.security.cryptexd/codex.system/bootstrap/usr/appleinternal/bin:/usr/local/munki:/opt/podman/bin:/opt/homebrew/opt/protobuf@21/bin:/opt/homebrew/opt/gnu-sed/libexec/gnubin:/Users/leseb/.local/share/zinit/plugins/so-fancy---diff-so-fancy:/Users/leseb/.local/share/zinit/polaris/bin:/Users/leseb/.cargo/bin:/Users/leseb/Library/Application Support/Code/User/globalStorage/github.copilot-chat/debugCommand'
++ export PATH
++ '[' x '!=' x ']'
+++ basename /Users/leseb/Documents/AI/llama-stack/llamastack-foo
++ VIRTUAL_ENV_PROMPT='(llamastack-foo) '
++ export VIRTUAL_ENV_PROMPT
++ '[' -z '' ']'
++ '[' -z '' ']'
++ _OLD_VIRTUAL_PS1=
++ PS1='(llamastack-foo) '
++ export PS1
++ alias pydoc
++ true
++ hash -r
+ '[' -n '' ']'
+ '[' -n '' ']'
+ uv pip install --no-cache-dir llama-stack
Using Python 3.13.1 environment at: llamastack-foo
Resolved 50 packages in 1.25s
   Built fire==0.7.0
Prepared 50 packages in 1.22s
Installed 50 packages in 126ms
 + annotated-types==0.7.0
 + anyio==4.8.0
 + blobfile==3.0.0
 + certifi==2025.1.31
 + charset-normalizer==3.4.1
 + click==8.1.8
 + distro==1.9.0
 + filelock==3.17.0
 + fire==0.7.0
 + fsspec==2025.2.0
 + h11==0.14.0
 + httpcore==1.0.7
 + httpx==0.28.1
 + huggingface-hub==0.28.1
 + idna==3.10
 + jinja2==3.1.5
 + llama-models==0.1.2
 + llama-stack==0.1.2
 + llama-stack-client==0.1.2
 + lxml==5.3.1
 + markdown-it-py==3.0.0
 + markupsafe==3.0.2
 + mdurl==0.1.2
 + numpy==2.2.2
 + packaging==24.2
 + pandas==2.2.3
 + pillow==11.1.0
 + prompt-toolkit==3.0.50
 + pyaml==25.1.0
 + pycryptodomex==3.21.0
 + pydantic==2.10.6
 + pydantic-core==2.27.2
 + pygments==2.19.1
 + python-dateutil==2.9.0.post0
 + python-dotenv==1.0.1
 + pytz==2025.1
 + pyyaml==6.0.2
 + regex==2024.11.6
 + requests==2.32.3
 + rich==13.9.4
 + setuptools==75.8.0
 + six==1.17.0
 + sniffio==1.3.1
 + termcolor==2.5.0
 + tiktoken==0.8.0
 + tqdm==4.67.1
 + typing-extensions==4.12.2
 + tzdata==2025.1
 + urllib3==2.3.0
 + wcwidth==0.2.13
+ '[' -n '' ']'
+ printf 'Installing pip dependencies\n'
Installing pip dependencies
+ uv pip install datasets matplotlib autoevals transformers blobfile opentelemetry-sdk sentencepiece opentelemetry-exporter-otlp-proto-http ollama nltk redis pillow psycopg2-binary scikit-learn pandas faiss-cpu chromadb-client numpy chardet scipy aiohttp aiosqlite requests tqdm pypdf openai aiosqlite fastapi fire httpx uvicorn
Using Python 3.13.1 environment at: llamastack-foo
Resolved 105 packages in 37ms
Uninstalled 2 packages in 65ms
Installed 72 packages in 195ms
 + aiohappyeyeballs==2.4.6
 + aiohttp==3.11.12
 + aiosignal==1.3.2
 + aiosqlite==0.21.0
 + attrs==25.1.0
 + autoevals==0.0.119
 + backoff==2.2.1
 + braintrust-core==0.0.58
 + chardet==5.2.0
 + chevron==0.14.0
 + chromadb-client==0.6.3
 + contourpy==1.3.1
 + cycler==0.12.1
 + datasets==3.2.0
 + deprecated==1.2.18
 + dill==0.3.8
 + faiss-cpu==1.10.0
 + fastapi==0.115.8
 + fonttools==4.56.0
 + frozenlist==1.5.0
 - fsspec==2025.2.0
 + fsspec==2024.9.0
 + googleapis-common-protos==1.66.0
 + grpcio==1.70.0
 + importlib-metadata==8.5.0
 + jiter==0.8.2
 + joblib==1.4.2
 + jsonschema==4.23.0
 + jsonschema-specifications==2024.10.1
 + kiwisolver==1.4.8
 + levenshtein==0.26.1
 + matplotlib==3.10.0
 + monotonic==1.6
 + multidict==6.1.0
 + multiprocess==0.70.16
 + nltk==3.9.1
 - numpy==2.2.2
 + numpy==1.26.4
 + ollama==0.4.7
 + openai==1.61.1
 + opentelemetry-api==1.30.0
 + opentelemetry-exporter-otlp-proto-common==1.30.0
 + opentelemetry-exporter-otlp-proto-grpc==1.30.0
 + opentelemetry-exporter-otlp-proto-http==1.30.0
 + opentelemetry-proto==1.30.0
 + opentelemetry-sdk==1.30.0
 + opentelemetry-semantic-conventions==0.51b0
 + orjson==3.10.15
 + overrides==7.7.0
 + posthog==3.12.0
 + propcache==0.2.1
 + protobuf==5.29.3
 + psycopg2-binary==2.9.10
 + pyarrow==19.0.0
 + pyparsing==3.2.1
 + pypdf==5.3.0
 + rapidfuzz==3.12.1
 + redis==5.2.1
 + referencing==0.36.2
 + rpds-py==0.22.3
 + safetensors==0.5.2
 + scikit-learn==1.6.1
 + scipy==1.15.1
 + sentencepiece==0.2.0
 + starlette==0.45.3
 + tenacity==9.0.0
 + threadpoolctl==3.5.0
 + tokenizers==0.21.0
 + transformers==4.48.3
 + uvicorn==0.34.0
 + wrapt==1.17.2
 + xxhash==3.5.0
 + yarl==1.18.3
 + zipp==3.21.0
+ '[' -n 'sentence-transformers --no-deps#torch torchvision --index-url https://download.pytorch.org/whl/cpu' ']'
+ IFS='#'
+ read -ra parts
+ for part in '"${parts[@]}"'
+ echo 'sentence-transformers --no-deps'
sentence-transformers --no-deps
+ uv pip install sentence-transformers --no-deps
Using Python 3.13.1 environment at: llamastack-foo
Resolved 1 package in 141ms
Installed 1 package in 6ms
 + sentence-transformers==3.4.1
+ for part in '"${parts[@]}"'
+ echo 'torch torchvision --index-url https://download.pytorch.org/whl/cpu'
torch torchvision --index-url https://download.pytorch.org/whl/cpu
+ uv pip install torch torchvision --index-url https://download.pytorch.org/whl/cpu
Using Python 3.13.1 environment at: llamastack-foo
Resolved 13 packages in 2.15s
Installed 5 packages in 324ms
 + mpmath==1.3.0
 + networkx==3.3
 + sympy==1.13.1
 + torch==2.6.0
 + torchvision==0.21.0
Build Successful!
```

Run:

```
$ source llamastack-foo/bin/activate
$ INFERENCE_MODEL="meta-llama/Llama-3.2-3B-Instruct" OLLAMA_INFERENCE_MODEL="llama3.2:3b-instruct-fp16" python -m llama_stack.distribution.server.server --yaml-config ./llama_stack/templates/ollama/run.yaml --port 5001 
Using config file: llama_stack/templates/ollama/run.yaml
Run configuration:
apis:
- agents
- datasetio
- eval
- inference
- safety
- scoring
- telemetry
- tool_runtime
- vector_io
container_image: null
datasets: []
eval_tasks: []
image_name: ollama
metadata_store:
  db_path: /Users/leseb/.llama/distributions/ollama/registry.db
  namespace: null
  type: sqlite
models:
- metadata: {}
  model_id: meta-llama/Llama-3.2-3B-Instruct
  model_type: !!python/object/apply:llama_stack.apis.models.models.ModelType
  - llm
  provider_id: ollama
  provider_model_id: null
- metadata:
    embedding_dimension: 384
  model_id: all-MiniLM-L6-v2
  model_type: !!python/object/apply:llama_stack.apis.models.models.ModelType
  - embedding
  provider_id: sentence-transformers
  provider_model_id: null
providers:
  agents:
  - config:
      persistence_store:
        db_path: /Users/leseb/.llama/distributions/ollama/agents_store.db
        namespace: null
        type: sqlite
    provider_id: meta-reference
    provider_type: inline::meta-reference
  datasetio:
  - config: {}
    provider_id: huggingface
    provider_type: remote::huggingface
  - config: {}
    provider_id: localfs
    provider_type: inline::localfs
  eval:
  - config: {}
    provider_id: meta-reference
    provider_type: inline::meta-reference
  inference:
  - config:
      url: http://localhost:11434
    provider_id: ollama
    provider_type: remote::ollama
  - config: {}
    provider_id: sentence-transformers
    provider_type: inline::sentence-transformers
  safety:
  - config: {}
    provider_id: llama-guard
    provider_type: inline::llama-guard
  scoring:
  - config: {}
    provider_id: basic
    provider_type: inline::basic
  - config: {}
    provider_id: llm-as-judge
    provider_type: inline::llm-as-judge
  - config:
      openai_api_key: '********'
    provider_id: braintrust
    provider_type: inline::braintrust
  telemetry:
  - config:
      service_name: llama-stack
      sinks: console,sqlite
      sqlite_db_path: /Users/leseb/.llama/distributions/ollama/trace_store.db
    provider_id: meta-reference
    provider_type: inline::meta-reference
  tool_runtime:
  - config:
      api_key: '********'
      max_results: 3
    provider_id: brave-search
    provider_type: remote::brave-search
  - config:
      api_key: '********'
      max_results: 3
    provider_id: tavily-search
    provider_type: remote::tavily-search
  - config: {}
    provider_id: code-interpreter
    provider_type: inline::code-interpreter
  - config: {}
    provider_id: rag-runtime
    provider_type: inline::rag-runtime
  vector_io:
  - config:
      kvstore:
        db_path: /Users/leseb/.llama/distributions/ollama/faiss_store.db
        namespace: null
        type: sqlite
    provider_id: faiss
    provider_type: inline::faiss
scoring_fns: []
server:
  port: 8321
  tls_certfile: null
  tls_keyfile: null
shields: []
tool_groups:
- args: null
  mcp_endpoint: null
  provider_id: tavily-search
  toolgroup_id: builtin::websearch
- args: null
  mcp_endpoint: null
  provider_id: rag-runtime
  toolgroup_id: builtin::rag
- args: null
  mcp_endpoint: null
  provider_id: code-interpreter
  toolgroup_id: builtin::code_interpreter
vector_dbs: []
version: '2'

Warning: `bwrap` is not available. Code interpreter tool will not work correctly.
modules.json: 100%|███████████████████████████████████████████████████████████| 349/349 [00:00<00:00, 485kB/s]
config_sentence_transformers.json: 100%|██████████████████████████████████████| 116/116 [00:00<00:00, 498kB/s]
README.md: 100%|█████████████████████████████████████████████████████████| 10.7k/10.7k [00:00<00:00, 20.5MB/s]
sentence_bert_config.json: 100%|████████████████████████████████████████████| 53.0/53.0 [00:00<00:00, 583kB/s]
config.json: 100%|███████████████████████████████████████████████████████████| 612/612 [00:00<00:00, 4.63MB/s]
model.safetensors: 100%|█████████████████████████████████████████████████| 90.9M/90.9M [00:02<00:00, 36.6MB/s]
tokenizer_config.json: 100%|█████████████████████████████████████████████████| 350/350 [00:00<00:00, 4.27MB/s]
vocab.txt: 100%|███████████████████████████████████████████████████████████| 232k/232k [00:00<00:00, 1.90MB/s]
tokenizer.json: 100%|██████████████████████████████████████████████████████| 466k/466k [00:00<00:00, 2.23MB/s]
special_tokens_map.json: 100%|███████████████████████████████████████████████| 112/112 [00:00<00:00, 1.47MB/s]
1_Pooling/config.json: 100%|██████████████████████████████████████████████████| 190/190 [00:00<00:00, 841kB/s]
Serving API tool_groups
 GET /v1/tools/{tool_name}
 GET /v1/toolgroups/{toolgroup_id}
 GET /v1/toolgroups
 GET /v1/tools
 POST /v1/toolgroups
 DELETE /v1/toolgroups/{toolgroup_id}
Serving API tool_runtime
 POST /v1/tool-runtime/invoke
 GET /v1/tool-runtime/list-tools
 POST /v1/tool-runtime/rag-tool/insert
 POST /v1/tool-runtime/rag-tool/query
Serving API vector_io
 POST /v1/vector-io/insert
 POST /v1/vector-io/query
Serving API telemetry
 GET /v1/telemetry/traces/{trace_id}/spans/{span_id}
 GET /v1/telemetry/spans/{span_id}/tree
 GET /v1/telemetry/traces/{trace_id}
 POST /v1/telemetry/events
 GET /v1/telemetry/spans
 GET /v1/telemetry/traces
 POST /v1/telemetry/spans/export
Serving API models
 GET /v1/models/{model_id}
 GET /v1/models
 POST /v1/models
 DELETE /v1/models/{model_id}
Serving API eval
 POST /v1/eval/tasks/{task_id}/evaluations
 DELETE /v1/eval/tasks/{task_id}/jobs/{job_id}
 GET /v1/eval/tasks/{task_id}/jobs/{job_id}/result
 GET /v1/eval/tasks/{task_id}/jobs/{job_id}
 POST /v1/eval/tasks/{task_id}/jobs
Serving API datasets
 GET /v1/datasets/{dataset_id}
 GET /v1/datasets
 POST /v1/datasets
 DELETE /v1/datasets/{dataset_id}
Serving API scoring_functions
 GET /v1/scoring-functions/{scoring_fn_id}
 GET /v1/scoring-functions
 POST /v1/scoring-functions
Serving API inspect
 GET /v1/health
 GET /v1/inspect/providers
 GET /v1/inspect/routes
 GET /v1/version
Serving API scoring
 POST /v1/scoring/score
 POST /v1/scoring/score-batch
Serving API shields
 GET /v1/shields/{identifier}
 GET /v1/shields
 POST /v1/shields
Serving API vector_dbs
 GET /v1/vector-dbs/{vector_db_id}
 GET /v1/vector-dbs
 POST /v1/vector-dbs
 DELETE /v1/vector-dbs/{vector_db_id}
Serving API eval_tasks
 GET /v1/eval-tasks/{eval_task_id}
 GET /v1/eval-tasks
 POST /v1/eval-tasks
Serving API agents
 POST /v1/agents
 POST /v1/agents/{agent_id}/session
 POST /v1/agents/{agent_id}/session/{session_id}/turn
 DELETE /v1/agents/{agent_id}
 DELETE /v1/agents/{agent_id}/session/{session_id}
 GET /v1/agents/{agent_id}/session/{session_id}
 GET /v1/agents/{agent_id}/session/{session_id}/turn/{turn_id}/step/{step_id}
 GET /v1/agents/{agent_id}/session/{session_id}/turn/{turn_id}
Serving API inference
 POST /v1/inference/chat-completion
 POST /v1/inference/completion
 POST /v1/inference/embeddings
Serving API datasetio
 POST /v1/datasetio/rows
 GET /v1/datasetio/rows
Serving API safety
 POST /v1/safety/run-shield

Listening on ['::', '0.0.0.0']:5001
INFO:     Started server process [39145]
INFO:     Waiting for application startup.
INFO:     ASGI 'lifespan' protocol appears unsupported.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://['::', '0.0.0.0']:5001 (Press CTRL+C to quit)
```
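
Once the server reports it is listening, a quick way to confirm it is actually serving traffic is to hit the `/v1/health` and `/v1/version` routes listed in the output above. A minimal sketch, assuming the server is reachable on port 5001 (the `--port` flag above) and that `requests` is installed:

```python
# Minimal sanity check against the server started above (port 5001 assumed
# from the --port flag; /v1/health and /v1/version appear in the route list).
import requests

BASE_URL = "http://localhost:5001"

for path in ("/v1/health", "/v1/version"):
    resp = requests.get(f"{BASE_URL}{path}", timeout=10)
    resp.raise_for_status()
    print(path, resp.json())
```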

## Sources

Please link relevant resources if necessary.


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-02-14 09:22:03 -08:00
Yuan Tang
64328bfe62
fix: enable_session_persistence in AgentConfig should be optional (#1012)
# What does this PR do?
This issue was discovered in
https://github.com/meta-llama/llama-stack/pull/1009#discussion_r1947036518.

## Test Plan

This field is no longer required after the change.
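
For illustration only, making such a field optional in a Pydantic model looks roughly like the sketch below; the field name comes from the PR title, while the class body and defaults here are assumptions, not the actual `AgentConfig` definition:

```python
# Hypothetical sketch: an optional field with a default means callers can
# omit it when constructing the config.
from typing import Optional

from pydantic import BaseModel


class AgentConfig(BaseModel):
    model: str
    instructions: str
    # Previously required; now optional, so leaving it out is valid.
    enable_session_persistence: Optional[bool] = False


# Validates without supplying the field:
config = AgentConfig(model="meta-llama/Llama-3.2-3B-Instruct", instructions="be brief")
print(config.enable_session_persistence)  # False
```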

[//]: # (## Documentation)
[//]: # (- [ ] Added a Changelog entry if the change is significant)

---------

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-02-14 09:19:53 -08:00
Ashwin Bharambe
314ee09ae3
chore: move all Llama Stack types from llama-models to llama-stack (#1098)
llama-models should have extremely minimal cruft. Its sole purpose
should be didactic -- show the simplest implementation of the llama
models and document the prompt formats, etc.

This PR is the complement to
https://github.com/meta-llama/llama-models/pull/279

## Test Plan

Ensure all `llama` CLI `model` sub-commands work:

```bash
llama model list
llama model download --model-id ...
llama model prompt-format -m ...
```

Ran tests:
```bash
cd tests/client-sdk
LLAMA_STACK_CONFIG=fireworks pytest -s -v inference/
LLAMA_STACK_CONFIG=fireworks pytest -s -v vector_io/
LLAMA_STACK_CONFIG=fireworks pytest -s -v agents/
```

Create a fresh venv with `uv venv && source .venv/bin/activate` and run
`llama stack build --template fireworks --image-type venv` followed by
`llama stack run together --image-type venv` -- the server runs.

Also checked that the OpenAPI generator can run and there is no change
in the generated files as a result.

```bash
cd docs/openapi_generator
sh run_openapi_generator.sh
```
2025-02-14 09:10:59 -08:00
Sébastien Han
c0ee512980
build: configure ruff from pyproject.toml (#1100)
# What does this PR do?

- Remove hardcoded configurations from pre-commit.
- Allow configuration to be set via pyproject.toml.
- Merge .ruff.toml settings into pyproject.toml.
- Ensure the linter and formatter use the defined configuration instead
of being overridden by pre-commit.

Signed-off-by: Sébastien Han <seb@redhat.com>

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-02-14 09:01:57 -08:00
raghotham
a3cb039e83
docs: Add region parameter to Bedrock provider (#1103)
# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)
2025-02-14 08:55:22 -08:00
Ben Browning
406465622e
fix: Update QdrantConfig to QdrantVectorIOConfig (#1104)
# What does this PR do?

This fixes an import that broke because #1079 was merged before #1039;
the changes from #1039 require updating `QdrantConfig` to
`QdrantVectorIOConfig`.


## Test Plan

I ran the remote vllm provider inference tests against the latest main:
```
VLLM_URL="http://localhost:8001/v1" python -m pytest -s -v llama_stack/providers/tests/inference/test_text_inference.py --providers "inference=vllm_remote"
```

That failed with:
```
  File "/home/bbrownin/src/llama-stack/llama_stack/providers/tests/vector_io/fixtures.py", line 20, in <module>
    from llama_stack.providers.remote.vector_io.qdrant import QdrantConfig
ImportError: Error importing plugin "llama_stack.providers.tests.vector_io.fixtures": cannot import name 'QdrantConfig' from 'llama_stack.providers.remote.vector_io.qdrant' (/home/bbrownin/src/llama-stack/llama_stack/providers/remote/vector_io/qdrant/__init__.py)
```

After this change, the import no longer fails and the tests pass.
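
The fix itself is essentially the rename implied by the traceback; a sketch (the surrounding fixture code is omitted):

```python
# The import that failed above, updated to the renamed class from #1023/#1039.
# Broken:
#   from llama_stack.providers.remote.vector_io.qdrant import QdrantConfig
# Fixed:
from llama_stack.providers.remote.vector_io.qdrant import QdrantVectorIOConfig
```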

Signed-off-by: Ben Browning <bbrownin@redhat.com>
2025-02-14 06:31:00 -08:00
Reid
2f7268b790
fix: add the missed help description info (#1096) 2025-02-13 21:31:36 -08:00
Hardik Shah
b0b696cb4f
fix: regex pattern matching to support :path suffix in the routes (#1089)
This PR fixes a client SDK test failure --
3720312204

-- by updating the regex matching pattern to also consider `:path` in the
routes.
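
As an illustration of the kind of pattern change described (the server's actual regex is not shown here, and the example routes are assumptions), a parameter-extraction regex that tolerates a `:path` converter suffix might look like:

```python
# Hypothetical sketch: match route template parameters with or without a
# ":path" suffix, e.g. "/v1/files/{key:path}" as well as "/v1/models/{model_id}".
import re

ROUTE_PARAM = re.compile(r"{(\w+)(?::path)?}")

for template in ("/v1/models/{model_id}", "/v1/files/{bucket}/{key:path}"):
    print(template, "->", ROUTE_PARAM.findall(template))
# /v1/models/{model_id} -> ['model_id']
# /v1/files/{bucket}/{key:path} -> ['bucket', 'key']
```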
2025-02-13 18:18:23 -08:00
Xi Yan
da53dc3f5f
fix: openapi for eval-task (#1085)
# What does this PR do?
- as title

## Test Plan
- the deprecated endpoint needs to behave as it did before

[//]: # (## Documentation)
2025-02-13 17:10:45 -08:00
Xi Yan
8b655e3cd2
fix!: update eval-tasks -> benchmarks (#1032)
# What does this PR do?

- Update `/eval-tasks` to `/benchmarks`
- ⚠️ Remove the differentiation between `app` vs. `benchmark` eval task
config. Now we only have `BenchmarkConfig`. The overloaded `benchmark`
is confusing and does not add any value. Backward compatibility is
kept, as the "type" is not used anywhere (see the sketch just below).
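
For orientation only, the shape of the rename from a client's perspective; the old `/v1/eval-tasks` route appears in the server route list earlier in this log, while the new path here is an assumption based on the rename described above:

```python
# Hypothetical sketch of the rename: /eval-tasks stays served for backward
# compatibility; /benchmarks is the new name. Exact paths are assumptions,
# not taken from the implementation.
import requests

BASE_URL = "http://localhost:8321"  # default server port from run.yaml above

old = requests.get(f"{BASE_URL}/v1/eval-tasks")   # deprecated, still works
new = requests.get(f"{BASE_URL}/v1/benchmarks")   # renamed surface (assumed path)
print(old.status_code, new.status_code)
```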

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
- This change is backward compatible 
- Run notebook test with

```
pytest -v -s --nbval-lax ./docs/getting_started.ipynb
pytest -v -s --nbval-lax ./docs/notebooks/Llama_Stack_Benchmark_Evals.ipynb
```

<img width="846" alt="image"
src="https://github.com/user-attachments/assets/d2fc06a7-593a-444f-bc1f-10ab9b0c843d"
/>



[//]: # (## Documentation)
[//]: # (- [ ] Added a Changelog entry if the change is significant)

---------

Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
Signed-off-by: Ben Browning <bbrownin@redhat.com>
Signed-off-by: Sébastien Han <seb@redhat.com>
Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
Co-authored-by: Ben Browning <ben324@gmail.com>
Co-authored-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Reid <61492567+reidliu41@users.noreply.github.com>
Co-authored-by: reidliu <reid201711@gmail.com>
Co-authored-by: Yuan Tang <terrytangyuan@gmail.com>
2025-02-13 16:40:58 -08:00
Bill Murdock
32d1e50a6f
test: Add qdrant to provider tests (#1039)
# What does this PR do?

This is a follow-on to #1022. It includes the changes I needed to be
able to test the Qdrant support, as requested by @terrytangyuan.

I uncovered a number of bigger, more systemic issues with the vector DB
testing, and I will open a new issue for those. For now, I am just
delivering the work I already did on that.

## Test Plan

As discussed on #1022:

```
podman pull qdrant/qdrant
mkdir qdrant-data
podman run -p 6333:6333 -v $(pwd)/qdrant-data:/qdrant/storage qdrant/qdrant
```


```
ollama pull all-minilm:l6-v2
curl http://localhost:11434/api/embeddings -d '{"model": "all-minilm", "prompt": "Hello world"}'
```

```
EMBEDDING_DIMENSION=384 QDRANT_URL=http://localhost pytest llama_stack/providers/tests/vector_io/test_vector_io.py -m "qdrant" -v -s --tb=short --embedding-model all-minilm:latest --disable-warnings
```

These show 3 tests passing and 15 deselected, which is presumably working
as intended.

---------

Signed-off-by: Bill Murdock <bmurdock@redhat.com>
2025-02-13 15:44:55 -08:00
Yuan Tang
5858777ff0
fix: Update VectorIO config classes in registry (#1079)
This was missed in https://github.com/meta-llama/llama-stack/pull/1023. 

```
Traceback (most recent call last):
  File "/home/yutang/.conda/envs/distribution-myenv/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/yutang/.conda/envs/distribution-myenv/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/yutang/repos/llama-stack/llama_stack/distribution/server/server.py", line 488, in <module>
    main()
  File "/home/yutang/repos/llama-stack/llama_stack/distribution/server/server.py", line 389, in main
    impls = asyncio.run(construct_stack(config))
  File "/home/yutang/.conda/envs/distribution-myenv/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/home/yutang/.conda/envs/distribution-myenv/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/home/yutang/repos/llama-stack/llama_stack/distribution/stack.py", line 202, in construct_stack
    impls = await resolve_impls(run_config, provider_registry or get_provider_registry(), dist_registry)
  File "/home/yutang/repos/llama-stack/llama_stack/distribution/resolver.py", line 230, in resolve_impls
    impl = await instantiate_provider(
  File "/home/yutang/repos/llama-stack/llama_stack/distribution/resolver.py", line 312, in instantiate_provider
    config_type = instantiate_class_type(provider_spec.config_class)
  File "/home/yutang/repos/llama-stack/llama_stack/distribution/utils/dynamic.py", line 13, in instantiate_class_type
    return getattr(module, class_name)
AttributeError: module 'llama_stack.providers.inline.vector_io.faiss' has no attribute 'FaissImplConfig'

```
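
The traceback points at `instantiate_class_type`, which resolves the config class from a dotted string; a simplified sketch of that lookup shows why a stale name in the registry turns into an `AttributeError` (the helper below is a reimplementation for illustration, and the renamed class name is assumed from the `<Provider>VectorIOConfig` pattern in #1023):

```python
# Simplified sketch of the lookup in the traceback: the registry stores the
# config class as a dotted string, so that string must track renames such as
# FaissImplConfig -> FaissVectorIOConfig (new name assumed from the pattern
# introduced in #1023).
import importlib


def instantiate_class_type(fully_qualified_name: str):
    module_name, class_name = fully_qualified_name.rsplit(".", 1)
    module = importlib.import_module(module_name)
    return getattr(module, class_name)  # AttributeError if the name is stale
```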

---------

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-02-13 15:39:13 -08:00
Yuan Tang
8ff27b58fa
chore: Consistent naming for VectorIO providers (#1023)
# What does this PR do?

This changes all VectorIO provider classes to follow the pattern
`<ProviderName>VectorIOConfig` and `<ProviderName>VectorIOAdapter`. All
API endpoints for VectorIO are currently consistent with `/vector-io`.

Note that the API endpoints for VectorDBs stay unchanged as `/vector-dbs`.
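
As a naming illustration only (modules are omitted and the class bodies are placeholders, not real definitions), the pattern looks like:

```python
# Illustrative only: the <ProviderName>VectorIOConfig / <ProviderName>VectorIOAdapter
# pattern applied to two providers; real definitions live in their provider modules.
class FaissVectorIOConfig: ...
class FaissVectorIOAdapter: ...


class QdrantVectorIOConfig: ...
class QdrantVectorIOAdapter: ...
```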

## Test Plan

I don't have a way to test all providers. This is a simple renaming, so
things should work as expected.

---------

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-02-13 13:15:49 -05:00