# What does this PR do?
Added necessary dependencies to ensure successful execution of unit
tests. Without these, the following command would fail due to missing
imports:
```
uv run pytest -v -k "ollama" \
  --inference-model=llama3.2:3b-instruct-fp16 \
  llama_stack/providers/tests/inference/test_model_registration.py
```
## Test Plan
Run:
```
ollama run llama3.2:3b-instruct-fp16 --keepalive 2m &
uv run pytest -v -k "ollama" --inference-model=llama3.2:3b-instruct-fp16 llama_stack/providers/tests/inference/test_model_registration.py
```
Some tests pass while others fail, but the test run itself now completes
successfully with no missing-import errors.
Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
# What does this PR do?
I was encountering build issues when building my `ollama` environment
using `llama stack build`
```bash
llama stack build --template ollama --image-type venv
Traceback (most recent call last):
File "/Users/farceo/dev/llama-stack/.venv/bin/llama", line 10, in <module>
sys.exit(main())
^^^^^^
File "/Users/farceo/dev/llama-stack/llama_stack/cli/llama.py", line 46, in main
parser.run(args)
File "/Users/farceo/dev/llama-stack/llama_stack/cli/llama.py", line 40, in run
args.func(args)
File "/Users/farceo/dev/llama-stack/llama_stack/cli/stack/build.py", line 77, in _run_stack_build_command
return run_stack_build_command(args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/farceo/dev/llama-stack/llama_stack/cli/stack/_build.py", line 180, in run_stack_build_command
_run_stack_build_command_from_build_config(
File "/Users/farceo/dev/llama-stack/llama_stack/cli/stack/_build.py", line 272, in _run_stack_build_command_from_build_config
return_code = build_image(
^^^^^^^^^^^^
File "/Users/farceo/dev/llama-stack/llama_stack/distribution/build.py", line 137, in build_image
return_code = run_with_pty(args)
^^^^^^^^^^^^^^^^^^
File "/Users/farceo/dev/llama-stack/llama_stack/distribution/utils/exec.py", line 22, in run_with_pty
return _run_with_pty_unix(command)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/farceo/dev/llama-stack/llama_stack/distribution/utils/exec.py", line 53, in _run_with_pty_unix
process = subprocess.Popen(
^^^^^^^^^^^^^^^^^
File "/Users/farceo/.local/share/uv/python/cpython-3.11.6-macos-aarch64-none/lib/python3.11/subprocess.py", line 1026, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "/Users/farceo/.local/share/uv/python/cpython-3.11.6-macos-aarch64-none/lib/python3.11/subprocess.py", line 1950, in _execute_child
raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: '/Users/farceo/dev/llama-stack/llama_stack/distribution/build_venv.sh'
make: *** [build-ollama] Error 1
```
I also had to adjust `common.sh`, because sourcing it returned:
```bash
> source llama_stack/distribution/common.sh
llama_stack/distribution/common.sh:6: command not found: ^M
llama_stack/distribution/common.sh:50: parse error near `\n'
```
On my branch, I ran:
```bash
sed -i '' 's/\r$//' llama_stack/distribution/common.sh
```
And then I was able to successfully build the environment.
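For anyone who prefers not to use `sed`, here is a tiny Python sketch (my own illustration, not part of this PR) that performs the same CRLF normalization:

```python
from pathlib import Path

# Strip Windows-style CRLF line endings so the shell does not choke on ^M.
def strip_crlf(path: str) -> None:
    p = Path(path)
    data = p.read_bytes()
    if b"\r\n" in data:
        p.write_bytes(data.replace(b"\r\n", b"\n"))

strip_crlf("llama_stack/distribution/common.sh")
```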
## Test Plan
N/A
## Documentation
N/A
---------
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
# What does this PR do?
Add a `--downloaded` option to `llama model list` to show the locally
downloaded models:
```
$ llama model list --help
usage: llama model list [-h] [--show-all] [--downloaded]
Show available llama models
options:
-h, --help show this help message and exit
--show-all Show all models (not just defaults)
--downloaded List the downloaded models
$ llama model list --downloaded
+-------------+----------+---------------------+
| Model | Size | Modified Time |
+-------------+----------+---------------------+
| Llama3.2-1B | 2.31 GB | 2025-02-16 13:38:04 |
+-------------+----------+---------------------+
| Llama3.1-8B | 14.97 GB | 2025-02-16 10:36:37 |
+-------------+----------+---------------------+
```
## Test Plan
---------
Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
# What does this PR do?
Clarify when to update documentation, explain `uv sync --extra dev` and
OpenAPI generation, and specify where generated docs are stored.
Signed-off-by: Sébastien Han <seb@redhat.com>
# What does this PR do?
In this PR, we implement a passthrough inference provider that works with
any endpoint that respects the Llama Stack inference API definition.
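Conceptually, the provider just forwards each request to a configured remote endpoint that already implements the same API. A rough sketch of the idea (the class name, config fields, and route below are illustrative, not the actual implementation):

```python
import httpx

class PassthroughInferenceSketch:
    """Illustrative only: forward chat-completion requests to a remote
    endpoint that speaks the Llama Stack inference API."""

    def __init__(self, base_url: str, api_key: str | None = None):
        self.base_url = base_url.rstrip("/")
        self.headers = {"Authorization": f"Bearer {api_key}"} if api_key else {}

    async def chat_completion(self, model_id: str, messages: list[dict]) -> dict:
        # Because the remote endpoint uses the same request/response schema,
        # no translation is needed -- the payload is passed through as-is.
        async with httpx.AsyncClient(base_url=self.base_url, headers=self.headers) as client:
            resp = await client.post(
                "/inference/chat-completion",  # illustrative route
                json={"model_id": model_id, "messages": messages},
            )
            resp.raise_for_status()
            return resp.json()
```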
## Test Plan
Configured an endpoint that respects the Llama Stack inference API
definition and successfully got inference results:
<img width="1268" alt="Screenshot 2025-02-19 at 8 52 51 PM"
src="https://github.com/user-attachments/assets/447816e4-ea7a-4365-b90c-386dc7dcf4a1"
/>
As the title says: let the scoring function `llm_as_judge_405b_simpleqa`
output `aggregated_results`. We can leverage `categorical_count` to
calculate the percentage of correct answers as an eval benchmark metric.
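A rough illustration of that aggregation (not the provider's actual code): turn per-row categorical judge grades into a correctness percentage.

```python
from collections import Counter

# Hypothetical judge output: one categorical grade per evaluated row.
grades = ["A", "A", "B", "A", "C", "A"]

counts = Counter(grades)                       # categorical_count-style tally
pct_correct = 100 * counts["A"] / len(grades)  # treating "A" as "correct"
print(dict(counts), f"{pct_correct:.1f}% correct")
```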
- **refactor: simplify job status extraction a bit**
- **torchtune: save job status on schedule**
- **refactor: get rid of job_list in torchtune job management code**
# What does this PR do?
A failed job is now registered in the API, and one can consult its status.
## Test Plan
```
$ llama-stack-client post_training status --job-uuid test-jobe244b5b0-5053-4892-a4d9-d8fc8b116e73
JobStatusResponse(checkpoints=[], job_uuid='test-jobe244b5b0-5053-4892-a4d9-d8fc8b116e73', status='failed', completed_at=None, resources_allocated=None, scheduled_at=datetime.datetime(2025, 2, 18, 9, 4, 34, 3252), started_at=datetime.datetime(2025, 2, 18, 9, 4, 34, 10688))
```
---------
Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
# What does this PR do?
1. This PR adds batch inserts into sqlite-vec as requested in
https://github.com/meta-llama/llama-stack/pull/1040
- Note: the inserts use a UUID generated from the hash of the document
ID and chunk content (see the sketch after this list).
2. This PR also adds unit tests for sqlite-vec. In a follow-up PR, I can
add similar tests for Faiss.
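A minimal sketch of the deterministic ID scheme mentioned in item 1 (the helper name and hashing details here are illustrative; the real implementation lives in the sqlite-vec provider):

```python
import hashlib
import uuid

def generate_chunk_id(document_id: str, chunk_text: str) -> str:
    # Hash the document id together with the chunk content and derive a UUID
    # from it, so re-inserting the same chunk always yields the same id.
    digest = hashlib.md5(f"{document_id}:{chunk_text}".encode()).hexdigest()
    return str(uuid.UUID(digest))

print(generate_chunk_id("doc-1", "hello world"))
```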
## Test Plan
1. Integration tests:
```
INFERENCE_MODEL=llama3.2:3b-instruct-fp16 LLAMA_STACK_CONFIG=ollama pytest -s -v tests/client-sdk/vector_io/test_vector_io.py
...
PASSED
tests/client-sdk/vector_io/test_vector_io.py::test_vector_db_retrieve[all-MiniLM-L6-v2-sqlite_vec] PASSED
tests/client-sdk/vector_io/test_vector_io.py::test_vector_db_list PASSED
tests/client-sdk/vector_io/test_vector_io.py::test_vector_db_register[all-MiniLM-L6-v2-faiss] PASSED
tests/client-sdk/vector_io/test_vector_io.py::test_vector_db_register[all-MiniLM-L6-v2-sqlite_vec] PASSED
tests/client-sdk/vector_io/test_vector_io.py::test_vector_db_unregister[faiss] PASSED
tests/client-sdk/vector_io/test_vector_io.py::test_vector_db_unregister[sqlite_vec] PASSED
```
2. Unit tests:
```
pytest llama_stack/providers/tests/vector_io/test_sqlite_vec.py -v -s --tb=short --disable-warnings --asyncio-mode=auto
...
llama_stack/providers/tests/vector_io/test_sqlite_vec.py::test_add_chunks PASSED
llama_stack/providers/tests/vector_io/test_sqlite_vec.py::test_query_chunks PASSED
llama_stack/providers/tests/vector_io/test_sqlite_vec.py::test_chunk_id_conflict PASSED
llama_stack/providers/tests/vector_io/test_sqlite_vec.py::test_register_vector_db PASSED
llama_stack/providers/tests/vector_io/test_sqlite_vec.py::test_unregister_vector_db PASSED
llama_stack/providers/tests/vector_io/test_sqlite_vec.py::test_generate_chunk_id PASSED
```
I also tested using the same example RAG script from
https://github.com/meta-llama/llama-stack/pull/1040 and it produced the
expected output.
---------
Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
# What does this PR do?
It seems that the llama_stack_client repo and the main repo were
originally the same, causing links to point to local references. We’ve
now updated them to use the correct llama_stack_client repo links.
Signed-off-by: Sébastien Han <seb@redhat.com>
There should be a choke point for `llama3.api` imports -- this is the
prompt adapter. Creating a `ChatFormat()` object on demand is inexpensive;
the underlying `Tokenizer` is a singleton anyway.
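A rough sketch of the pattern being argued for (the classes below are stand-ins, not the actual `llama_models` API):

```python
class Tokenizer:
    """Stand-in for the heavyweight tokenizer: constructed only once."""
    _instance = None

    @classmethod
    def get_instance(cls) -> "Tokenizer":
        if cls._instance is None:
            cls._instance = cls()  # the expensive load happens here, once
        return cls._instance

class ChatFormat:
    """Stand-in for the thin formatting wrapper: cheap to create on demand."""
    def __init__(self, tokenizer: Tokenizer):
        self.tokenizer = tokenizer

def format_prompt(text: str) -> ChatFormat:
    # Creating a ChatFormat per call is fine; the shared Tokenizer singleton
    # is the only costly piece and it is reused.
    return ChatFormat(Tokenizer.get_instance())
```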
# What does this PR do?
Before this change, `distro_codegen.py` would only work if the user
manually installed multiple provider-specific dependencies (see #1122).
Now, users can run `distro_codegen.py` without any provider-specific
dependencies because we avoid importing the entire provider
implementations just to get the config needed to build the provider
template.
Concretely, this mostly means moving the `MODEL_ALIASES` (and related
variants) definitions to a new `models.py` module within the provider
implementation for those providers that require additional dependencies.
It also meant moving a couple of imports from top-level imports to inside
`get_adapter_impl` for some providers, which follows the pattern used by
multiple existing providers.
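The lazy-import pattern looks roughly like this (module and class names are placeholders, not any specific provider):

```python
# Module level: only the lightweight config import, so distro_codegen.py can
# import this file without pulling in the provider's heavy dependencies.
from .config import ExampleProviderConfig  # hypothetical config class

async def get_adapter_impl(config: ExampleProviderConfig, _deps):
    # The heavy, provider-specific import happens only when the adapter is
    # actually instantiated at runtime.
    from .example import ExampleInferenceAdapter  # hypothetical implementation

    impl = ExampleInferenceAdapter(config)
    await impl.initialize()
    return impl
```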
To ensure we don't regress and accidentally add new imports that cause
distro_codegen.py to fail, the stubbed-in pre-commit hook for
distro_codegen.py was uncommented and slightly tweaked to run via `uv
run python ...` to ensure it runs with only the project's default
dependencies and to run automatically instead of manually.
Lastly, this updates distro_codegen.py itself to keep track of paths it
might have changed and to only `git diff` those specific paths when
checking for changed files instead of doing a diff on the entire working
tree. The latter was overly broad and would require a user have no other
unstaged changes in their working tree, even if those unstaged changes
were unrelated to generated code. Now it only flags uncommitted changes
for paths distro_codegen.py actually writes to.
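The path-scoped check amounts to something like this (the helper name is illustrative):

```python
import subprocess

def has_uncommitted_changes(changed_paths: list[str]) -> bool:
    """Return True only if the paths the generator wrote to differ from the
    committed state, ignoring unrelated changes in the working tree."""
    result = subprocess.run(
        ["git", "diff", "--quiet", "--", *changed_paths],
        check=False,
    )
    return result.returncode != 0
```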
Our generated code was also out-of-date, presumably because of these
issues, so this commit also has some updates to the generated code
purely because it was out of sync, and the pre-commit hook now enforces
things to be updated.
(Closes #1122)
## Test Plan
I manually tested distro_codegen.py and the pre-commit hook to verify
they work as expected, flagging any uncommitted changes and catching any
imports that attempt to pull in provider-specific dependencies.
However, I do not have valid API keys for the impacted provider
implementations, and am unable to easily run the inference tests against
each changed provider. There are no functional changes to the provider
implementations here, but I'd appreciate a second set of eyes on the
changed import statements and the moving of `MODEL_ALIASES`-type code to a
separate `models.py` to ensure I didn't make any obvious errors.
---------
Signed-off-by: Ben Browning <bbrownin@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
# What does this PR do?
Add `provider_id` to avoid errors when using the RAG example with
`llama_stack_client`:
`llama_stack_client.BadRequestError: Error code: 400 - {'detail':
'Invalid value: No provider specified and multiple providers available.
Please specify a provider_id.'}`
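For reference, the fix amounts to passing an explicit provider when registering the vector DB; a sketch using the `llama_stack_client` Python SDK (argument values and the URL are illustrative):

```python
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")  # illustrative URL

# Without provider_id, a stack that ships several vector_io providers cannot
# pick one and returns the 400 error quoted above.
client.vector_dbs.register(
    vector_db_id="my_documents",
    embedding_model="all-MiniLM-L6-v2",
    embedding_dimension=384,
    provider_id="faiss",  # illustrative provider id
)
```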
## Test Plan
---------
Co-authored-by: Xi Yan <yanxi970830@gmail.com>
# What does this PR do?
- Closes #1142
- Root cause is having `Union[str, AgentToolGroupWithArgs]`
## Test Plan
- Tested with the script described in the issue.
- Printed the final converted Pydantic object:
<img width="1470" alt="image"
src="https://github.com/user-attachments/assets/15dc9cd0-f37a-4b91-905f-3fe4f59a08c6"
/>
# What does this PR do?
Re-checked: based on the doc, the download model ID is actually the model
descriptor (also without the `meta-llama/` prefix).
https://llama-stack.readthedocs.io/en/latest/references/llama_cli_reference/index.html
```
$ llama download --source huggingface --model-id Llama-Guard-3-1B:int4 --hf-token xxx # model descriptor
Fetching 8 files: 0%| | 0/8 [00:00<?, ?it/s]
LICENSE.txt: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████| 7.71k/7.71k [00:00<00:00, 10.5MB/s]
$ llama download --source huggingface --model-id Llama-Guard-3-1B-INT4 --hf-token xxxx # hugging face repo without meta-llama/
usage: llama download [-h] [--source {meta,huggingface}] [--model-id MODEL_ID] [--hf-token HF_TOKEN] [--meta-url META_URL] [--max-parallel MAX_PARALLEL]
[--ignore-patterns IGNORE_PATTERNS] [--manifest-file MANIFEST_FILE]
llama download: error: Model Llama-Guard-3-1B-INT4 not found <<<<---
$ llama download --source meta --model-id Llama-3.2-3B-Instruct-SpinQuant_INT4_EO8
usage: llama download [-h] [--source {meta,huggingface}] [--model-id MODEL_ID] [--hf-token HF_TOKEN] [--meta-url META_URL] [--max-parallel MAX_PARALLEL]
[--ignore-patterns IGNORE_PATTERNS] [--manifest-file MANIFEST_FILE]
llama download: error: Model Llama-3.2-3B-Instruct-SpinQuant_INT4_EO8 not found
$ llama download --source meta --model-id Llama3.2-3B-Instruct:int4-spinquant-eo8
Please provide the signed URL for model Llama3.2-3B-Instruct:int4-spinquant-eo8 you received via email after visiting https://www.llama.com/llama-downloads/ (e.g., https://llama3-1.llamameta.net/*?Policy...): ^CTraceback (most recent call last):
$ llama download --source meta --model-id meta-llama/Llama3.2-3B-Instruct:int4-spinquant-eo8
usage: llama download [-h] [--source {meta,huggingface}] [--model-id MODEL_ID] [--hf-token HF_TOKEN] [--meta-url META_URL]
[--max-parallel MAX_PARALLEL] [--ignore-patterns IGNORE_PATTERNS] [--manifest-file MANIFEST_FILE]
llama download: error: Model meta-llama/Llama3.2-3B-Instruct:int4-spinquant-eo8 not found
```
## Test Plan
Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
# What does this PR do?
Based on the code:
6b1773d530/llama_stack/cli/download.py (L454)
and testing, commas can be used to specify multiple model IDs, so update
the usage text.
```
$ llama model download --source meta --model-id Llama3.2-1B,Llama3.2-3B
Please provide the signed URL for model Llama3.2-1B you received via email after visiting https://www.llama.com/llama-downloads/ (e.g., https://llama3-1.llamameta.net/*?Policy...):
Downloading checklist.chk ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100.0% 156/156 bytes - 0:00:00
Downloading tokenizer.model ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100.0% 2.2/2.2 MB - 0:00:00
Downloading params.json ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100.0% 220/220 bytes - 0:00:00
Downloading consolidated.00.pth ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100.0% 2.5/2.5 GB - 0:00:00
Successfully downloaded model to /Users/xx/.llama/checkpoints/Llama3.2-1B
[Optionally] To run MD5 checksums, use the following command: llama model verify-download --model-id Llama3.2-1B
Please provide the signed URL for model Llama3.2-3B you received via email after visiting https://www.llama.com/llama-downloads/ (e.g., https://llama3-1.llamameta.net/*?Policy...):
Downloading checklist.chk ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100.0% 156/156 bytes - 0:00:00
Downloading tokenizer.model ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100.0% 2.2/2.2 MB - 0:00:00
Downloading params.json ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100.0% 220/220 bytes - 0:00:00
Downloading consolidated.00.pth ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100.0% 6.4/6.4 GB - 0:00:00
Successfully downloaded model to /Users/xx/.llama/checkpoints/Llama3.2-3B
$ llama model download --source huggingface --model-id Llama3.2-1B,Llama3.2-3B
original%2Fparams.json: 100%|██████████████████████████████████████████████████████████| 220/220 [00:00<00:00, 564kB/
Successfully downloaded model to /Users/xx/.llama/checkpoints/Llama3.2-1B
...
tokenizer.json: 100%|█████████████████████████████████████████████████████████████| 9.09M/9.09M [00:00<00:00, 9.18MB/s]
Successfully downloaded model to /Users/xxx/.llama/checkpoints/Llama3.2-3B
before:
$ llama model download --help
--model-id MODEL_ID See `llama model list` or `llama model list --show-all` for the list of available models
after:
$ llama model download --help
--model-id MODEL_ID See `llama model list` or `llama model list --show-all` for the list of available models. Specify multiple model IDs with commas, e.g. --model-id Llama3.2-1B,Llama3.2-3B
```
## Test Plan
Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
# What does this PR do?
Based on the code
6b1773d530/llama_stack/cli/download.py (L379)
and testing, `verify-download` should only be used for models downloaded
from Meta.
```
test: no checklist.chk file for hf download
$ llama model download --source meta --model-id Llama3.2-1B
Downloading checklist.chk ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100.0% 156/156 bytes - 0:00:00
Downloading tokenizer.model ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100.0% 2.2/2.2 MB - 0:00:00
Downloading params.json ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100.0% 220/220 bytes - 0:00:00
Downloading consolidated.00.pth ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100.0% 2.5/2.5 GB - 0:00:00
before:
$ llama model verify-download --help
usage: llama model verify-download [-h] --model-id MODEL_ID
Verify the downloaded checkpoints' checksums
options:
-h, --help show this help message and exit
--model-id MODEL_ID Model ID to verify
after:
$ llama model verify-download --help
usage: llama model verify-download [-h] --model-id MODEL_ID
Verify the downloaded checkpoints' checksums for models downloaded from Meta
options:
-h, --help show this help message and exit
--model-id MODEL_ID Model ID to verify (only for models downloaded from Meta)
```
## Test Plan
Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
# What does this PR do?
From the code and the usage, there seems to be no need for
`--no-list-templates`; it also makes the help text confusing, so remove it.
```
$ llama stack build --no-list-templates
> Enter a name for your Llama Stack (e.g. my-local-stack):
$ llama stack build
> Enter a name for your Llama Stack (e.g. my-local-stack):
before:
$ llama stack build --help
--list-templates, --no-list-templates
Show the available templates for building a Llama Stack distribution (default: False)
after:
--list-templates Show the available templates for building a Llama Stack distribution
```
## Test Plan
Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
# What does this PR do?
This fixes an issue when running the e2e agent example:
https://github.com/meta-llama/llama-stack-apps/blob/main/examples/agents/e2e_loop_with_client_tools.py
```
| File "/home/yutang/repos/llama-stack/llama_stack/providers/remote/inference/vllm/vllm.py", line 175, in _process_vllm_chat_completion_stream_response
| tool_call = convert_tool_call(choice.delta.tool_calls[0])
| File "/home/yutang/repos/llama-stack/llama_stack/providers/utils/inference/openai_compat.py", line 441, in convert_tool_call
| return ToolCall(
| File "/home/yutang/.conda/envs/distribution-myenv/lib/python3.10/site-packages/pydantic/main.py", line 214, in __init__
| validated_self = self.__pydantic_validator__.validate_python(data, self_instance=self)
| pydantic_core._pydantic_core.ValidationError: 4 validation errors for ToolCall
| call_id
| Input should be a valid string [type=string_type, input_value=None, input_type=NoneType]
| For further information visit https://errors.pydantic.dev/2.10/v/string_type
| tool_name.enum[BuiltinTool]
| Input should be 'brave_search', 'wolfram_alpha', 'photogen' or 'code_interpreter' [type=enum, input_value=None, input_type=NoneType]
| For further information visit https://errors.pydantic.dev/2.10/v/enum
| tool_name.str
| Input should be a valid string [type=string_type, input_value=None, input_type=NoneType]
| For further information visit https://errors.pydantic.dev/2.10/v/string_type
| arguments
| Input should be a valid dictionary [type=dict_type, input_value=202, input_type=int]
| For further information visit https://errors.pydantic.dev/2.10/v/dict_type
```
This issue happened because not all arguments have been appended to the
tool call buffer yet. The current code assumes that we are ready to
convert the tool call whenever args can be converted to JSON
successfully. In this case, `json.loads("202")` would succeed but the
rest of the arguments have not been properly parsed yet.
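A small illustration of the failure mode (made-up values; this is only a demonstration, not the fix itself):

```python
import json

# A partially accumulated arguments buffer can itself be valid JSON.
partial = "202"              # only a fragment of the arguments has arrived
value = json.loads(partial)  # succeeds and returns the int 202
print(type(value), value)    # <class 'int'> 202 -- not the arguments dict yet
```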
## Test Plan
The e2e example worked successfully (although note that I ran the script
twice with each function call separately due to
https://github.com/meta-llama/llama-stack/issues/1120):
```
tool_execution> Tool:get_ticker_data Args:{'ticker_symbol': 'GOOG', 'start': '2023-01-01', 'end': '2023-12-31'}
tool_execution> Tool:get_ticker_data Response:"[{\"('Year', '')\":2023,\"('Close', 'GOOG')\":140.4254455566}]"
tool_execution> Tool:web_search Args:{'query': '42nd president of the United States'}
tool_execution> Tool:web_search Response:"{\"query\": \"42nd president of the United States\", \"top_k\": [{\"title\": \"William J. Clinton | whitehouse.gov\", \"url\": \"https://obamawhitehouse.archives.gov/1600/presidents/williamjclinton\", \"description\": \"<strong>Bill Clinton</strong> is an American politician from Arkansas who served as the 42nd President of the United States (1993-2001). He took office at the end of the Cold War, and was the first baby-boomer generation President.\", \"type\": \"search_result\"}, {\"title\": \"Bill Clinton - Wikipedia\", \"url\": \"https://en.wikipedia.org/wiki/Bill_Clinton\", \"description\": \"<strong>William Jefferson Clinton</strong> (n\\u00e9 Blythe; born August 19, 1946) is an American politician and lawyer who served as the 42nd president of the United States from 1993 to 2001. A member of the Democratic Party, he previously served as the attorney general of Arkansas from 1977 to 1979 and as the ...\", \"type\": \"search_result\"}, [{\"type\": \"video_result\", \"url\": \"https://www.youtube.com/watch?v=eR2z_1-v87Y\", \"title\": \"A Conversation with Bill Clinton, 42nd President of the United ...\", \"description\": \"William Jefferson Clinton, the first Democratic president in six decades to be elected twice, led the United States to the longest economic expansion in Amer...\"}, {\"type\": \"video_result\", \"url\": \"4484174096/\", \"title\": \"January 20, 1993, President Clinton was sworn in as the 42nd ...\", \"description\": \"WATCH: On January 20, 1993, President Bill Clinton was sworn in as the 42nd President of the United States. #InaugurationDay Video courtesy of the...\"}, {\"type\": \"video_result\", \"url\": \"https://www.youtube.com/watch?v=vI0HGQqEJh0\", \"title\": \"42nd President of the United States, Bill Clinton, shared thoughts ...\", \"description\": \"AboutPressCopyrightContact usCreatorsAdvertiseDevelopersTermsPrivacyPolicy & SafetyHow YouTube worksTest new features \\u00b7 \\u00a9 2024 Google LLC\"}, {\"type\": \"video_result\", \"url\": \"https://www.youtube.com/shorts/vI0HGQqEJh0\", \"title\": \"42nd President of the United States, Bill Clinton, shared ...\", \"description\": \"Enjoy the videos and music you love, upload original content, and share it all with friends, family, and the world on YouTube.\"}, {\"type\": \"video_result\", \"url\": \"https://www.youtube.com/watch?v=PHihhihVth0\", \"title\": \"Bill & Hillary Clinton returning to Little Rock for 20th ...\", \"description\": \"Enjoy the videos and music you love, upload original content, and share it all with friends, family, and the world on YouTube.\"}]]}"
```
All text inference tests passed.
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
# What does this PR do?
Capitalize the `llama stack run` help description, changing "start" to
"Start":
```
before:
$ llama stack run --help
usage: llama stack run [-h] [--port PORT] [--image-name IMAGE_NAME] [--disable-ipv6] [--env KEY=VALUE]
[--tls-keyfile TLS_KEYFILE] [--tls-certfile TLS_CERTFILE] [--image-type {conda,container,venv}]
config
start <<<<<<---- the server for a Llama Stack Distribution. You should have already built (or downloaded) and configured the distribution.
After:
$ llama stack run --help
usage: llama stack run [-h] [--port PORT] [--image-name IMAGE_NAME] [--disable-ipv6] [--env KEY=VALUE]
[--tls-keyfile TLS_KEYFILE] [--tls-certfile TLS_CERTFILE] [--image-type {conda,container,venv}]
config
Start <<<<<<---- the server for a Llama Stack Distribution. You should have already built (or downloaded) and configured the distribution.
```
## Test Plan
Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
# What does this PR do?
The bot just updated the project to 0.1.3 in
https://github.com/meta-llama/llama-stack/commits?author=github-actions%5Bbot%5D
but the deps need to be synced.
## Test Plan
Signed-off-by: Sébastien Han <seb@redhat.com>
# What does this PR do?
Remove the empty line from the `--ignore-patterns` help text:
```
before:
$ llama model download --help
--max-parallel MAX_PARALLEL
Maximum number of concurrent downloads
--ignore-patterns IGNORE_PATTERNS
<<<<<<<<<empty line>>>>>>>>>>
For source=huggingface, files matching any of the patterns are not downloaded. Defaults to ignoring
safetensors files to avoid downloading duplicate weights.
after:
$ llama model download --help
--max-parallel MAX_PARALLEL
Maximum number of concurrent downloads
--ignore-patterns IGNORE_PATTERNS
For source=huggingface, files matching any of the patterns are not downloaded. Defaults to ignoring
safetensors files to avoid downloading duplicate weights.
```
## Test Plan
Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
# What does this PR do?
This issue was discovered in
https://github.com/meta-llama/llama-stack/pull/1009#discussion_r1947036518.
## Test Plan
This field is no longer required after the change.
---------
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
llama-models should have extremely minimal cruft. Its sole purpose
should be didactic -- show the simplest implementation of the llama
models and document the prompt formats, etc.
This PR is the complement to
https://github.com/meta-llama/llama-models/pull/279
## Test Plan
Ensure all `llama` CLI `model` sub-commands work:
```bash
llama model list
llama model download --model-id ...
llama model prompt-format -m ...
```
Ran tests:
```bash
cd tests/client-sdk
LLAMA_STACK_CONFIG=fireworks pytest -s -v inference/
LLAMA_STACK_CONFIG=fireworks pytest -s -v vector_io/
LLAMA_STACK_CONFIG=fireworks pytest -s -v agents/
```
Created a fresh venv (`uv venv && source .venv/bin/activate`) and ran
`llama stack build --template fireworks --image-type venv` followed by
`llama stack run together --image-type venv`; the server runs.
Also checked that the OpenAPI generator can run and there is no change
in the generated files as a result.
```bash
cd docs/openapi_generator
sh run_openapi_generator.sh
```
# What does this PR do?
- Remove hardcoded configurations from pre-commit.
- Allow configuration to be set via pyproject.toml.
- Merge .ruff.toml settings into pyproject.toml.
- Ensure the linter and formatter use the defined configuration instead
of being overridden by pre-commit.
## Test Plan
Signed-off-by: Sébastien Han <seb@redhat.com>
# What does this PR do?
This fixes an import that was introduced by merging #1079 before #1039:
after the rename in #1039, the import needed updating from `QdrantConfig`
to `QdrantVectorIOConfig`.
## Test Plan
I ran the remote vllm provider inference tests against the latest main:
```
VLLM_URL="http://localhost:8001/v1" python -m pytest -s -v llama_stack/providers/tests/inference/test_text_inference.py --providers "inference=vllm_remote"
```
That failed with:
```
File "/home/bbrownin/src/llama-stack/llama_stack/providers/tests/vector_io/fixtures.py", line 20, in <module>
from llama_stack.providers.remote.vector_io.qdrant import QdrantConfig
ImportError: Error importing plugin "llama_stack.providers.tests.vector_io.fixtures": cannot import name 'QdrantConfig' from 'llama_stack.providers.remote.vector_io.qdrant' (/home/bbrownin/src/llama-stack/llama_stack/providers/remote/vector_io/qdrant/__init__.py)
```
After this change, the import no longer fails and the tests pass.
Signed-off-by: Ben Browning <bbrownin@redhat.com>
# What does this PR do?
- sqlite_vec is not added to all templates yet; disable the test for now to
unblock the release cut
## Test Plan
<img width="846" alt="image"
src="https://github.com/user-attachments/assets/fa896497-f37c-4cdf-bc62-21893afbd392"
/>
# What does this PR do?
- Update `/eval-tasks` to `/benchmarks`
- ⚠️ Remove the differentiation between `app` vs. `benchmark` eval task
configs. Now we only have `BenchmarkConfig`. The overloaded `benchmark`
is confusing and does not add any value. Backward compatibility is kept,
as the "type" is not used anywhere.
## Test Plan
- This change is backward compatible
- Ran the notebook tests with:
```
pytest -v -s --nbval-lax ./docs/getting_started.ipynb
pytest -v -s --nbval-lax ./docs/notebooks/Llama_Stack_Benchmark_Evals.ipynb
```
<img width="846" alt="image"
src="https://github.com/user-attachments/assets/d2fc06a7-593a-444f-bc1f-10ab9b0c843d"
/>
---------
Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
Signed-off-by: Ben Browning <bbrownin@redhat.com>
Signed-off-by: Sébastien Han <seb@redhat.com>
Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
Co-authored-by: Ben Browning <ben324@gmail.com>
Co-authored-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Reid <61492567+reidliu41@users.noreply.github.com>
Co-authored-by: reidliu <reid201711@gmail.com>
Co-authored-by: Yuan Tang <terrytangyuan@gmail.com>