Commit graph

1205 commits

Author SHA1 Message Date
Ashwin Bharambe
9436dd570d
feat: register embedding models for ollama, together, fireworks (#1190)
# What does this PR do?

We have support for embeddings in our Inference providers, but so far we
haven't done the final step of actually registering the known embedding
models and making sure they are extremely easy to use. This is one step
towards that.

## Test Plan

Run existing inference tests.

```bash

$ cd llama_stack/providers/tests/inference
$ pytest -s -v -k fireworks test_embeddings.py \
   --inference-model nomic-ai/nomic-embed-text-v1.5 --env EMBEDDING_DIMENSION=784
$  pytest -s -v -k together test_embeddings.py \
   --inference-model togethercomputer/m2-bert-80M-8k-retrieval --env EMBEDDING_DIMENSION=784
$ pytest -s -v -k ollama test_embeddings.py \
   --inference-model all-minilm:latest --env EMBEDDING_DIMENSION=784
```

The value of the EMBEDDING_DIMENSION isn't actually used in these tests,
it is merely used by the test fixtures to check if the model is an LLM
or Embedding.
2025-02-20 15:39:08 -08:00
Ashwin Bharambe
736560ceba Remove os.getenv() from ollama config 2025-02-20 14:30:32 -08:00
LESSuseLESS
2cbe9395b0
feat: D69478008 [llama-stack] turning tests into data-driven (#1180)
# What does this PR do?

We have several places running tests for different purposes.
- oss llama stack
  - provider tests
  - e2e tests
- provider llama stack
  - unit tests
  - e2e tests

It would be nice if they can *share the same set of test data*, so we
maintain the consistency between spec and implementation. This is what
this diff is about, isolating test data from test coding, so that we can
reuse the same data at different places by writing different test
coding.

## Test Plan

== Set up Ollama local server  
==  Run a provider test
conda activate stack

OLLAMA_URL="http://localhost:8321" \
pytest -v -s -k "ollama" --inference-model="llama3.2:3b-instruct-fp16" \

llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_completion_structured_output
// test_structured_output should also work

== Run an e2e test
conda activate sherpa
with-proxy pip install llama-stack
export INFERENCE_MODEL=llama3.2:3b-instruct-fp16
export LLAMA_STACK_PORT=8322
with-proxy llama stack build --template ollama
with-proxy llama stack run --env OLLAMA_URL=http://localhost:8321 ollama
  - Run test client,
LLAMA_STACK_PORT=8322 LLAMA_STACK_BASE_URL="http://localhost:8322" \
pytest -v -s --inference-model="llama3.2:3b-instruct-fp16" \

tests/client-sdk/inference/test_text_inference.py::test_text_completion_structured_output
// test_text_chat_completion_structured_output should also work

## Notes

- This PR was automatically generated by oss_sync
- Please refer to D69478008 for more details.
2025-02-20 14:13:06 -08:00
ehhuang
1166afdf76
fix: some telemetry APIs don't currently work (#1188)
Summary:

This bug is surfaced by using the http LS client. The issue is that
non-scalar values in 'GET' method are `body` params in fastAPI, but our
spec generation script doesn't respect that. We fix by just making them
POST method instead.

Test Plan:
Test API call with newly sync'd client
(https://github.com/meta-llama/llama-stack-client-python/pull/149)

<img width="1114" alt="image"
src="https://github.com/user-attachments/assets/7710aca5-d163-4e00-a465-14e6fcaac2b2"
/>
2025-02-20 14:09:25 -08:00
Xi Yan
ea1faae50e
chore!: deprecate eval/tasks (#1186)
# What does this PR do?
- Fully deprecate eval/tasks

[//]: # (If resolving an issue, uncomment and update the line below)
Closes #1088 

NOTE: this will be a breaking change. We have introduced the new API in
0.1.3 .

Notebook has been updated to use the new endpoints.

## Test Plan
```
pytest -v -s --nbval-lax ./docs/notebooks/Llama_Stack_Benchmark_Evals.ipynb 
```
<img width="611" alt="image"
src="https://github.com/user-attachments/assets/79f6efe1-81ba-494e-bf36-1fc0c2b9bc6f"
/>



cc @SLR722  for awareness

[//]: # (## Documentation)
2025-02-20 14:06:21 -08:00
Ashwin Bharambe
07ccf908f7 ModelAlias -> ProviderModelEntry 2025-02-20 14:02:36 -08:00
Kevin Cogan
561295af76
docs: Fix Links, Add Podman Instructions, Vector DB Unregister, and Example Script (#1129)
# What does this PR do?
This PR improves the documentation in several ways:

- **Fixed incorrect link in `tools.md`** to ensure all references point
to the correct resources.
- **Added instructions for running the `code-interpreter` agent in a
Podman container**, helping users configure and execute the tool in
containerized environments.
- **Introduced an unregister command for single and multiple vector
databases**, making it easier to manage vector DBs.
- **Provided a simple example script for using the `code-interpreter`
agent**, giving users a practical reference for implementation.

These updates enhance the clarity, usability, and completeness of the
documentation.

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
The following steps were performed to verify the accuracy of the
changes:

1. **Validated all fixed link** by checking their destinations to ensure
correctness.
2. **Ran the `code-interpreter` agent in a Podman container** following
the new instructions to confirm functionality.
3. **Executed the vector database unregister commands** and verified
that both single and multiple databases were correctly removed.
4. **Tested the new example script for `code-interpreter`**, ensuring it
runs without errors.

All changes were reviewed and tested successfully, improving the
documentation's accuracy and ease of use.

[//]: # (## Documentation)
2025-02-20 13:52:14 -08:00
Vladimir Ivić
f7161611c6
feat: adding endpoints for files and uploads (#1070)
Summary:
Adds spec definitions for file uploads operations.

This API focuses around two high level operations:
* Initiating and managing upload session
* Accessing uploaded file information

Usage examples:

To start a file upload session:
```
curl -X POST https://localhost:8321/v1/files \
-d '{
   "key": "image123.jpg',
   "bucket": "images",
   "mime_type": "image/jpg",
   "size": 12345
}'

# Returns
{
  “id”: <session_id>
  “url”: “https://localhost:8321/v1/files/session:<session_id>”,
  "offset": 0,
  "size": 12345
}

```

To upload file content to an existing session
```
curl -i -X POST "https://localhost:8321/v1/files/session:<session_id> \
  --data-binary @<path_to_local_file>

# Returns
{
  "key": "image123.jpg",
  "bucket": "images",
  "mime_type": "image/jpg",
  "bytes": 12345,
  "created_at": 1737492240
}

# Implementing on server side (Flask example for simplicity):
@app.route('/uploads/{upload_id}', methods=['POST'])
def upload_content_to_session(upload_id):
    try:
        # Get the binary file data from the request body
        file_data = request.data

        # Save the file to disk
        save_path = f"./uploads/{upload_id}"
        with open(save_path, 'wb') as f:
            f.write(file_data)
        return {__uploaded_file_json__}, 200
    except Exception as e:
        return 500

```

To read information about an existing upload session
```
curl -i -X GET "https://localhost:8321/v1/files/session:<session_id>

# Returns
{
  “id”: <session_id>
  “url”: “https://localhost:8321/v1/files/session:<session_id>”,
  "offset": 1024,
  "size": 12345
}
```

To list buckets
```
GET /files

# Returns
{
  "data": [
     {"name": "bucket1"},
     {"name": "bucket2"},
   ]
}
```

To list all files in a bucket
```
GET /files/{bucket}

# Returns
{
  "data": [
    {
      "key": "shiba.jpg",
      "bucket": "dogs",
      "mime_type": "image/jpg",
      "bytes": 82334,
      "created_at": 1737492240,
    },
    {
      "key": "persian_cat.jpg",
      "mime_type": "image/jpg",
      "bucket": "cats",
      "bytes": 39924,
      "created_at": 1727493440,
    },
  ]
}
```

To get specific file info
```
GET /files/{bucket}/{key}

{
  "key": "shiba.jpg",
  "bucket": "dogs",
  "mime_type": "image/jpg",
  "bytes": 82334,
  "created_at": 1737492240,
}

```

To delete specific file
```
DELETE /files/{bucket}/{key}

{
  "key": "shiba.jpg",
  "bucket": "dogs",
  "mime_type": "image/jpg",
  "bytes": 82334,
  "created_at": 1737492240,
}

```
2025-02-20 13:09:00 -08:00
Ashwin Bharambe
eddef0b2ae
chore: slight renaming of model alias stuff (#1181)
Quick test by running:
```
LLAMA_STACK_CONFIG=fireworks pytest -s -v tests/client-sdk
```
2025-02-20 11:48:46 -08:00
Ashwin Bharambe
2eda050aef Fix ollama fixture 2025-02-20 11:46:02 -08:00
Ashwin Bharambe
3d891fc9ba ModelAlias cleanup 2025-02-20 11:44:39 -08:00
Ben Browning
fbec826883
docs: Add note about distro_codegen.py and provider dependencies (#1175)
# What does this PR do?

This expands upon the existing distro_codegen.py text in the new API
provider documentation to include a note about not including
provider-specific dependencies in the code path that builds the
distribution's template.

Our distro_codegen pre-commit hook will catch this case anyway, but this
attempts to inform provider authors ahead of time about that.

## Test Plan

I built the docs website locally via the following:
```
pip install docs/requirements.txt
sphinx-build -M html docs/source docs_output
```
Then, I opened that newly generated
`docs_output/html/contributing/new_api_provider.html` in my browser and
confirmed everything rendered correctly.

Signed-off-by: Ben Browning <bbrownin@redhat.com>
2025-02-20 09:23:46 -08:00
Ashwin Bharambe
984a8039ad Kill unnecessary check on --safety-shield test param 2025-02-20 09:15:23 -08:00
Rashmi Pawar
996f27a308
fix: add logging import (#1174)
# What does this PR do?
Fixes logging import and the logger instance creation

cc: @dglogo
2025-02-20 11:26:47 -05:00
Ihar Hrachyshka
fb6a3efb1d
feat: Enable CPU training for torchtune (#1140)
# What does this PR do?

You are now able to run a training cycle on CPU. This is useful for
debugging and testing purposes.

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan

On a Mac machine without CUDA devices:

```
17:00:24.417 [START] /v1/post-training/supervised-fine-tune
DEBUG 2025-02-18 12:00:24,419 torchtune.utils._logging:60: Setting manual seed to local seed 3268931494. Local seed is seed + rank = 3268931494 + 0
INFO 2025-02-18 12:00:24,463 torchtune.utils._logging:64: Identified model_type = Llama3_2. Ignoring output.weight in checkpoint in favor of the tok_embedding.weight tied weights.
INFO 2025-02-18 12:00:46,699 llama_stack.providers.inline.post_training.torchtune.recipes.lora_finetuning_single_device:182: Model is initialized with precision torch.bfloat16.
INFO 2025-02-18 12:00:46,784 llama_stack.providers.inline.post_training.torchtune.recipes.lora_finetuning_single_device:185: Tokenizer is initialized.
INFO 2025-02-18 12:00:46,786 llama_stack.providers.inline.post_training.torchtune.recipes.lora_finetuning_single_device:188: Optimizer is initialized.
INFO 2025-02-18 12:00:46,786 llama_stack.providers.inline.post_training.torchtune.recipes.lora_finetuning_single_device:192: Loss is initialized.
INFO 2025-02-18 12:00:48,997 llama_stack.providers.inline.post_training.torchtune.recipes.lora_finetuning_single_device:209: Dataset and Sampler are initialized.
INFO 2025-02-18 12:00:48,998 llama_stack.providers.inline.post_training.torchtune.recipes.lora_finetuning_single_device:227: Learning rate scheduler is initialized.
Writing logs to /Users/ihrachys/.llama/checkpoints/meta-llama/Llama-3.2-3B-Instruct-sft-0/log_1739898049.txt
1|1|Loss: 1.7414989471435547: 100% 1/1 [03:46<00:00, 226.21s/it]INFO 2025-02-18 12:04:35,227 llama_stack.providers.inline.post_training.torchtune.recipes.lora_finetuning_single_device:528: Starting checkpoint save...
INFO 2025-02-18 12:04:49,974 torchtune.utils._logging:121: Model checkpoint of size 6.43 GB saved to /Users/ihrachys/.llama/checkpoints/meta-llama/Llama-3.2-3B-Instruct-sft-0/consolidated.00.pth
INFO 2025-02-18 12:04:49,981 torchtune.utils._logging:132: Adapter checkpoint of size 0.00 GB saved to /Users/ihrachys/.llama/checkpoints/meta-llama/Llama-3.2-3B-Instruct-sft-0/adapter/adapter.pth
model_file_path /Users/ihrachys/.llama/checkpoints/meta-llama/Llama-3.2-3B-Instruct-sft-0
1|1|Loss: 1.7414989471435547: 100% 1/1 [04:01<00:00, 241.18s/it]
INFO:     ::1:64990 - "POST /v1/post-training/supervised-fine-tune HTTP/1.1" 200 OK
17:04:50.364 [END] /v1/post-training/supervised-fine-tune [StatusCode.OK] (265947.01ms)
 17:00:24.419 [DEBUG] Setting manual seed to local seed 3268931494. Local seed is seed + rank = 3268931494 + 0
 17:00:24.463 [INFO] Identified model_type = Llama3_2. Ignoring output.weight in checkpoint in favor of the tok_embedding.weight tied weights.
 17:00:46.700 [INFO] Model is initialized with precision torch.bfloat16.
 17:00:46.784 [INFO] Tokenizer is initialized.
 17:00:46.786 [INFO] Optimizer is initialized.
 17:00:46.786 [INFO] Loss is initialized.
 17:00:48.997 [INFO] Dataset and Sampler are initialized.
 17:00:48.998 [INFO] Learning rate scheduler is initialized.
 17:04:35.227 [INFO] Starting checkpoint save...
 17:04:49.974 [INFO] Model checkpoint of size 6.43 GB saved to /Users/ihrachys/.llama/checkpoints/meta-llama/Llama-3.2-3B-Instruct-sft-0/consolidated.00.pth
 17:04:49.981 [INFO] Adapter checkpoint of size 0.00 GB saved to /Users/ihrachys/.llama/checkpoints/meta-llama/Llama-3.2-3B-Instruct-sft-0/adapter/adapter.pth
```

[//]: # (## Documentation)

Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
2025-02-19 22:42:58 -08:00
Xi Yan
a324ceb9a9 precommit again 2025-02-19 22:40:45 -08:00
Sébastien Han
4694780d23
test: skip model registration for unsupported providers (#1030)
# What does this PR do?
- Updated `test_register_with_llama_model` to skip tests when using the
Ollama provider, as it does not support custom model names.
- Delete `test_initialize_model_during_registering` since there is no
  "load_model" semantic that is exposed publicly on a provider.

These changes ensure that tests do not fail for providers with
incompatible behaviors.

Signed-off-by: Sébastien Han <seb@redhat.com>

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan

Run Ollama:

```
 uv run pytest -v -s -k "ollama" llama_stack/providers/tests/inference/test_model_registration.py
/Users/leseb/Documents/AI/llama-stack/.venv/lib/python3.13/site-packages/pytest_asyncio/plugin.py:207: PytestDeprecationWarning: The configuration option "asyncio_default_fixture_loop_scope" is unset.
The event loop scope for asynchronous fixtures will default to the fixture caching scope. Future versions of pytest-asyncio will default the loop scope for asynchronous fixtures to function scope. Set the default fixture loop scope explicitly in order to avoid unexpected behavior in the future. Valid fixture loop scopes are: "function", "class", "module", "package", "session"

  warnings.warn(PytestDeprecationWarning(_DEFAULT_FIXTURE_LOOP_SCOPE_UNSET))
========================================== test session starts ==========================================
platform darwin -- Python 3.13.1, pytest-8.3.4, pluggy-1.5.0 -- /Users/leseb/Documents/AI/llama-stack/.venv/bin/python3
cachedir: .pytest_cache
metadata: {'Python': '3.13.1', 'Platform': 'macOS-15.3-arm64-arm-64bit-Mach-O', 'Packages': {'pytest': '8.3.4', 'pluggy': '1.5.0'}, 'Plugins': {'html': '4.1.1', 'metadata': '3.1.1', 'asyncio': '0.25.3', 'anyio': '4.8.0', 'nbval': '0.11.0'}}
rootdir: /Users/leseb/Documents/AI/llama-stack
configfile: pyproject.toml
plugins: html-4.1.1, metadata-3.1.1, asyncio-0.25.3, anyio-4.8.0, nbval-0.11.0
asyncio: mode=Mode.STRICT, asyncio_default_fixture_loop_scope=None
collected 65 items / 60 deselected / 5 selected                                                         

llama_stack/providers/tests/inference/test_model_registration.py::TestModelRegistration::test_register_unsupported_model[-ollama] PASSED
llama_stack/providers/tests/inference/test_model_registration.py::TestModelRegistration::test_register_nonexistent_model[-ollama] PASSED
llama_stack/providers/tests/inference/test_model_registration.py::TestModelRegistration::test_register_with_llama_model[-ollama] SKIPPED
llama_stack/providers/tests/inference/test_model_registration.py::TestModelRegistration::test_register_with_invalid_llama_model[-ollama] PASSED

======================== 3 passed, 1 skipped, 60 deselected, 2 warnings in 0.22s ========================
```


[//]: # (## Documentation)
[//]: # (- [ ] Added a Changelog entry if the change is significant)

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-02-19 22:39:13 -08:00
Sixian Yi
531940aea9
script for running client sdk tests (#895)
# What does this PR do?
Create a script for running all client-sdk tests on Async Library
client, with the option to generate report


## Test Plan

```
python llama_stack/scripts/run_client_sdk_tests.py --templates together fireworks --report
```



## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2025-02-19 22:38:06 -08:00
Xi Yan
a3d8c49459 precommit 2025-02-19 22:37:41 -08:00
Xi Yan
ce040ad111 precommit 2025-02-19 22:35:24 -08:00
Xi Yan
ca687d3e86 style: env var in build_venv 2025-02-19 22:32:59 -08:00
Shrinit Goyal
b74f25035c
Added support for mongoDB KV store (#543)
Added the support for mongoDB as KV store
validated in mongodb, it is able to store agent data, session data and
turn data
<img width="1332" alt="image"
src="https://github.com/user-attachments/assets/867700a4-b9ee-4a3c-8278-f39074d39d56">
this is how run.yaml would look:
```
    config:
      persistence_store:
        type: mongodb
        namespace: null
        host: localhost
        port: 27017
        db: llamastack
        user: ""
        password: ""
        collection_name: llamastack_kvstore
```

---------

Co-authored-by: shrinitgoyal <shrinit.goyal@engati.com>
2025-02-19 22:30:50 -08:00
Yuan Tang
5966079770
fix: More robust handling of the arguments in tool call response in remote::vllm (#1169)
# What does this PR do?

This fixes the following issue on the server side when the tool call
response contains empty args. This happens when running
`examples.agents.e2e_loop_with_client_tools` but `get_ticker_data`
returns `[]`:

```
Traceback (most recent call last):
  File "/home/yutang/repos/llama-stack/llama_stack/distribution/server/server.py", line 208, in sse_generator
    async for item in event_gen:
  File "/home/yutang/repos/llama-stack/llama_stack/providers/inline/agents/meta_reference/agents.py", line 169, in _create_agent_turn_streaming
    async for event in agent.create_and_execute_turn(request):
  File "/home/yutang/repos/llama-stack/llama_stack/providers/inline/agents/meta_reference/agent_instance.py", line 189, in create_and_execute_turn
    async for chunk in self.run(
  File "/home/yutang/repos/llama-stack/llama_stack/providers/inline/agents/meta_reference/agent_instance.py", line 258, in run
    async for res in self._run(
  File "/home/yutang/repos/llama-stack/llama_stack/providers/inline/agents/meta_reference/agent_instance.py", line 499, in _run
    async for chunk in await self.inference_api.chat_completion(
  File "/home/yutang/repos/llama-stack/llama_stack/distribution/routers/routers.py", line 182, in <genexpr>
    return (chunk async for chunk in await provider.chat_completion(**params))
  File "/home/yutang/repos/llama-stack/llama_stack/providers/remote/inference/vllm/vllm.py", line 296, in _stream_chat_completion
    async for chunk in res:
  File "/home/yutang/repos/llama-stack/llama_stack/providers/remote/inference/vllm/vllm.py", line 162, in _process_vllm_chat_completion_stream_response
    arguments=json.loads(tool_call_buf.arguments),
  File "/home/yutang/.conda/envs/distribution-myenv/lib/python3.10/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/home/yutang/.conda/envs/distribution-myenv/lib/python3.10/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/home/yutang/.conda/envs/distribution-myenv/lib/python3.10/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
```
## Test Plan

All existing tests in
`tests/client-sdk/inference/test_text_inference.py` passed.

[//]: # (## Documentation)

---------

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-02-19 22:27:02 -08:00
Sébastien Han
69eebaf5bf
build: add missing dev dependencies for unit tests (#1004)
# What does this PR do?
Added necessary dependencies to ensure successful execution of unit
tests. Without these, the following command would fail due to missing
imports:

```
uv run pytest -v -k "ollama" \
     --inference-model=llama3.2:3b-instruct-fp16
     llama_stack/providers/tests/inference/test_model_registration.py
```

Signed-off-by: Sébastien Han <seb@redhat.com>

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
Run:

```
ollama run llama3.2:3b-instruct-fp16 --keepalive 2m &
uv run pytest -v -k "ollama" --inference-model=llama3.2:3b-instruct-fp16 llama_stack/providers/tests/inference/test_model_registration.py

```

You can observe that some tests pass while others fail, but the test
runs successfully.

[//]: # (## Documentation)
[//]: # (- [ ] Added a Changelog entry if the change is significant)

Signed-off-by: Sébastien Han <seb@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-02-19 22:26:11 -08:00
Xi Yan
61f43b8677
fix: llama stack build use UV_SYSTEM_PYTHON to install dependencies to system environment (#1163)
# What does this PR do?
- resolves issue: #1159 
- Root cause: https://github.com/meta-llama/llama-stack/pull/980 forces
`build_venv.sh` to install in a venv environment, which do not work on
Colab notebook environment

<img width="1004" alt="image"
src="https://github.com/user-attachments/assets/1f9be409-5313-4926-b078-74e141cf29eb"
/>

## This PR
Use `UV_SYSTEM_PYTHON` to make sure dependencies are installed in
current system environment. Which will be used in the Colab environment.
```
UV_SYSTEM_PYTHON=1 llama stack build --template together --image-type venv
```

## Test Plan
- Works in Colab environment
<img width="621" alt="image"
src="https://github.com/user-attachments/assets/ae93bc3d-e05a-44b9-bb21-fb88f29969b8"
/>
2025-02-19 22:21:16 -08:00
Francisco Arceo
2b752df79a
fix: Fixing some small issues with the build scripts (#1132)
# What does this PR do?
I was encountering build issues when building my `ollama` environment
using `llama stack build`

```bash
llama stack build --template ollama --image-type venv
Traceback (most recent call last):
  File "/Users/farceo/dev/llama-stack/.venv/bin/llama", line 10, in <module>
    sys.exit(main())
             ^^^^^^
  File "/Users/farceo/dev/llama-stack/llama_stack/cli/llama.py", line 46, in main
    parser.run(args)
  File "/Users/farceo/dev/llama-stack/llama_stack/cli/llama.py", line 40, in run
    args.func(args)
  File "/Users/farceo/dev/llama-stack/llama_stack/cli/stack/build.py", line 77, in _run_stack_build_command
    return run_stack_build_command(args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/farceo/dev/llama-stack/llama_stack/cli/stack/_build.py", line 180, in run_stack_build_command
    _run_stack_build_command_from_build_config(
  File "/Users/farceo/dev/llama-stack/llama_stack/cli/stack/_build.py", line 272, in _run_stack_build_command_from_build_config
    return_code = build_image(
                  ^^^^^^^^^^^^
  File "/Users/farceo/dev/llama-stack/llama_stack/distribution/build.py", line 137, in build_image
    return_code = run_with_pty(args)
                  ^^^^^^^^^^^^^^^^^^
  File "/Users/farceo/dev/llama-stack/llama_stack/distribution/utils/exec.py", line 22, in run_with_pty
    return _run_with_pty_unix(command)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/farceo/dev/llama-stack/llama_stack/distribution/utils/exec.py", line 53, in _run_with_pty_unix
    process = subprocess.Popen(
              ^^^^^^^^^^^^^^^^^
  File "/Users/farceo/.local/share/uv/python/cpython-3.11.6-macos-aarch64-none/lib/python3.11/subprocess.py", line 1026, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/Users/farceo/.local/share/uv/python/cpython-3.11.6-macos-aarch64-none/lib/python3.11/subprocess.py", line 1950, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: '/Users/farceo/dev/llama-stack/llama_stack/distribution/build_venv.sh'
make: *** [build-ollama] Error 1
```

I also had to adjust the script when testing the `common.sh` file
because it returned:

```bash
> source llama_stack/distribution/common.sh
llama_stack/distribution/common.sh:6: command not found: ^M
llama_stack/distribution/common.sh:50: parse error near `\n'
```
On my branch, I ran:
```bash
sed -i '' 's/\r$//' llama_stack/distribution/common.sh
```
And then I was able to successfully build the environment.

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
N/A

[//]: # (## Documentation)
N/A

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-02-19 22:20:49 -08:00
Reid
af377e844d
feat: add a option to list the downloaded models (#1127)
# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]

```
$ llama model list --help
usage: llama model list [-h] [--show-all] [--downloaded]

Show available llama models

options:
  -h, --help            show this help message and exit
  --show-all            Show all models (not just defaults)
  --downloaded          List the downloaded models

$ llama model list --downloaded
+-------------+----------+---------------------+
| Model       | Size     | Modified Time       |
+-------------+----------+---------------------+
| Llama3.2-1B | 2.31 GB  | 2025-02-16 13:38:04 |
+-------------+----------+---------------------+
| Llama3.1-8B | 14.97 GB | 2025-02-16 10:36:37 |
+-------------+----------+---------------------+
```

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)

---------

Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
2025-02-19 22:17:39 -08:00
Sébastien Han
7504cb16c6
docs: improve API contribution guidelines (#1137)
# What does this PR do?

Clarify when to update documentation, explain `uv sync --extra dev` and
OpenAPI generation, and specify where generated docs are stored.

Signed-off-by: Sébastien Han <seb@redhat.com>

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-02-19 22:14:04 -08:00
Yuan Tang
25cdab5b28
docs: Remove unused python-openapi and json-strong-typing in openapi_generator (#1167)
This is no longer required to generated API reference after
5e7904ef6c
2025-02-19 22:06:29 -08:00
Botao Chen
2b995c22eb
feat: inference passthrough provider (#1166)
##  What does this PR do?
In this PR, we implement a passthrough inference provider that works for
any endpoints that respect llama stack inference API definition.

## Test Plan
config some endpoint that respect llama stack inference API definition
and got the inference results successfully

<img width="1268" alt="Screenshot 2025-02-19 at 8 52 51 PM"
src="https://github.com/user-attachments/assets/447816e4-ea7a-4365-b90c-386dc7dcf4a1"
/>
2025-02-19 21:47:00 -08:00
Ashwin Bharambe
d39f8de619 Pin sphinx 2025-02-19 20:20:46 -08:00
Ashwin Bharambe
89fdb2c9e9 Try a different css file API for sphinx 2025-02-19 20:14:40 -08:00
Botao Chen
b751f7003d
feat: add aggregation_functions to llm_as_judge_405b_simpleqa (#1164)
as title, to let scoring function llm_as_judge_405b_simpleqa output
aggregated_results.

We can leverage categorical_count to calculate the % of correctness as
eval benchmark metrics
2025-02-19 19:42:04 -08:00
Ihar Hrachyshka
c1f7d7f005
fix: miscellaneous job management improvements in torchtune (#1136)
- **refactor: simplify job status extraction a bit**
- **torchtune: save job status on schedule**
- **refactor: get rid of job_list in torchtune job management code**

# What does this PR do?

A failed job is now registered in API, and one can consult its status.

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan

```
$ llama-stack-client post_training status --job-uuid test-jobe244b5b0-5053-4892-a4d9-d8fc8b116e73                                                      
JobStatusResponse(checkpoints=[], job_uuid='test-jobe244b5b0-5053-4892-a4d9-d8fc8b116e73', status='failed', completed_at=None, resources_allocated=None, scheduled_at=datetime.datetime(2025, 2, 18, 9, 4, 34, 3252), started_at=datetime.datetime(2025, 2, 18, 9, 4, 34, 10688))
```

[//]: # (## Documentation)

---------

Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
2025-02-19 19:09:37 -08:00
Francisco Arceo
7972daa72e
feat: Chunk sqlite-vec writes (#1094)
# What does this PR do?
1. This PR adds batch inserts into sqlite-vec as requested in
https://github.com/meta-llama/llama-stack/pull/1040
- Note: the inserts uses a uuid generated from the hash of the document
id and chunk content.
2. This PR also adds unit tests for sqlite-vec. In a follow up PR, I can
add similar tests to Faiss.

## Test Plan
1. Integration tests:
```python
INFERENCE_MODEL=llama3.2:3b-instruct-fp16 LLAMA_STACK_CONFIG=ollama pytest -s -v tests/client-sdk/vector_io/test_vector_io.py
...
PASSED
tests/client-sdk/vector_io/test_vector_io.py::test_vector_db_retrieve[all-MiniLM-L6-v2-sqlite_vec] PASSED
tests/client-sdk/vector_io/test_vector_io.py::test_vector_db_list PASSED
tests/client-sdk/vector_io/test_vector_io.py::test_vector_db_register[all-MiniLM-L6-v2-faiss] PASSED
tests/client-sdk/vector_io/test_vector_io.py::test_vector_db_register[all-MiniLM-L6-v2-sqlite_vec] PASSED
tests/client-sdk/vector_io/test_vector_io.py::test_vector_db_unregister[faiss] PASSED
tests/client-sdk/vector_io/test_vector_io.py::test_vector_db_unregister[sqlite_vec] PASSED
```
3. Unit tests:
```python
pytest llama_stack/providers/tests/vector_io/test_sqlite_vec.py  -v -s --tb=short --disable-warnings --asyncio-mode=auto
...
llama_stack/providers/tests/vector_io/test_sqlite_vec.py::test_add_chunks PASSED
llama_stack/providers/tests/vector_io/test_sqlite_vec.py::test_query_chunks PASSED
llama_stack/providers/tests/vector_io/test_sqlite_vec.py::test_chunk_id_conflict PASSED
llama_stack/providers/tests/vector_io/test_sqlite_vec.py::test_register_vector_db PASSED
llama_stack/providers/tests/vector_io/test_sqlite_vec.py::test_unregister_vector_db PASSED
llama_stack/providers/tests/vector_io/test_sqlite_vec.py::test_generate_chunk_id PASSED
```
I also tested using the same example RAG script in
https://github.com/meta-llama/llama-stack/pull/1040 and received the
output.

---------

Signed-off-by: Francisco Javier Arceo <farceo@redhat.com>
2025-02-19 19:07:46 -08:00
Sébastien Han
26503ca1a4
docs: fix Python llama_stack_client SDK links (#1150)
# What does this PR do?
It seems that the llama_stack_client repo and the main repo were
originally the same, causing links to point to local references. We’ve
now updated them to use the correct llama_stack_client repo links.

Signed-off-by: Sébastien Han <seb@redhat.com>

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-02-19 19:05:14 -08:00
Ashwin Bharambe
cdcbeb005b
chore: remove llama_models.llama3.api imports from providers (#1107)
There should be a choke-point for llama3.api imports -- this is the
prompt adapter. Creating a ChatFormat() object on demand is inexpensive.
The underlying Tokenizer is a singleton anyway.
2025-02-19 19:01:29 -08:00
Ben Browning
e9b8259cf9
fix: Get distro_codegen.py working with default deps and enabled in pre-commit hooks (#1123)
# What does this PR do?

Before this change, `distro_codegen.py` would only work if the user
manually installed multiple provider-specific dependencies (see #1122).
Now, users can run `distro_codegen.py` without any provider-specific
dependencies because we avoid importing the entire provider
implementations just to get the config needed to build the provider
template.

Concretely, this mostly means moving the
MODEL_ALIASES (and related variants) definitions to a new models.py
class within the provider implementation for those providers that
require additional dependencies. It also meant moving a couple of
imports from top-level imports to inside `get_adapter_impl` for some
providers, which follows the pattern used by multiple existing
providers.

To ensure we don't regress and accidentally add new imports that cause
distro_codegen.py to fail, the stubbed-in pre-commit hook for
distro_codegen.py was uncommented and slightly tweaked to run via `uv
run python ...` to ensure it runs with only the project's default
dependencies and to run automatically instead of manually.

Lastly, this updates distro_codegen.py itself to keep track of paths it
might have changed and to only `git diff` those specific paths when
checking for changed files instead of doing a diff on the entire working
tree. The latter was overly broad and would require a user have no other
unstaged changes in their working tree, even if those unstaged changes
were unrelated to generated code. Now it only flags uncommitted changes
for paths distro_codegen.py actually writes to.

Our generated code was also out-of-date, presumably because of these
issues, so this commit also has some updates to the generated code
purely because it was out of sync, and the pre-commit hook now enforces
things to be updated.

(Closes #1122)

## Test Plan

I manually tested distro_codegen.py and the pre-commit hook to verify
those work as expected, flagging any uncommited changes and catching any
imports that attempt to pull in provider-specific dependencies.

However, I do not have valid api keys to the impacted provider
implementations, and am unable to easily run the inference tests against
each changed provider. There are no functional changes to the provider
implementations here, but I'd appreciate a second set of eyes on the
changed import statements and moving of MODEL_ALIASES type code to a
separate models.py to ensure I didn't make any obvious errors.

---------

Signed-off-by: Ben Browning <bbrownin@redhat.com>
Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-02-19 18:39:20 -08:00
Alessandro Sangiorgi
9e03df983e
fix(rag-example): add provider_id to avoid llama_stack_client 400 error (#1114)
# What does this PR do?
Add provider_id to avoid errors using the rag example with
llama_stack_client

`llama_stack_client.BadRequestError: Error code: 400 - {'detail':
'Invalid value: No provider specified and multiple providers available.
Please specify a provider_id.'}`

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)

---------

Co-authored-by: Xi Yan <yanxi970830@gmail.com>
2025-02-19 15:37:25 -08:00
Ashwin Bharambe
034ece0011 Ensure that deprecations for fields follow through to OpenAPI 2025-02-19 13:54:04 -08:00
Ashwin Bharambe
31a5ba5268 Add title to the json schemas 2025-02-19 13:26:39 -08:00
Ashwin Bharambe
5e7904ef6c Kill the older strong_typing code 2025-02-19 12:24:21 -08:00
Yuan Tang
a66b4c4c81
test: Enable test_text_chat_completion_with_tool_choice_required for remote::vllm (#1148) 2025-02-18 23:52:15 -05:00
ehhuang
8de7cf103b
feat: support tool_choice = {required, none, <function>} (#1059)
Summary:

titled


Test Plan:

added tests and

LLAMA_STACK_CONFIG=fireworks pytest -s -v tests/client-sdk/
--safety-shield meta-llama/Llama-Guard-3-8B
2025-02-18 23:25:15 -05:00
Xi Yan
37cf60b732
style: remove prints in codebase (#1146)
# What does this PR do?
- replace prints in codebase with logger
- update print_table to use rich Table

## Test Plan
- library client script in
https://github.com/meta-llama/llama-stack/pull/1145

```
llama stack list-providers
```
<img width="1407" alt="image"
src="https://github.com/user-attachments/assets/906b4f54-9e42-4e55-8968-7e3aa45525b2"
/>


[//]: # (## Documentation)
2025-02-18 19:41:37 -08:00
Xi Yan
e8cb9e0adb
fix: direct client pydantic type casting (#1145)
# What does this PR do?
- Closes #1142 
- Root cause is due to having `Union[str, AgenToolGroupWithArgs]`

## Test Plan
- Test with script described in issue. 

- Print out final converted pydantic object
<img width="1470" alt="image"
src="https://github.com/user-attachments/assets/15dc9cd0-f37a-4b91-905f-3fe4f59a08c6"
/>


[//]: # (## Documentation)
2025-02-18 16:07:54 -08:00
Xi Yan
8585b95a28 rename 2025-02-18 16:02:44 -08:00
Reid
4e76d312fa
fix: modify the model id title for model list (#1095)
# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]

Re-check and based on the doc, the download model id, actually is model
descriptor(also without `meta-llama/`).


https://llama-stack.readthedocs.io/en/latest/references/llama_cli_reference/index.html
```
$ llama download --source huggingface --model-id  Llama-Guard-3-1B:int4 --hf-token xxx  # model descriptor
Fetching 8 files:   0%|                                                                                                                   | 0/8 [00:00<?, ?it/s]
LICENSE.txt: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████| 7.71k/7.71k [00:00<00:00, 10.5MB/s]

$ llama download --source huggingface --model-id  Llama-Guard-3-1B-INT4 --hf-token xxxx  # hugging face repo without meta-llama/
usage: llama download [-h] [--source {meta,huggingface}] [--model-id MODEL_ID] [--hf-token HF_TOKEN] [--meta-url META_URL] [--max-parallel MAX_PARALLEL]
                      [--ignore-patterns IGNORE_PATTERNS] [--manifest-file MANIFEST_FILE]
llama download: error: Model Llama-Guard-3-1B-INT4 not found <<<<---


$ llama download --source meta --model-id Llama-3.2-3B-Instruct-SpinQuant_INT4_EO8
usage: llama download [-h] [--source {meta,huggingface}] [--model-id MODEL_ID] [--hf-token HF_TOKEN] [--meta-url META_URL] [--max-parallel MAX_PARALLEL]
                      [--ignore-patterns IGNORE_PATTERNS] [--manifest-file MANIFEST_FILE]
llama download: error: Model Llama-3.2-3B-Instruct-SpinQuant_INT4_EO8 not found

$ llama download --source meta --model-id Llama3.2-3B-Instruct:int4-spinquant-eo8
Please provide the signed URL for model Llama3.2-3B-Instruct:int4-spinquant-eo8 you received via email after visiting https://www.llama.com/llama-downloads/ (e.g., https://llama3-1.llamameta.net/*?Policy...): ^CTraceback (most recent call last):

$ llama download --source meta --model-id meta-llama/Llama3.2-3B-Instruct:int4-spinquant-eo8
usage: llama download [-h] [--source {meta,huggingface}] [--model-id MODEL_ID] [--hf-token HF_TOKEN] [--meta-url META_URL]
                      [--max-parallel MAX_PARALLEL] [--ignore-patterns IGNORE_PATTERNS] [--manifest-file MANIFEST_FILE]
llama download: error: Model meta-llama/Llama3.2-3B-Instruct:int4-spinquant-eo8 not found
```

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)

Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
2025-02-18 10:26:41 -08:00
Reid
d9f5beb15a
style: update download help text (#1135)
# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]

Based on the cade:

6b1773d530/llama_stack/cli/download.py (L454)
and the test, it can use comma to specify multiple model ids. So update
the usage.
```
$ llama model download --source meta --model-id Llama3.2-1B,Llama3.2-3B
Please provide the signed URL for model Llama3.2-1B you received via email after visiting https://www.llama.com/llama-downloads/ (e.g., https://llama3-1.llamameta.net/*?Policy...):

Downloading checklist.chk            ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━        100.0%  156/156 bytes   -  0:00:00
Downloading tokenizer.model          ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━        100.0%  2.2/2.2 MB      -  0:00:00
Downloading params.json              ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━        100.0%  220/220 bytes   -  0:00:00
Downloading consolidated.00.pth      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━        100.0%  2.5/2.5 GB      -  0:00:00

Successfully downloaded model to /Users/xx/.llama/checkpoints/Llama3.2-1B

[Optionally] To run MD5 checksums, use the following command: llama model verify-download --model-id Llama3.2-1B
Please provide the signed URL for model Llama3.2-3B you received via email after visiting https://www.llama.com/llama-downloads/ (e.g., https://llama3-1.llamameta.net/*?Policy...):
Downloading checklist.chk            ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━        100.0%  156/156 bytes   -  0:00:00
Downloading tokenizer.model          ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━        100.0%  2.2/2.2 MB      -  0:00:00
Downloading params.json              ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━        100.0%  220/220 bytes   -  0:00:00
Downloading consolidated.00.pth      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━        100.0%  6.4/6.4 GB      -  0:00:00

Successfully downloaded model to /Users/xx/.llama/checkpoints/Llama3.2-3B


$ llama model download --source huggingface --model-id Llama3.2-1B,Llama3.2-3B
original%2Fparams.json: 100%|██████████████████████████████████████████████████████████| 220/220 [00:00<00:00, 564kB/
Successfully downloaded model to /Users/xx/.llama/checkpoints/Llama3.2-1B
...
tokenizer.json: 100%|█████████████████████████████████████████████████████████████| 9.09M/9.09M [00:00<00:00, 9.18MB/s]
Successfully downloaded model to /Users/xxx/.llama/checkpoints/Llama3.2-3B


before:
$ llama model download --help
 --model-id MODEL_ID   See `llama model list` or `llama model list --show-all` for the list of available models

after:
$ llama model download --help
  --model-id MODEL_ID   See `llama model list` or `llama model list --show-all` for the list of available models. Specify multiple model IDs with commas, e.g. --model-id Llama3.2-1B,Llama3.2-3B
```

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)

Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
2025-02-18 10:24:31 -08:00
Reid
92aefec191
style: update verify-download help text (#1134)
# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]

Based on the code
6b1773d530/llama_stack/cli/download.py (L379)
and test, `verify-download` should only use in `downloaded from Meta`.

```
test: no checklist.chk  file for hf download
$ llama model download --source meta --model-id Llama3.2-1B
Downloading checklist.chk            ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━        100.0%  156/156 bytes   -  0:00:00
Downloading tokenizer.model          ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━        100.0%  2.2/2.2 MB      -  0:00:00
Downloading params.json              ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━        100.0%  220/220 bytes   -  0:00:00
Downloading consolidated.00.pth      ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━        100.0%  2.5/2.5 GB      -  0:00:00



before:
$ llama model verify-download --help
usage: llama model verify-download [-h] --model-id MODEL_ID

Verify the downloaded checkpoints' checksums

options:
  -h, --help           show this help message and exit
  --model-id MODEL_ID  Model ID to verify


after:
$ llama model verify-download --help
usage: llama model verify-download [-h] --model-id MODEL_ID

Verify the downloaded checkpoints' checksums for models downloaded from Meta

options:
  -h, --help           show this help message and exit
  --model-id MODEL_ID  Model ID to verify (only for models downloaded from Meta)
```

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)

Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
2025-02-18 10:15:26 -08:00