Commit graph

1235 commits

Author SHA1 Message Date
Xi Yan
22355e3b1f add back 2/n 2025-02-20 17:53:29 -08:00
Xi Yan
157cf320d9 add back 2/n 2025-02-20 17:52:01 -08:00
Xi Yan
ee3c174bb3 add back 2/n 2025-02-20 17:40:39 -08:00
Xi Yan
cd36a77e20 3/n 2025-02-20 17:38:21 -08:00
Xi Yan
01f90dfe0c Merge branch 'agents-unify-tools-2' into agents-unify-tools-3 2025-02-20 17:27:27 -08:00
Xi Yan
7677f01beb Merge branch 'agents-unify-tools' into agents-unify-tools-2 2025-02-20 17:27:11 -08:00
Xi Yan
c7e84253e7 Merge branch 'agents-unify-tools' into agents-unify-tools-3 2025-02-20 17:26:58 -08:00
Xi Yan
8fe38d128d streaming flag 2025-02-20 16:58:45 -08:00
Xi Yan
5fbb159cf6 fix test 2025-02-20 16:48:17 -08:00
Xi Yan
96c521ada6 temp debug 2025-02-20 16:30:27 -08:00
Xi Yan
fb0d992f99 temp debug 2025-02-20 15:48:55 -08:00
Xi Yan
4dbe3fd9e6 Merge branch 'agents-unify-tools' into agents-unify-tools-2 2025-02-20 15:29:11 -08:00
Xi Yan
5644d10c82 remove usermessages 2025-02-20 15:26:37 -08:00
Xi Yan
beea9ac133 Merge branch 'agents-unify-tools' into agents-unify-tools-2 2025-02-20 15:07:50 -08:00
Xi Yan
afee71604f api 2025-02-20 15:07:18 -08:00
Xi Yan
07c9222b6f debug 2025-02-20 14:54:45 -08:00
Xi Yan
5eea2bc44d Merge branch 'agents-unify-tools' into agents-unify-tools-2 2025-02-20 14:41:47 -08:00
Xi Yan
57ca2c6365 Merge branch 'main' into agents-unify-tools 2025-02-20 14:41:29 -08:00
Ashwin Bharambe
736560ceba Remove os.getenv() from ollama config 2025-02-20 14:30:32 -08:00
Xi Yan
7b0ff5718e Merge branch 'agents-unify-tools' into agents-unify-tools-2 2025-02-20 14:19:52 -08:00
Xi Yan
9c9a607b41 merge 2025-02-20 14:17:31 -08:00
LESSuseLESS
2cbe9395b0
feat: D69478008 [llama-stack] turning tests into data-driven (#1180)
# What does this PR do?

We have several places running tests for different purposes.
- oss llama stack
  - provider tests
  - e2e tests
- provider llama stack
  - unit tests
  - e2e tests

It would be nice if they could *share the same set of test data*, so that
we maintain consistency between spec and implementation. That is what this
diff is about: isolating test data from test code, so the same data can be
reused in different places by writing different test code.
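
To make the idea concrete, here is a rough sketch of the pattern (the file
layout, field names, and client fixture are illustrative, not the actual
structure introduced in this diff): shared test cases live in a JSON file and
each suite parametrizes over them with its own assertions.

```python
# Hypothetical sketch of data-driven tests; paths and fields are illustrative.
import json
from pathlib import Path

import pytest

# Shared test data, reusable by provider tests and e2e tests alike.
CASES = json.loads(Path("test_cases/inference/completion.json").read_text())


@pytest.mark.parametrize("case", CASES, ids=[c["id"] for c in CASES])
def test_completion_structured_output(case, llama_stack_client):
    # Only the test *code* differs between suites; the data is shared.
    response = llama_stack_client.inference.completion(
        model_id=case["model"],
        content=case["prompt"],
        response_format=case["response_format"],
    )
    assert response.content
```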

## Test Plan

== Set up Ollama local server

== Run a provider test
```
conda activate stack

OLLAMA_URL="http://localhost:8321" \
pytest -v -s -k "ollama" --inference-model="llama3.2:3b-instruct-fp16" \
  llama_stack/providers/tests/inference/test_text_inference.py::TestInference::test_completion_structured_output
# test_structured_output should also work
```

== Run an e2e test
```
conda activate sherpa
with-proxy pip install llama-stack
export INFERENCE_MODEL=llama3.2:3b-instruct-fp16
export LLAMA_STACK_PORT=8322
with-proxy llama stack build --template ollama
with-proxy llama stack run --env OLLAMA_URL=http://localhost:8321 ollama
```
Run the test client:
```
LLAMA_STACK_PORT=8322 LLAMA_STACK_BASE_URL="http://localhost:8322" \
pytest -v -s --inference-model="llama3.2:3b-instruct-fp16" \
  tests/client-sdk/inference/test_text_inference.py::test_text_completion_structured_output
# test_text_chat_completion_structured_output should also work
```

## Notes

- This PR was automatically generated by oss_sync
- Please refer to D69478008 for more details.
2025-02-20 14:13:06 -08:00
ehhuang
1166afdf76
fix: some telemetry APIs don't currently work (#1188)
Summary:

This bug is surfaced by using the HTTP LS client. The issue is that
non-scalar parameters on `GET` methods become `body` params in FastAPI,
but our spec generation script doesn't respect that. We fix this by
making those endpoints `POST` methods instead.
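
A minimal sketch of the mismatch (hypothetical endpoint, not the actual
telemetry routes): in FastAPI, a Pydantic-model parameter is read from the
request body, which a GET route can't reliably carry, so the route is
declared as POST instead.

```python
# Hypothetical sketch of the GET-vs-body mismatch described above.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


class TraceQuery(BaseModel):  # a non-scalar (structured) parameter
    attribute_filters: list[str]
    limit: int = 100


# With GET, FastAPI would treat the Pydantic model as a request *body*,
# which many clients and spec generators won't send on GET. Declaring
# the route as POST makes the body explicit and spec-friendly.
@app.post("/v1/telemetry/query-traces")
def query_traces(query: TraceQuery) -> dict:
    return {"filters": query.attribute_filters, "limit": query.limit}
```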

Test Plan:
Test API call with newly sync'd client
(https://github.com/meta-llama/llama-stack-client-python/pull/149)

<img width="1114" alt="image"
src="https://github.com/user-attachments/assets/7710aca5-d163-4e00-a465-14e6fcaac2b2"
/>
2025-02-20 14:09:25 -08:00
Xi Yan
ea1faae50e
chore!: deprecate eval/tasks (#1186)
# What does this PR do?
- Fully deprecate eval/tasks

Closes #1088 

NOTE: this will be a breaking change. We have introduced the new API in
0.1.3.

Notebook has been updated to use the new endpoints.

## Test Plan
```
pytest -v -s --nbval-lax ./docs/notebooks/Llama_Stack_Benchmark_Evals.ipynb 
```
<img width="611" alt="image"
src="https://github.com/user-attachments/assets/79f6efe1-81ba-494e-bf36-1fc0c2b9bc6f"
/>



cc @SLR722  for awareness

2025-02-20 14:06:21 -08:00
Xi Yan
7676756778 merge 2025-02-20 14:04:25 -08:00
Ashwin Bharambe
07ccf908f7 ModelAlias -> ProviderModelEntry 2025-02-20 14:02:36 -08:00
Xi Yan
a44d230676 rename 2025-02-20 14:02:17 -08:00
Xi Yan
ff87677102 rename 2025-02-20 14:00:08 -08:00
Xi Yan
82109749ea rename 2025-02-20 13:56:47 -08:00
Kevin Cogan
561295af76
docs: Fix Links, Add Podman Instructions, Vector DB Unregister, and Example Script (#1129)
# What does this PR do?
This PR improves the documentation in several ways:

- **Fixed incorrect link in `tools.md`** to ensure all references point
to the correct resources.
- **Added instructions for running the `code-interpreter` agent in a
Podman container**, helping users configure and execute the tool in
containerized environments.
- **Introduced an unregister command for single and multiple vector
databases**, making it easier to manage vector DBs.
- **Provided a simple example script for using the `code-interpreter`
agent**, giving users a practical reference for implementation.

These updates enhance the clarity, usability, and completeness of the
documentation.
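
For reference, unregistering vector DBs from the Python client looks roughly
like the following (a sketch: the base URL, DB ID, and attribute names are
assumed for illustration, not taken from the new docs verbatim).

```python
# Sketch, assuming a running stack and the llama-stack-client package.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# Unregister a single vector DB (ID assumed for illustration).
client.vector_dbs.unregister(vector_db_id="my_vector_db")

# Unregister every registered vector DB.
for db in client.vector_dbs.list():
    client.vector_dbs.unregister(vector_db_id=db.identifier)
```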


## Test Plan
The following steps were performed to verify the accuracy of the
changes:

1. **Validated all fixed links** by checking their destinations to ensure
correctness.
2. **Ran the `code-interpreter` agent in a Podman container** following
the new instructions to confirm functionality.
3. **Executed the vector database unregister commands** and verified
that both single and multiple databases were correctly removed.
4. **Tested the new example script for `code-interpreter`**, ensuring it
runs without errors.

All changes were reviewed and tested successfully, improving the
documentation's accuracy and ease of use.

2025-02-20 13:52:14 -08:00
Xi Yan
7a111e39f6 Merge branch 'main' into agents-unify-tools 2025-02-20 13:51:32 -08:00
Vladimir Ivić
f7161611c6
feat: adding endpoints for files and uploads (#1070)
Summary:
Adds spec definitions for file upload operations.

This API centers on two high-level operations:
* Initiating and managing an upload session
* Accessing uploaded file information

Usage examples:

To start a file upload session:
```
curl -X POST https://localhost:8321/v1/files \
-d '{
   "key": "image123.jpg',
   "bucket": "images",
   "mime_type": "image/jpg",
   "size": 12345
}'

# Returns
{
  "id": "<session_id>",
  "url": "https://localhost:8321/v1/files/session:<session_id>",
  "offset": 0,
  "size": 12345
}

```

To upload file content to an existing session
```
curl -i -X POST "https://localhost:8321/v1/files/session:<session_id>" \
  --data-binary @<path_to_local_file>

# Returns
{
  "key": "image123.jpg",
  "bucket": "images",
  "mime_type": "image/jpg",
  "bytes": 12345,
  "created_at": 1737492240
}

# Implementing on the server side (Flask example for simplicity;
# assumes `from flask import request` and a Flask `app` instance):
@app.route('/uploads/<upload_id>', methods=['POST'])
def upload_content_to_session(upload_id):
    try:
        # Get the binary file data from the request body
        file_data = request.data

        # Save the file to disk
        save_path = f"./uploads/{upload_id}"
        with open(save_path, 'wb') as f:
            f.write(file_data)
        return {__uploaded_file_json__}, 200
    except Exception as e:
        return {"error": str(e)}, 500

```

To read information about an existing upload session
```
curl -i -X GET "https://localhost:8321/v1/files/session:<session_id>"

# Returns
{
  "id": "<session_id>",
  "url": "https://localhost:8321/v1/files/session:<session_id>",
  "offset": 1024,
  "size": 12345
}
```

To list buckets
```
GET /files

# Returns
{
  "data": [
     {"name": "bucket1"},
     {"name": "bucket2"},
   ]
}
```

To list all files in a bucket
```
GET /files/{bucket}

# Returns
{
  "data": [
    {
      "key": "shiba.jpg",
      "bucket": "dogs",
      "mime_type": "image/jpg",
      "bytes": 82334,
      "created_at": 1737492240,
    },
    {
      "key": "persian_cat.jpg",
      "mime_type": "image/jpg",
      "bucket": "cats",
      "bytes": 39924,
      "created_at": 1727493440,
    },
  ]
}
```

To get specific file info
```
GET /files/{bucket}/{key}

{
  "key": "shiba.jpg",
  "bucket": "dogs",
  "mime_type": "image/jpg",
  "bytes": 82334,
  "created_at": 1737492240,
}

```

To delete a specific file
```
DELETE /files/{bucket}/{key}

{
  "key": "shiba.jpg",
  "bucket": "dogs",
  "mime_type": "image/jpg",
  "bytes": 82334,
  "created_at": 1737492240,
}

```
2025-02-20 13:09:00 -08:00
Xi Yan
7dae81cb68 tmp 2025-02-20 12:57:18 -08:00
Ashwin Bharambe
eddef0b2ae
chore: slight renaming of model alias stuff (#1181)
Quick test by running:
```
LLAMA_STACK_CONFIG=fireworks pytest -s -v tests/client-sdk
```
2025-02-20 11:48:46 -08:00
Ashwin Bharambe
2eda050aef Fix ollama fixture 2025-02-20 11:46:02 -08:00
Ashwin Bharambe
3d891fc9ba ModelAlias cleanup 2025-02-20 11:44:39 -08:00
Xi Yan
6b6feebc72 update types 2025-02-20 10:53:36 -08:00
Xi Yan
dc1406c25a dummy impl 2025-02-20 10:51:08 -08:00
Xi Yan
e0fd19531b openapi gen 2025-02-20 10:41:08 -08:00
Xi Yan
7689ff2b54 naming submit_tool_response_messages 2025-02-20 10:36:14 -08:00
Xi Yan
5d97ee645c api update 2025-02-20 09:52:37 -08:00
Ben Browning
fbec826883
docs: Add note about distro_codegen.py and provider dependencies (#1175)
# What does this PR do?

This expands upon the existing distro_codegen.py text in the new API
provider documentation to include a note about not including
provider-specific dependencies in the code path that builds the
distribution's template.

Our distro_codegen pre-commit hook will catch this case anyway, but this
note informs provider authors about it ahead of time.
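
The pitfall, roughly (module and package names here are hypothetical): a
top-level provider-specific import makes distro_codegen fail for anyone
without that dependency installed, so the import should live inside the
function that actually needs it.

```python
# Hypothetical sketch of the pitfall the note warns about.

# Bad: distro_codegen imports this module to build the template, so everyone
# running codegen would now need `some_provider_sdk` installed:
# import some_provider_sdk


def get_distribution_template() -> dict:
    # Template construction must not require provider-specific packages.
    return {"providers": {"inference": ["remote::some-provider"]}}


def create_client(api_key: str):
    # Good: defer the provider-specific import to runtime.
    import some_provider_sdk

    return some_provider_sdk.Client(api_key)
```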

## Test Plan

I built the docs website locally via the following:
```
pip install -r docs/requirements.txt
sphinx-build -M html docs/source docs_output
```
Then, I opened that newly generated
`docs_output/html/contributing/new_api_provider.html` in my browser and
confirmed everything rendered correctly.

Signed-off-by: Ben Browning <bbrownin@redhat.com>
2025-02-20 09:23:46 -08:00
Ashwin Bharambe
984a8039ad Kill unnecessary check on --safety-shield test param 2025-02-20 09:15:23 -08:00
Rashmi Pawar
996f27a308
fix: add logging import (#1174)
# What does this PR do?
Fixes the logging import and the logger instance creation
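
The fix amounts to the standard module-logger setup (a sketch of the
pattern, not the exact diff):

```python
import logging

# Module-level logger, named after the module for filterable output.
logger = logging.getLogger(__name__)

logger.info("provider initialized")  # example use of the logger
```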

cc: @dglogo
2025-02-20 11:26:47 -05:00
Ihar Hrachyshka
fb6a3efb1d
feat: Enable CPU training for torchtune (#1140)
# What does this PR do?

You are now able to run a training cycle on CPU. This is useful for
debugging and testing purposes.
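
The enabling change boils down to falling back to CPU when CUDA is
unavailable (a sketch of the pattern, not the exact diff):

```python
import torch

# Fall back to CPU when no CUDA device is present, so the training
# recipe can still run (slowly) for debugging and testing.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = torch.nn.Linear(8, 8).to(device)  # stand-in for the fine-tuned model
print(f"training on {device}")
```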


## Test Plan

On a Mac machine without CUDA devices:

```
17:00:24.417 [START] /v1/post-training/supervised-fine-tune
DEBUG 2025-02-18 12:00:24,419 torchtune.utils._logging:60: Setting manual seed to local seed 3268931494. Local seed is seed + rank = 3268931494 + 0
INFO 2025-02-18 12:00:24,463 torchtune.utils._logging:64: Identified model_type = Llama3_2. Ignoring output.weight in checkpoint in favor of the tok_embedding.weight tied weights.
INFO 2025-02-18 12:00:46,699 llama_stack.providers.inline.post_training.torchtune.recipes.lora_finetuning_single_device:182: Model is initialized with precision torch.bfloat16.
INFO 2025-02-18 12:00:46,784 llama_stack.providers.inline.post_training.torchtune.recipes.lora_finetuning_single_device:185: Tokenizer is initialized.
INFO 2025-02-18 12:00:46,786 llama_stack.providers.inline.post_training.torchtune.recipes.lora_finetuning_single_device:188: Optimizer is initialized.
INFO 2025-02-18 12:00:46,786 llama_stack.providers.inline.post_training.torchtune.recipes.lora_finetuning_single_device:192: Loss is initialized.
INFO 2025-02-18 12:00:48,997 llama_stack.providers.inline.post_training.torchtune.recipes.lora_finetuning_single_device:209: Dataset and Sampler are initialized.
INFO 2025-02-18 12:00:48,998 llama_stack.providers.inline.post_training.torchtune.recipes.lora_finetuning_single_device:227: Learning rate scheduler is initialized.
Writing logs to /Users/ihrachys/.llama/checkpoints/meta-llama/Llama-3.2-3B-Instruct-sft-0/log_1739898049.txt
1|1|Loss: 1.7414989471435547: 100% 1/1 [03:46<00:00, 226.21s/it]INFO 2025-02-18 12:04:35,227 llama_stack.providers.inline.post_training.torchtune.recipes.lora_finetuning_single_device:528: Starting checkpoint save...
INFO 2025-02-18 12:04:49,974 torchtune.utils._logging:121: Model checkpoint of size 6.43 GB saved to /Users/ihrachys/.llama/checkpoints/meta-llama/Llama-3.2-3B-Instruct-sft-0/consolidated.00.pth
INFO 2025-02-18 12:04:49,981 torchtune.utils._logging:132: Adapter checkpoint of size 0.00 GB saved to /Users/ihrachys/.llama/checkpoints/meta-llama/Llama-3.2-3B-Instruct-sft-0/adapter/adapter.pth
model_file_path /Users/ihrachys/.llama/checkpoints/meta-llama/Llama-3.2-3B-Instruct-sft-0
1|1|Loss: 1.7414989471435547: 100% 1/1 [04:01<00:00, 241.18s/it]
INFO:     ::1:64990 - "POST /v1/post-training/supervised-fine-tune HTTP/1.1" 200 OK
17:04:50.364 [END] /v1/post-training/supervised-fine-tune [StatusCode.OK] (265947.01ms)
 17:00:24.419 [DEBUG] Setting manual seed to local seed 3268931494. Local seed is seed + rank = 3268931494 + 0
 17:00:24.463 [INFO] Identified model_type = Llama3_2. Ignoring output.weight in checkpoint in favor of the tok_embedding.weight tied weights.
 17:00:46.700 [INFO] Model is initialized with precision torch.bfloat16.
 17:00:46.784 [INFO] Tokenizer is initialized.
 17:00:46.786 [INFO] Optimizer is initialized.
 17:00:46.786 [INFO] Loss is initialized.
 17:00:48.997 [INFO] Dataset and Sampler are initialized.
 17:00:48.998 [INFO] Learning rate scheduler is initialized.
 17:04:35.227 [INFO] Starting checkpoint save...
 17:04:49.974 [INFO] Model checkpoint of size 6.43 GB saved to /Users/ihrachys/.llama/checkpoints/meta-llama/Llama-3.2-3B-Instruct-sft-0/consolidated.00.pth
 17:04:49.981 [INFO] Adapter checkpoint of size 0.00 GB saved to /Users/ihrachys/.llama/checkpoints/meta-llama/Llama-3.2-3B-Instruct-sft-0/adapter/adapter.pth
```


Signed-off-by: Ihar Hrachyshka <ihar.hrachyshka@gmail.com>
2025-02-19 22:42:58 -08:00
Xi Yan
a324ceb9a9 precommit again 2025-02-19 22:40:45 -08:00
Sébastien Han
4694780d23
test: skip model registration for unsupported providers (#1030)
# What does this PR do?
- Updated `test_register_with_llama_model` to skip tests when using the
Ollama provider, as it does not support custom model names.
- Deleted `test_initialize_model_during_registering`, since there is no
  "load_model" semantic exposed publicly by a provider.

These changes ensure that tests do not fail for providers with
incompatible behaviors.
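
The skip pattern looks roughly like this (a sketch: the fixture and the
provider-type check are illustrative, not the exact test code):

```python
import pytest


# Hypothetical shape of the skip logic added to the registration test.
def test_register_with_llama_model(inference_provider_type, models_impl):
    if inference_provider_type == "remote::ollama":
        # Skip rather than fail: Ollama cannot serve a model under an
        # arbitrary registered name.
        pytest.skip("Ollama does not support registering custom model names")
    models_impl.register_model(
        model_id="custom-model",
        provider_model_id="llama3.2:3b-instruct-fp16",
    )
```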

Signed-off-by: Sébastien Han <seb@redhat.com>


## Test Plan

Run Ollama:

```
 uv run pytest -v -s -k "ollama" llama_stack/providers/tests/inference/test_model_registration.py
/Users/leseb/Documents/AI/llama-stack/.venv/lib/python3.13/site-packages/pytest_asyncio/plugin.py:207: PytestDeprecationWarning: The configuration option "asyncio_default_fixture_loop_scope" is unset.
The event loop scope for asynchronous fixtures will default to the fixture caching scope. Future versions of pytest-asyncio will default the loop scope for asynchronous fixtures to function scope. Set the default fixture loop scope explicitly in order to avoid unexpected behavior in the future. Valid fixture loop scopes are: "function", "class", "module", "package", "session"

  warnings.warn(PytestDeprecationWarning(_DEFAULT_FIXTURE_LOOP_SCOPE_UNSET))
========================================== test session starts ==========================================
platform darwin -- Python 3.13.1, pytest-8.3.4, pluggy-1.5.0 -- /Users/leseb/Documents/AI/llama-stack/.venv/bin/python3
cachedir: .pytest_cache
metadata: {'Python': '3.13.1', 'Platform': 'macOS-15.3-arm64-arm-64bit-Mach-O', 'Packages': {'pytest': '8.3.4', 'pluggy': '1.5.0'}, 'Plugins': {'html': '4.1.1', 'metadata': '3.1.1', 'asyncio': '0.25.3', 'anyio': '4.8.0', 'nbval': '0.11.0'}}
rootdir: /Users/leseb/Documents/AI/llama-stack
configfile: pyproject.toml
plugins: html-4.1.1, metadata-3.1.1, asyncio-0.25.3, anyio-4.8.0, nbval-0.11.0
asyncio: mode=Mode.STRICT, asyncio_default_fixture_loop_scope=None
collected 65 items / 60 deselected / 5 selected                                                         

llama_stack/providers/tests/inference/test_model_registration.py::TestModelRegistration::test_register_unsupported_model[-ollama] PASSED
llama_stack/providers/tests/inference/test_model_registration.py::TestModelRegistration::test_register_nonexistent_model[-ollama] PASSED
llama_stack/providers/tests/inference/test_model_registration.py::TestModelRegistration::test_register_with_llama_model[-ollama] SKIPPED
llama_stack/providers/tests/inference/test_model_registration.py::TestModelRegistration::test_register_with_invalid_llama_model[-ollama] PASSED

======================== 3 passed, 1 skipped, 60 deselected, 2 warnings in 0.22s ========================
```



Signed-off-by: Sébastien Han <seb@redhat.com>
2025-02-19 22:39:13 -08:00
Sixian Yi
531940aea9
script for running client sdk tests (#895)
# What does this PR do?
Creates a script for running all client-sdk tests on the Async Library
client, with the option to generate a report.


## Test Plan

```
python llama_stack/scripts/run_client_sdk_tests.py --templates together fireworks --report
```



## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
2025-02-19 22:38:06 -08:00
Xi Yan
a3d8c49459 precommit 2025-02-19 22:37:41 -08:00
Xi Yan
ce040ad111 precommit 2025-02-19 22:35:24 -08:00