Commit graph

1355 commits

Author SHA1 Message Date
Ashwin Bharambe
46b0a404e8
chore: remove straggler references to llama-models (#1345)
Straggler references cleanup
2025-03-01 14:26:03 -08:00
Ashwin Bharambe
8bbd52bb9f
chore: remove dependency on llama_models completely (#1344) 2025-03-01 12:48:08 -08:00
Reid
7131d5ddeb
chore: remove start_venv.sh (#1341)
# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]

`start_venv.sh` lifecycle should be:


025f615868
>>
34e3faa4e8
>>
4684fd3f8d

Finally replaced by `start_stack.sh`

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)

---------

Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
2025-03-01 11:22:06 -08:00
Ashwin Bharambe
6609d4ada4
feat: allow conditionally enabling providers in run.yaml (#1321)
# What does this PR do?

We want to bundle a bunch of (typically remote) providers in a distro
template and be able to configure them "on the fly" via environment
variables. So far, we have been able to do this with simple env var
replacements. However, sometimes you want to only conditionally enable
providers (because the relevant remote services may not be alive, or
relevant.) This was not possible until now.

To aid this, we add a simple (bash-like) env var replacement
enhancement: `${env.FOO+bar}` evaluates to `bar` if the variable is SET
and evaluates to empty string if it is not. On top of that, we update
our main resolver to ignore any provider whose ID is null.

This allows using the distro like this:

```bash
llama stack run dev --env CHROMADB_URL=http://localhost:6001 --env ENABLE_CHROMADB=1
```

when only Chroma is UP. This disables the other `pgvector` provider in
the run configuration.


## Test Plan

Hard code `chromadb` as the vector io provider inside
`test_vector_io.py` and run:

```bash
LLAMA_STACK_BASE_URL=http://localhost:8321 pytest -s -v tests/client-sdk/vector_io/ --embedding-model all-MiniLM-L6-v2
```
2025-03-01 11:19:14 -08:00
ehhuang
81c6ef5c1c
fix: don't update tool_config inplace (#1338)
Summary:

messes tests up

Test Plan:
run agent tests
2025-03-01 10:40:00 -08:00
Reid
327b17e5f0
chore: add container cmd check in start_stack.sh (#1340)
# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)

Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
2025-03-01 10:39:32 -08:00
ehhuang
7cff9f504f
fix: raise error when request param failed to convert (#1339)
# Summary:
This led to extremely hard to debug messages.
Before:

llama_stack/distribution/library_client.py:275: in request
    response = await self._call_non_streaming(
llama_stack/distribution/library_client.py:322: in _call_non_streaming
    result = await matched_func(**body)
llama_stack/providers/utils/telemetry/trace_protocol.py:102: in
async_wrapper
    result = await method(self, *args, **kwargs)
llama_stack/providers/inline/agents/meta_reference/agents.py:80: in
create_agent
    value=agent_config.model_dump_json(),
E   AttributeError: 'dict' object has no attribute 'model_dump_json'

After:

E ValueError: Failed to convert parameter {'model':
'meta-llama/Llama-3.1-8B-Instruct', 'instructions': 'You are a helpful
assistant', 'sampling_params': {'strategy': {'type': 'top_p',
'temperature': 0.0001, 'top_p': 0.9}}, 'toolgroups': [{'name':
'builtin::rag'}], 'input_shields': ['meta-llama/Llama-Guard-3-8B'],
'output_shields': ['meta-llama/Llama-Guard-3-8B'],
'enable_session_persistence': False} into <class
'llama_stack.apis.agents.agents.AgentConfig'>: 2 validation errors for
AgentConfig
E   toolgroups.0.str
E Input should be a valid string [type=string_type, input_value={'name':
'builtin::rag'}, input_type=dict]
E For further information visit
https://errors.pydantic.dev/2.10/v/string_type
E   toolgroups.0.AgentToolGroupWithArgs.args
E Field required [type=missing, input_value={'name': 'builtin::rag'},
input_type=dict]
E For further information visit
https://errors.pydantic.dev/2.10/v/missing

# Test Plan:
LLAMA_STACK_CONFIG=fireworks pytest -s -v tests/client-sdk/
--safety-shield meta-llama/Llama-Guard-3-8B
2025-03-01 10:39:05 -08:00
Reid
dc069025f5
chore: fix typo (#1343)
# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]


21ec67356c/distributions

It should missed the `s`.

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)

Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
2025-03-01 10:36:04 -08:00
ehhuang
21ec67356c
fix: RAG with documents (#1337)
Summary:
This was broken by
https://github.com/meta-llama/llama-stack/pull/1015/files#r1975394190

Test Plan:

added e2e test
2025-02-28 16:51:00 -08:00
ehhuang
7854af8b52
docs: update user prompt example (#1329)
Summary:
in case user sets it to a small model with poor tool use capability

Test Plan:
copy and paste to notebook and ran
2025-02-28 16:42:29 -08:00
ehhuang
ba3bedc7e9
test: remove old test (#1334)
Summary:

This test is no longer relevant. We updated the default system prompt in
https://github.com/meta-llama/llama-stack/pull/1310, and system override
behavior is already unit-tested in test_prompt_adapter.py

Test Plan:
read
2025-02-28 16:42:13 -08:00
ehhuang
2faee24873
chore: better raise (#1335)
Summary:
addresses
https://github.com/meta-llama/llama-stack/pull/1282#discussion_r1972546802

Test Plan:
2025-02-28 16:41:20 -08:00
Ashwin Bharambe
7ad7e3b970 fix: only install llama-stack package, deps are now correctly incorporated 2025-02-28 16:12:11 -08:00
Surya Prakash Pathak
9b6a2577b1
docs: Update llama-stack version in README.md (#1330)
# What does this PR do?
This PR updates the version in the
[README.md](https://github.com/meta-llama/llama-stack/blob/main/docs/zero_to_hero_guide/README.md)
to reflect the latest changes in Llama Stack setup.

Previously, using **llama-stack==0.1.0** caused an error when running:
```bash
llama stack build --template ollama --image-type conda
```
Upgrading to llama-stack==0.1.3 resolves this issue.

## Test Plan
- Verified that `llama stack build --template ollama --image-type conda`
works correctly.

---------

Signed-off-by: Surya Prakash Pathak <supathak@redhat.com>
2025-02-28 13:37:03 -08:00
Xi Yan
82fa0803fa
chore: refactor client tool in test (#1331)
# What does this PR do?

Use @client_tool decorator instead of ClientTool

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
```
LLAMA_STACK_CONFIG=fireworks pytest -v tests/client-sdk/agents/test_agents.py --inference-model "meta-llama/Llama-3.3-70B-Instruct"
```

<img width="1053" alt="image"
src="https://github.com/user-attachments/assets/d3ade884-ef42-494e-8028-3b09d9ef1978"
/>


[//]: # (## Documentation)
2025-02-28 12:29:50 -08:00
Xi Yan
15f69e75ff
fix: replace eval with json decoding for format_adapter (#1328)
# What does this PR do?
- using `eval` is a security risk

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan

- see https://github.com/meta-llama/llama-stack/pull/1327

cc @SLR722 we will need to update the corresponding dataset via

```python
def update_to_json_str():
        
dataset = datasets.load_dataset(...)
processed_dataset = dataset[split].map(
        lambda x: {
                "column": json.dumps(eval(x["column"]))
       }
)
processed_dataset.push_to_hub(...)
```
[//]: # (## Documentation)
2025-02-28 11:25:23 -08:00
Ashwin Bharambe
5547ef953c
feat: enhance OpenAPI spec to include Error types (#1320)
# What does this PR do?

An API spec must talk about Error handling. This was a pretty glaring
omission so far. This PR begins to address it by adding a set of
standard error responses we can attach to all our API calls.

At a future point, we can add specific error types where necessary
(although we should not hurry to do that; it is best done very late.)

## Test Plan

Checked that Stainless SDK generation succeeds.
2025-02-28 11:16:12 -08:00
Xi Yan
6520baebed
fix: replace eval with json decoding (#1327)
# What does this PR do?

- Using `eval` on server is a security risk
- Replace `eval` with `json.loads`

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
```
pytest -v -s --nbval-lax ./llama-stack/docs/notebooks/Llama_Stack_Benchmark_Evals.ipynb 
```
<img width="747" alt="image"
src="https://github.com/user-attachments/assets/7aff3d95-0b12-4394-b9d0-aeff791eee38"
/>


[//]: # (## Documentation)
2025-02-28 11:10:45 -08:00
Reid
66cd128ab5
docs: update the downloaded list doc (#1266)
# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]

Since released the `--downloaded` option, so update the related
documents.

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)

Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
2025-02-28 10:10:12 -08:00
Reid
14c442f177
chore: update cmd check (#1293)
# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)

Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
2025-02-28 10:08:05 -08:00
Reid
ea4f13cc20
chore: add container cmd check (#1306)
# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)

Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
2025-02-28 10:07:24 -08:00
Reid
5366dab31e
docs: update build doc (#1262)
# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]


55eb257459/llama_stack/cli/stack/run.py (L22)


55eb257459/llama_stack/cli/stack/_build.py (L103)

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)

Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
2025-02-28 10:03:45 -08:00
Matthew Farrellee
83dc8fbdff
test: cleanup embedding model test suite (#1322)
# What does this PR do?

 - skip media tests for models that do not support media
 - skip output_dimension tests for models that do not support it
 - skip task_type tests for models that do not support it
 - provide task_type for models that require it

## Test Plan

`LLAMA_STACK_BASE_URL=http://localhost:8321 pytest -v
tests/client-sdk/inference/test_embedding.py --embedding-model ...`
2025-02-28 10:02:36 -08:00
Sébastien Han
c91548fe07
build(container): misc improvements (#1291)
# What does this PR do?

See individual commit messages.

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan

Apply this diff:

```
diff --git a/llama_stack/templates/ollama/build.yaml b/llama_stack/templates/ollama/build.yaml
index da33b8d5..4a702f6f 100644
--- a/llama_stack/templates/ollama/build.yaml
+++ b/llama_stack/templates/ollama/build.yaml
@@ -28,5 +28,5 @@ distribution_spec:
     - remote::tavily-search
     - inline::code-interpreter
     - inline::rag-runtime
-    - remote::model-context-protocol
+  container_image: "registry.access.redhat.com/ubi9"
 image_type: conda
```

Then run:

```
CONTAINER_BINARY=podman llama stack build --template ollama --image-type container --image-name registry.access.redhat.com/ubi9
Containerfile created successfully in /var/folders/mq/rnm5w_7s2d3fxmtkx02knvhm0000gn/T/tmp.I7E5V6zbVI/Containerfile

FROM registry.access.redhat.com/ubi9
WORKDIR /app

RUN dnf -y update && dnf install -y iputils net-tools wget     vim-minimal python3.11 python3.11-pip python3.11-wheel     python3.11-setuptools && ln -s /bin/pip3.11 /bin/pip && ln -s /bin/python3.11 /bin/python && dnf clean all

ENV UV_SYSTEM_PYTHON=1
RUN pip install uv
RUN uv pip install --no-cache ollama nltk opentelemetry-sdk aiosqlite matplotlib datasets sqlite-vec scipy chromadb-client psycopg2-binary numpy scikit-learn openai redis pandas tqdm blobfile sentencepiece aiohttp requests pillow pymongo transformers autoevals opentelemetry-exporter-otlp-proto-http pypdf chardet aiosqlite fastapi fire httpx uvicorn
RUN uv pip install --no-cache llama-stack
RUN pip uninstall -y uv
ENTRYPOINT ["python", "-m", "llama_stack.distribution.server.server", "--template", "ollama"]

# Allows running as non-root user
RUN mkdir -p /.llama /.cache

RUN chmod -R g+rw /app /.llama /.cache

PWD: /Users/leseb/Documents/AI/llama-stack
Containerfile: /var/folders/mq/rnm5w_7s2d3fxmtkx02knvhm0000gn/T/tmp.I7E5V6zbVI/Containerfile
+ podman build --platform linux/arm64 -t distribution-ollama:0.1.4 -f /var/folders/mq/rnm5w_7s2d3fxmtkx02knvhm0000gn/T/tmp.I7E5V6zbVI/Containerfile . --progress=plain
STEP 1/11: FROM registry.access.redhat.com/ubi9
STEP 2/11: WORKDIR /app
--> Using cache d73dafd4caddd75bc29242a9031258fea759dc571c5bb53a64b5e6d86b3b1335
--> d73dafd4cadd
STEP 3/11: RUN dnf -y update && dnf install -y iputils net-tools wget     vim-minimal python3.11 python3.11-pip python3.11-wheel     python3.11-setuptools && ln -s /bin/pip3.11 /bin/pip && ln -s /bin/python3.11 /bin/python && dnf clean all
--> Using cache b74ad682db149771612a3ea1e4796e0760ab8a4e07c26ad672b46a86d38178c2
--> b74ad682db14
STEP 4/11: ENV UV_SYSTEM_PYTHON=1
--> Using cache 0812a05e6576506aa2fe646cbf239d0cb504cac30a50cb5cf4dc88e49039466d
--> 0812a05e6576
STEP 5/11: RUN pip install uv
--> Using cache a0ce1705f87e52f70f6eb34e66f67b68ebc7c1a073f4d2a664b189cfa89a4e88
--> a0ce1705f87e
STEP 6/11: RUN uv pip install --no-cache ollama nltk opentelemetry-sdk aiosqlite matplotlib datasets sqlite-vec scipy chromadb-client psycopg2-binary numpy scikit-learn openai redis pandas tqdm blobfile sentencepiece aiohttp requests pillow pymongo transformers autoevals opentelemetry-exporter-otlp-proto-http pypdf chardet aiosqlite fastapi fire httpx uvicorn
Using Python 3.11.9 environment at: /usr
Resolved 107 packages in 1.78s
Downloading kiwisolver (1.4MiB)
Downloading aiohttp (1.6MiB)
Downloading grpcio (5.4MiB)
Downloading nltk (1.4MiB)
Downloading transformers (9.5MiB)
Downloading pydantic-core (1.7MiB)
Downloading lxml (4.6MiB)
Downloading psycopg2-binary (2.7MiB)
Downloading scipy (33.8MiB)
Downloading scikit-learn (12.0MiB)
Downloading tokenizers (2.8MiB)
Downloading fonttools (4.6MiB)
Downloading pymongo (1.3MiB)
Downloading rapidfuzz (1.4MiB)
Downloading sentencepiece (1.2MiB)
Downloading pyarrow (38.7MiB)
Downloading matplotlib (8.1MiB)
Downloading pycryptodomex (2.1MiB)
Downloading pillow (4.2MiB)
Downloading pandas (14.9MiB)
Downloading numpy (13.6MiB)
   Building fire==0.7.0
 Downloaded sentencepiece
 Downloaded kiwisolver
 Downloaded pymongo
 Downloaded rapidfuzz
 Downloaded nltk
 Downloaded aiohttp
      Built fire==0.7.0
 Downloaded pydantic-core
 Downloaded pycryptodomex
 Downloaded psycopg2-binary
 Downloaded tokenizers
 Downloaded pillow
 Downloaded lxml
 Downloaded fonttools
 Downloaded grpcio
 Downloaded matplotlib
 Downloaded transformers
 Downloaded scikit-learn
 Downloaded numpy
 Downloaded pandas
 Downloaded scipy
 Downloaded pyarrow
Prepared 107 packages in 3.03s
Installed 107 packages in 62ms
 + aiohappyeyeballs==2.4.6
 + aiohttp==3.11.13
 + aiosignal==1.3.2
 + aiosqlite==0.21.0
 + annotated-types==0.7.0
 + anyio==4.8.0
 + attrs==25.1.0
 + autoevals==0.0.120
 + backoff==2.2.1
 + blobfile==3.0.0
 + braintrust-core==0.0.58
 + certifi==2025.1.31
 + chardet==5.2.0
 + charset-normalizer==3.4.1
 + chevron==0.14.0
 + chromadb-client==0.6.3
 + click==8.1.8
 + contourpy==1.3.1
 + cycler==0.12.1
 + datasets==3.3.2
 + deprecated==1.2.18
 + dill==0.3.8
 + distro==1.9.0
 + dnspython==2.7.0
 + fastapi==0.115.8
 + filelock==3.17.0
 + fire==0.7.0
 + fonttools==4.56.0
 + frozenlist==1.5.0
 + fsspec==2024.12.0
 + googleapis-common-protos==1.68.0
 + grpcio==1.70.0
 + h11==0.14.0
 + httpcore==1.0.7
 + httpx==0.28.1
 + huggingface-hub==0.29.1
 + idna==3.10
 + importlib-metadata==8.5.0
 + jiter==0.8.2
 + joblib==1.4.2
 + jsonschema==4.23.0
 + jsonschema-specifications==2024.10.1
 + kiwisolver==1.4.8
 + levenshtein==0.26.1
 + lxml==5.3.1
 + matplotlib==3.10.0
 + monotonic==1.6
 + multidict==6.1.0
 + multiprocess==0.70.16
 + nltk==3.9.1
 + numpy==1.26.4
 + ollama==0.4.7
 + openai==1.64.0
 + opentelemetry-api==1.30.0
 + opentelemetry-exporter-otlp-proto-common==1.30.0
 + opentelemetry-exporter-otlp-proto-grpc==1.30.0
 + opentelemetry-exporter-otlp-proto-http==1.30.0
 + opentelemetry-proto==1.30.0
 + opentelemetry-sdk==1.30.0
 + opentelemetry-semantic-conventions==0.51b0
 + orjson==3.10.15
 + overrides==7.7.0
 + packaging==24.2
 + pandas==2.2.3
 + pillow==11.1.0
 + posthog==3.16.0
 + propcache==0.3.0
 + protobuf==5.29.3
 + psycopg2-binary==2.9.10
 + pyarrow==19.0.1
 + pycryptodomex==3.21.0
 + pydantic==2.10.6
 + pydantic-core==2.27.2
 + pymongo==4.11.1
 + pyparsing==3.2.1
 + pypdf==5.3.0
 + python-dateutil==2.9.0.post0
 + pytz==2025.1
 + pyyaml==6.0.2
 + rapidfuzz==3.12.1
 + redis==5.2.1
 + referencing==0.36.2
 + regex==2024.11.6
 + requests==2.32.3
 + rpds-py==0.23.1
 + safetensors==0.5.3
 + scikit-learn==1.6.1
 + scipy==1.15.2
 + sentencepiece==0.2.0
 + six==1.17.0
 + sniffio==1.3.1
 + sqlite-vec==0.1.6
 + starlette==0.45.3
 + tenacity==9.0.0
 + termcolor==2.5.0
 + threadpoolctl==3.5.0
 + tokenizers==0.21.0
 + tqdm==4.67.1
 + transformers==4.49.0
 + typing-extensions==4.12.2
 + tzdata==2025.1
 + urllib3==2.3.0
 + uvicorn==0.34.0
 + wrapt==1.17.2
 + xxhash==3.5.0
 + yarl==1.18.3
 + zipp==3.21.0
--> 5b5b823605a1
STEP 7/11: RUN uv pip install --no-cache llama-stack
Using Python 3.11.9 environment at: /usr
Resolved 55 packages in 1.08s
Downloading setuptools (1.2MiB)
Downloading pygments (1.2MiB)
Downloading llama-models (1.5MiB)
Downloading tiktoken (1.1MiB)
 Downloaded tiktoken
 Downloaded llama-models
 Downloaded pygments
 Downloaded setuptools
Prepared 15 packages in 402ms
Installed 15 packages in 15ms
 + jinja2==3.1.5
 + llama-models==0.1.4
 + llama-stack==0.1.4
 + llama-stack-client==0.1.4
 + markdown-it-py==3.0.0
 + markupsafe==3.0.2
 + mdurl==0.1.2
 + prompt-toolkit==3.0.50
 + pyaml==25.1.0
 + pygments==2.19.1
 + python-dotenv==1.0.1
 + rich==13.9.4
 + setuptools==75.8.2
 + tiktoken==0.9.0
 + wcwidth==0.2.13
--> 38a037443807
STEP 8/11: RUN pip uninstall -y uv
Found existing installation: uv 0.6.3
Uninstalling uv-0.6.3:
  Successfully uninstalled uv-0.6.3
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv
--> 54f749dc5ece
STEP 9/11: ENTRYPOINT ["python", "-m", "llama_stack.distribution.server.server", "--template", "ollama"]
--> 481c138b1982
STEP 10/11: RUN mkdir -p /.llama /.cache
--> 0fc174f014a8
STEP 11/11: RUN chmod -R g+rw /app /.llama /.cache
COMMIT distribution-ollama:0.1.4
--> d41b4ab4b136
Successfully tagged localhost/distribution-ollama:0.1.4
d41b4ab4b1363bfbaf6239e6f313bcb37873ef4b5f2fd816a4ee55acf2ac54d3
+ set +x
Success!
Build Successful!
```

UBI9 container successfully builds.

Run the container:

```
podman run d41b4ab4b1363bfbaf6239e6f313bcb37873ef4b5f2fd816a4ee55acf2ac54d3 --env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct
INFO 2025-02-27 13:08:03,666 llama_stack.distribution.resolver:213: Resolved 30 providers
INFO 2025-02-27 13:08:03,666 llama_stack.distribution.resolver:215:  inner-inference => ollama
INFO 2025-02-27 13:08:03,666 llama_stack.distribution.resolver:215:  models => __routing_table__
INFO 2025-02-27 13:08:03,666 llama_stack.distribution.resolver:215:  inference => __autorouted__
INFO 2025-02-27 13:08:03,666 llama_stack.distribution.resolver:215:  inner-vector_io => sqlite-vec
INFO 2025-02-27 13:08:03,666 llama_stack.distribution.resolver:215:  inner-safety => llama-guard
INFO 2025-02-27 13:08:03,666 llama_stack.distribution.resolver:215:  shields => __routing_table__
INFO 2025-02-27 13:08:03,666 llama_stack.distribution.resolver:215:  safety => __autorouted__
INFO 2025-02-27 13:08:03,666 llama_stack.distribution.resolver:215:  vector_dbs => __routing_table__
INFO 2025-02-27 13:08:03,666 llama_stack.distribution.resolver:215:  vector_io => __autorouted__
INFO 2025-02-27 13:08:03,666 llama_stack.distribution.resolver:215:  inner-tool_runtime => brave-search
INFO 2025-02-27 13:08:03,666 llama_stack.distribution.resolver:215:  inner-tool_runtime => tavily-search
```


[//]: # (## Documentation)

---------

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-02-28 10:01:52 -08:00
Yuan Tang
18ab1985da
fix: Make remote::vllm compatible with vLLM <= v0.6.3 (#1325)
# What does this PR do?

This is to be consistent with OpenAI API and support vLLM <= v0.6.3

References:
*
https://platform.openai.com/docs/api-reference/chat/create#chat-create-tool_choice
* https://github.com/vllm-project/vllm/pull/10000

This fixes the error when running older versions of vLLM:

```
00:50:19.834 [START] /v1/inference/chat-completion
INFO 2025-02-28 00:50:20,203 httpx:1025: HTTP Request: POST https://api-xeai-granite-3-1-8b-instruct.apps.int.stc.ai.preprod.us-east-1.aws.paas.redhat.com/v1/chat/completions "HTTP/1.1 400 Bad Request"
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/llama_stack/distribution/server/server.py", line 235, in endpoint
    return await maybe_await(value)
  File "/usr/local/lib/python3.10/site-packages/llama_stack/distribution/server/server.py", line 201, in maybe_await
    return await value
  File "/usr/local/lib/python3.10/site-packages/llama_stack/providers/utils/telemetry/trace_protocol.py", line 89, in async_wrapper
    result = await method(self, *args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/llama_stack/distribution/routers/routers.py", line 193, in chat_completion
    return await provider.chat_completion(**params)
  File "/usr/local/lib/python3.10/site-packages/llama_stack/providers/utils/telemetry/trace_protocol.py", line 89, in async_wrapper
    result = await method(self, *args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/llama_stack/providers/remote/inference/vllm/vllm.py", line 286, in chat_completion
    return await self._nonstream_chat_completion(request, self.client)
  File "/usr/local/lib/python3.10/site-packages/llama_stack/providers/remote/inference/vllm/vllm.py", line 292, in _nonstream_chat_completion
    r = client.chat.completions.create(**params)
  File "/usr/local/lib/python3.10/site-packages/openai/_utils/_utils.py", line 279, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/openai/resources/chat/completions/completions.py", line 879, in create
    return self._post(
  File "/usr/local/lib/python3.10/site-packages/openai/_base_client.py", line 1290, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
  File "/usr/local/lib/python3.10/site-packages/openai/_base_client.py", line 967, in request
    return self._request(
  File "/usr/local/lib/python3.10/site-packages/openai/_base_client.py", line 1071, in _request
    raise self._make_status_error_from_response(err.response) from None
openai.BadRequestError: Error code: 400 - {'object': 'error', 'message': "[{'type': 'value_error', 'loc': ('body',), 'msg': 'Value error, When using `tool_choice`, `tools` must be set.', 'input': {'messages': [{'role': 'user', 'content': [{'type': 'text', 'text': 'What model are you?'}]}], 'model': 'granite-3-1-8b-instruct', 'max_tokens': 4096, 'stream': False, 'temperature': 0.0, 'tools': None, 'tool_choice': 'auto'}, 'ctx': {'error': ValueError('When using `tool_choice`, `tools` must be set.')}}]", 'type': 'BadRequestError', 'param': None, 'code': 400}
INFO:     2600:1700:9d20:ac0::49:59736 - "POST /v1/inference/chat-completion HTTP/1.1" 500 Internal Server Error
00:50:20.266 [END] /v1/inference/chat-completion [StatusCode.OK] (431.99ms)
```

## Test Plan

All existing tests pass.

---------

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-02-28 12:48:49 -05:00
Sébastien Han
6fa257b475
chore(lint): update Ruff ignores for project conventions and maintainability (#1184)
- Added new ignores from flake8-bugbear (`B007`, `B008`)
- Ignored `C901` (high function complexity) for now, pending review
- Maintained PyTorch conventions (`N812`, `N817`)
- Allowed `E731` (lambda assignments) for flexibility
- Consolidated existing ignores (`E402`, `E501`, `F405`, `C408`, `N812`)
- Documented rationale for each ignored rule

This keeps our linting aligned with project needs while tracking
potential fixes.

Signed-off-by: Sébastien Han <seb@redhat.com>

Signed-off-by: Sébastien Han <seb@redhat.com>
2025-02-28 09:36:49 -08:00
Reid
3b57d8ee88
feat: add prompt-format list (#1222)
# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]


19ae4b35d9/llama_stack/cli/model/prompt_format.py (L47)

Based on the comment: `Only Llama 3.1 and 3.2 are supported`, even 3.1,
3.2 are not all models can show it with `prompt-format`, so cannot refer
to `llama model list`,
only refer to list when enter a invalid model, so it would be nice to
help to check the valid models:
```
 llama model prompt-format -m Llama3.1-405B-Instruct:bf16-mp8
usage: llama model prompt-format [-h] [-m MODEL_NAME] [-l]
llama model prompt-format: error: Llama3.1-405B-Instruct:bf16-mp8 is not a valid Model <<<<---. Choose one from --
Llama3.1-8B
Llama3.1-70B
Llama3.1-405B
Llama3.1-8B-Instruct
Llama3.1-70B-Instruct
Llama3.1-405B-Instruct
Llama3.2-1B
Llama3.2-3B
Llama3.2-1B-Instruct
Llama3.2-3B-Instruct
Llama3.2-11B-Vision
Llama3.2-90B-Vision
Llama3.2-11B-Vision-Instruct
Llama3.2-90B-Vision-Instruct

before:
$ llama model prompt-format --help
usage: llama model prompt-format [-h] [-m MODEL_NAME]

Show llama model message formats

options:
  -h, --help            show this help message and exit
  -m MODEL_NAME, --model-name MODEL_NAME
                        Model Family (llama3_1, llama3_X, etc.)

Example:
    llama model prompt-format <options>


after:
$ llama model prompt-format --help
usage: llama model prompt-format [-h] [-m MODEL_NAME] [-l]

Show llama model message formats

options:
  -h, --help            show this help message and exit
  -m MODEL_NAME, --model-name MODEL_NAME
                        Model Family (llama3_1, llama3_X, etc.)
  -l, --list            List the valid supported models

Example:
    llama model prompt-format <options>

$ llama model prompt-format -l
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Model                        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Llama3.1-8B                  │
├──────────────────────────────┤
│ Llama3.1-70B                 │
├──────────────────────────────┤
│ Llama3.1-405B                │
├──────────────────────────────┤
│ Llama3.1-8B-Instruct         │
├──────────────────────────────┤
│ Llama3.1-70B-Instruct        │
├──────────────────────────────┤
│ Llama3.1-405B-Instruct       │
├──────────────────────────────┤
│ Llama3.2-1B                  │
├──────────────────────────────┤
│ Llama3.2-3B                  │
├──────────────────────────────┤
│ Llama3.2-1B-Instruct         │
├──────────────────────────────┤
│ Llama3.2-3B-Instruct         │
├──────────────────────────────┤
│ Llama3.2-11B-Vision          │
├──────────────────────────────┤
│ Llama3.2-90B-Vision          │
├──────────────────────────────┤
│ Llama3.2-11B-Vision-Instruct │
├──────────────────────────────┤
│ Llama3.2-90B-Vision-Instruct │
└──────────────────────────────┘
```

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)

---------

Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
2025-02-28 09:27:22 -08:00
Yuan Tang
234408f411
docs: Add link to distributions guide in quick start guide (#1326)
# What does this PR do?

A couple of users have asked this question so I thought it would be a
good idea to add a link.
2025-02-28 09:18:02 -08:00
Dinesh Yeduguru
7f9b767277
fix: check conda env name using basepath in exec.py (#1301)
# What does this PR do?
check conda env name using basepath in exec.py
The current logic for finding conda prefix does a `endswith` check with
just the conda env name, but this will cause us to match incorrect if
there is a different conda env which ends with same suffix. In my case,
i had stack and llama-stack as the two conda envs.

## Test Plan
llama stack run ~/.llama/distributions/fireworks/fireworks-run.yaml
2025-02-27 23:07:23 -08:00
Hardik Shah
8efa53daf1
fix: Agent telemetry inputs/outputs should be structured (#1302)
Original telemetry outputs for agent turns look like this. 
Note: how output was a `str(message)` making it difficult to read them
back for downstream tasks ( eg. building eval datasets )
```
{
│   │   'input': [
│   │   │   '{"role":"system","content":"You are a helpful assistant. Use search tool to answer the questions. "}',
│   │   │   '{"role":"user","content":"Which teams played in the NBA western conference finals of 2024","context":null}'
│   │   ],
│   │   'output': "content:  tool_calls: [ToolCall(call_id='8b7294ec-a83f-4798-ad8f-6bed662f08b6', tool_name=<BuiltinTool.brave_search: 'brave_search'>, arguments={'query': 'NBA Western Conference Finals 2024 teams'})]"
│   },
``` 

Updated the outputs to be structured .

## Test 

```python
import uuid

from llama_stack_client.lib.agents.agent import Agent
from llama_stack_client.lib.agents.event_logger import EventLogger
from llama_stack_client.types.agent_create_params import AgentConfig

model_id = "meta-llama/Llama-3.1-8B-Instruct"
agent_config = AgentConfig(
    model=model_id,
    instructions="You are a helpful assistant who will use the web search tools to help with answering questions.\nOnly provide final answer in short without writing full sentences. Use web search",
    toolgroups=["builtin::websearch"],
    enable_session_persistence=True,
)

agent = Agent(client, agent_config)

session_id = agent.create_session(uuid.uuid4().hex)
response = agent.create_turn(
    messages=[
        {
            "role": "user",
            "content": "latest news about llama stack",
        }
    ],
    session_id=session_id,
    stream=False,
)

pprint(response)
```
Output: 
```
Turn(
│   input_messages=[UserMessage(content='latest news about llama stack', role='user', context=None)],
│   output_message=CompletionMessage(
│   │   content="The latest news about Llama Stack is that Meta has released Llama 3.2, which includes small and medium-sized vision LLMs (11B and 90B) and lightweight, text-only models (1B and 3B) that fit onto select edge and mobile devices. Additionally, Llama Stack distributions have been released to simplify the way developers work with Llama models in different environments. However, a critical vulnerability has been discovered in Meta's Llama-Stack, which puts AI applications at risk.",
│   │   role='assistant',
│   │   stop_reason='end_of_turn',
│   │   tool_calls=[]
│   ),
│   session_id='77379546-4598-485a-b4f4-84e5da28c513',
│   started_at=datetime.datetime(2025, 2, 27, 11, 2, 43, 915243, tzinfo=TzInfo(-08:00)),
│   steps=[
│   │   InferenceStep(
│   │   │   api_model_response=CompletionMessage(
│   │   │   │   content='',
│   │   │   │   role='assistant',
│   │   │   │   stop_reason='end_of_turn',
│   │   │   │   tool_calls=[
│   │   │   │   │   ToolCall(
│   │   │   │   │   │   arguments={'query': 'latest news llama stack'},
│   │   │   │   │   │   call_id='84c0fa10-e24a-4f91-a9ff-415a9ec0bb0b',
│   │   │   │   │   │   tool_name='brave_search'
│   │   │   │   │   )
│   │   │   │   ]
│   │   │   ),
│   │   │   step_id='81c16bd3-eb00-4721-8edc-f386e07391a3',
│   │   │   step_type='inference',
│   │   │   turn_id='2c6b5273-4b16-404f-bed2-c0025fd63b45',
│   │   │   completed_at=datetime.datetime(2025, 2, 27, 11, 2, 44, 637149, tzinfo=TzInfo(-08:00)),
│   │   │   started_at=datetime.datetime(2025, 2, 27, 11, 2, 43, 915831, tzinfo=TzInfo(-08:00))
│   │   ),
│   │   ToolExecutionStep(
│   │   │   step_id='4782d609-a62e-45f5-8d2a-25a43db46288',
│   │   │   step_type='tool_execution',
│   │   │   tool_calls=[
│   │   │   │   ToolCall(
│   │   │   │   │   arguments={'query': 'latest news llama stack'},
│   │   │   │   │   call_id='84c0fa10-e24a-4f91-a9ff-415a9ec0bb0b',
│   │   │   │   │   tool_name='brave_search'
│   │   │   │   )
│   │   │   ],
│   │   │   tool_responses=[
│   │   │   │   ToolResponse(
│   │   │   │   │   call_id='84c0fa10-e24a-4f91-a9ff-415a9ec0bb0b',
│   │   │   │   │   content='{"query": "latest news llama stack", "top_k": [{"title": "Llama 3.2: Revol. .......  Hacker News.", "score": 0.6186197, "raw_content": null}]}',
│   │   │   │   │   tool_name='brave_search',
│   │   │   │   │   metadata=None
│   │   │   │   )
│   │   │   ],
│   │   │   turn_id='2c6b5273-4b16-404f-bed2-c0025fd63b45',
│   │   │   completed_at=datetime.datetime(2025, 2, 27, 11, 2, 46, 272176, tzinfo=TzInfo(-08:00)),
│   │   │   started_at=datetime.datetime(2025, 2, 27, 11, 2, 44, 640743, tzinfo=TzInfo(-08:00))
│   │   ),
│   │   InferenceStep(
│   │   │   api_model_response=CompletionMessage(
│   │   │   │   content="The latest news about Llama Stack is that Meta has released Llama 3.2, which includes small and medium-sized vision LLMs (11B and 90B) and lightweight, text-only models (1B and 3B) that fit onto select edge and mobile devices. Additionally, Llama Stack distributions have been released to simplify the way developers work with Llama models in different environments. However, a critical vulnerability has been discovered in Meta's Llama-Stack, which puts AI applications at risk.",
│   │   │   │   role='assistant',
│   │   │   │   stop_reason='end_of_turn',
│   │   │   │   tool_calls=[]
│   │   │   ),
│   │   │   step_id='37994419-5da3-4e84-a010-8d9b85366262',
│   │   │   step_type='inference',
│   │   │   turn_id='2c6b5273-4b16-404f-bed2-c0025fd63b45',
│   │   │   completed_at=datetime.datetime(2025, 2, 27, 11, 2, 48, 961275, tzinfo=TzInfo(-08:00)),
│   │   │   started_at=datetime.datetime(2025, 2, 27, 11, 2, 46, 273168, tzinfo=TzInfo(-08:00))
│   │   )
│   ],
│   turn_id='2c6b5273-4b16-404f-bed2-c0025fd63b45',
│   completed_at=datetime.datetime(2025, 2, 27, 11, 2, 48, 962318, tzinfo=TzInfo(-08:00)),
│   output_attachments=[]
)

```

## Check for Telemetry 
```python 

agent_logs = []
for span in client.telemetry.query_spans(
    attribute_filters=[
      {"key": "session_id", "op": "eq", "value": session_id},
    ],
    attributes_to_return=['input', 'output'],
):
    agent_logs.append(span.attributes)

pprint(json.loads(agent_logs[-1]['output']))
```
```
{
│   'content': "The latest news about Llama Stack is that Meta has released Llama 3.2, which includes small and medium-sized vision LLMs (11B and 90B) and lightweight, text-only models (1B and 3B) that fit onto select edge and mobile devices. Additionally, Llama Stack distributions have been released to simplify the way developers work with Llama models in different environments. However, a critical vulnerability has been discovered in Meta's Llama-Stack, which puts AI applications at risk.",
│   'tool_calls': []
}
```
2025-02-27 23:06:37 -08:00
ehhuang
caffafd101
feat: update the default system prompt for 3.2/3.3 models (#1310)
# Summary:
The current prompt doesn't work well and tend to overindex on tool
calling. This PR is not perfect, but should be an improvement over the
current prompt. We can keep iterating.

# Test Plan:

Ran on a (small) eval with 20 HotpotQA examples.

With current prompt:
https://gist.github.com/ehhuang/9f967e62751907165eb13781ea968f5c
{
│ 'basic::equality': {'accuracy': {'accuracy': 0.2, 'num_correct': 4.0,
'num_total': 20}},
│   'F1ScoringFn': {
│   │   'f1_average': 0.25333333333333335,
│   │   'precision_average': 0.23301767676767676,
│   │   'recall_average': 0.375
│   }
}


num_tool_calls=[5, 5, 5, 5, 5, 5, 2, 5, 5, 5, 5, 5, 2, 2, 1, 1, 2, 1, 2,
2]
num_examples_with_tool_call=20
num_examples_with_pythontag=0


#########################################################
With new prompt:
https://gist.github.com/ehhuang/6e4a8ecf54db68922c2be8700056f962

{
│ 'basic::equality': {'accuracy': {'accuracy': 0.25, 'num_correct': 5.0,
'num_total': 20}},
│   'F1ScoringFn': {
│   │   'f1_average': 0.35579260478321006,
│   │   'precision_average': 0.32030238933180105,
│   │   'recall_average': 0.6091666666666666
│   }
}


num_tool_calls=[2, 1, 1, 5, 5, 5, 1, 2, 1, 1, 2, 1, 1, 1, 1, 1, 2, 1, 3,
2]
num_examples_with_tool_call=20
num_examples_with_pythontag=0


The answers have higher recall, and make fewer tool calls. Note that
these were run with max_infer_iter=5, so the current prompt hits this
limit more often, and without the limit, someitmes goes into infinite
tool calling loop.

The data here is with 3.3-70B. Results are equally poor with either
prompt with 3.2-3B ~30 recall.
2025-02-27 23:05:42 -08:00
Ashwin Bharambe
ece354eedd test: dont hardcode faiss as provider in the tests please 2025-02-27 22:54:34 -08:00
Ashwin Bharambe
4c8a0fa8dc fix: ensure ollama embedding model is registered properly in the template 2025-02-27 22:49:06 -08:00
Hardik Shah
999195fe5b
fix: [Litellm]Do not swallow first token (#1316)
`ChatCompletionResponseEventType: start` is ignored and not yielded in
the agent_instance as we expect that to not have any content.

However, litellm sends first event as `ChatCompletionResponseEventType:
start` with content ( which was the first token that we were skipping )

```
LLAMA_STACK_CONFIG=dev pytest -s -v tests/client-sdk/agents/test_agents.py --inference-model "openai/gpt-4o-mini" -k test_agent_simple
``` 
This was failing before ( since the word hello was not in the final
response )
2025-02-27 20:53:47 -08:00
Xi Yan
7780fc92d5
fix: update getting_started notebook to pass nbeval (#1318)
# What does this PR do?

- See
3796667776
- Together's structured decoding API is flaky, add skip to cell
- Enable cell 21 to pass cell 21-23

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan


<img width="652" alt="image"
src="https://github.com/user-attachments/assets/a1e4b94b-c1ce-4869-ba0d-0860bfe33460"
/>


[//]: # (## Documentation)
2025-02-27 23:13:00 -05:00
Yuan Tang
6824d23dc9
test: Only run embedding tests for remote::nvidia (#1317)
This fixes release build failure
3796497240:

```
=================================== FAILURES ===================================
______ test_embedding_truncation_error[txt=8B:emb=MiniLM-long-text-None] _______
llama-stack/tests/client-sdk/inference/test_embedding.py:166: in test_embedding_truncation_error
    with pytest.raises(BadRequestError) as excinfo:
E   Failed: DID NOT RAISE <class 'llama_stack_client.BadRequestError'>
______ test_embedding_truncation_error[txt=8B:emb=MiniLM-long-text-none] _______
llama-stack/tests/client-sdk/inference/test_embedding.py:166: in test_embedding_truncation_error
    with pytest.raises(BadRequestError) as excinfo:
E   Failed: DID NOT RAISE <class 'llama_stack_client.BadRequestError'>
_______ test_embedding_truncation_error[txt=8B:emb=MiniLM-long-str-None] _______
llama-stack/tests/client-sdk/inference/test_embedding.py:166: in test_embedding_truncation_error
    with pytest.raises(BadRequestError) as excinfo:
E   Failed: DID NOT RAISE <class 'llama_stack_client.BadRequestError'>
_______ test_embedding_truncation_error[txt=8B:emb=MiniLM-long-str-none] _______
llama-stack/tests/client-sdk/inference/test_embedding.py:166: in test_embedding_truncation_error
    with pytest.raises(BadRequestError) as excinfo:
E   Failed: DID NOT RAISE <class 'llama_stack_client.BadRequestError'>
_________ test_embedding_text_truncation_error[txt=8B:emb=MiniLM-NONE] _________
llama-stack/tests/client-sdk/inference/test_embedding.py:223: in test_embedding_text_truncation_error
    with pytest.raises(BadRequestError) as excinfo:
E   Failed: DID NOT RAISE <class 'llama_stack_client.BadRequestError'>
_________ test_embedding_text_truncation_error[txt=8B:emb=MiniLM-END] __________
llama-stack/tests/client-sdk/inference/test_embedding.py:223: in test_embedding_text_truncation_error
    with pytest.raises(BadRequestError) as excinfo:
E   Failed: DID NOT RAISE <class 'llama_stack_client.BadRequestError'>
________ test_embedding_text_truncation_error[txt=8B:emb=MiniLM-START] _________
llama-stack/tests/client-sdk/inference/test_embedding.py:223: in test_embedding_text_truncation_error
    with pytest.raises(BadRequestError) as excinfo:
E   Failed: DID NOT RAISE <class 'llama_stack_client.BadRequestError'>
_________ test_embedding_text_truncation_error[txt=8B:emb=MiniLM-left] _________
llama-stack/tests/client-sdk/inference/test_embedding.py:223: in test_embedding_text_truncation_error
    with pytest.raises(BadRequestError) as excinfo:
E   Failed: DID NOT RAISE <class 'llama_stack_client.BadRequestError'>
________ test_embedding_text_truncation_error[txt=8B:emb=MiniLM-right] _________
llama-stack/tests/client-sdk/inference/test_embedding.py:223: in test_embedding_text_truncation_error
    with pytest.raises(BadRequestError) as excinfo:
E   Failed: DID NOT RAISE <class 'llama_stack_client.BadRequestError'>
=========================== short test summary info ============================
FAILED llama-stack/tests/client-sdk/inference/test_embedding.py::test_embedding_truncation_error[txt=8B:emb=MiniLM-long-text-None] - Failed: DID NOT RAISE <class 'llama_stack_client.BadRequestError'>
FAILED llama-stack/tests/client-sdk/inference/test_embedding.py::test_embedding_truncation_error[txt=8B:emb=MiniLM-long-text-none] - Failed: DID NOT RAISE <class 'llama_stack_client.BadRequestError'>
FAILED llama-stack/tests/client-sdk/inference/test_embedding.py::test_embedding_truncation_error[txt=8B:emb=MiniLM-long-str-None] - Failed: DID NOT RAISE <class 'llama_stack_client.BadRequestError'>
FAILED llama-stack/tests/client-sdk/inference/test_embedding.py::test_embedding_truncation_error[txt=8B:emb=MiniLM-long-str-none] - Failed: DID NOT RAISE <class 'llama_stack_client.BadRequestError'>
FAILED llama-stack/tests/client-sdk/inference/test_embedding.py::test_embedding_text_truncation_error[txt=8B:emb=MiniLM-NONE] - Failed: DID NOT RAISE <class 'llama_stack_client.BadRequestError'>
FAILED llama-stack/tests/client-sdk/inference/test_embedding.py::test_embedding_text_truncation_error[txt=8B:emb=MiniLM-END] - Failed: DID NOT RAISE <class 'llama_stack_client.BadRequestError'>
FAILED llama-stack/tests/client-sdk/inference/test_embedding.py::test_embedding_text_truncation_error[txt=8B:emb=MiniLM-START] - Failed: DID NOT RAISE <class 'llama_stack_client.BadRequestError'>
FAILED llama-stack/tests/client-sdk/inference/test_embedding.py::test_embedding_text_truncation_error[txt=8B:emb=MiniLM-left] - Failed: DID NOT RAISE <class 'llama_stack_client.BadRequestError'>
FAILED llama-stack/tests/client-sdk/inference/test_embedding.py::test_embedding_text_truncation_error[txt=8B:emb=MiniLM-right] - Failed: DID NOT RAISE <class 'llama_stack_client.BadRequestError'>
= 9 failed, 48 passed, 2 skipped, 3 deselected, 3 xfailed, 1 xpassed, 121 warnings in 90.16s (0:01:30) =
Error: Process completed with exit code 1.
```

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-02-27 22:35:52 -05:00
Yuan Tang
a9f5c5bfca
fix: Incorrect import path for print_subcommand_description() (#1315)
# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-02-27 18:50:41 -08:00
Yuan Tang
f4df3a76d9
fix: Incorrect import path for print_subcommand_description() (#1314)
# What does this PR do?

Missed this one additional import in
https://github.com/meta-llama/llama-stack/pull/1313

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-02-27 18:35:49 -08:00
Yuan Tang
3567274183
fix: Incorrect import path for print_subcommand_description() (#1313)
# What does this PR do?

This fixes release build failure:
3796356500

```
+ llama model prompt-format -m Llama3.2-11B-Vision-Instruct
Traceback (most recent call last):
  File "/tmp/tmp.PXMDlmD0x5/.venv/bin/llama", line 4, in <module>
    from llama_stack.cli.llama import main
  File "/tmp/tmp.PXMDlmD0x5/.venv/lib/python3.10/site-packages/llama_stack/cli/llama.py", line 10, in <module>
    from .model import ModelParser
  File "/tmp/tmp.PXMDlmD0x5/.venv/lib/python3.10/site-packages/llama_stack/cli/model/__init__.py", line 7, in <module>
    from .model import ModelParser  # noqa
  File "/tmp/tmp.PXMDlmD0x5/.venv/lib/python3.10/site-packages/llama_stack/cli/model/model.py", line 16, in <module>
    from llama_stack.cli.utils import print_subcommand_description
ModuleNotFoundError: No module named 'llama_stack.cli.utils'
```

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-02-27 21:24:01 -05:00
Xi Yan
076d2f349d
fix: litellm tool call parsing event type to in_progress (#1312)
# What does this PR do?

- Test with script:
https://gist.github.com/yanxi0830/64699f3604766ac2319421b750c5bf9c

- Agent with tool calls does not get correctly parsed with LiteLLM
provider b/c we skip processing
`ChatCompletionResponseEventType.complete`.
- However, LiteLLM spits out event_type="complete" with ToolCallDelta


2f7683bc5f/llama_stack/providers/inline/agents/meta_reference/agent_instance.py (L570-L577)


- Llama Model
```
ChatCompletionResponseStreamChunk(
│   event=Event(
│   │   delta=ToolCallDelta(
│   │   │   parse_status='succeeded',
│   │   │   tool_call=ToolCall(
│   │   │   │   arguments={'kind': 'pod', 'namespace': 'openshift-lightspeed'},
│   │   │   │   call_id='call_tIjWTUdsQXhQ2XHC5ke4EQY5',
│   │   │   │   tool_name='get_object_namespace_list'
│   │   │   ),
│   │   │   type='tool_call'
│   │   ),
│   │   event_type='progress',
│   │   logprobs=None,
│   │   stop_reason='end_of_turn'
│   ),
│   metrics=None
)
ChatCompletionResponseStreamChunk(
│   event=Event(
│   │   delta=TextDelta(text='', type='text'),
│   │   event_type='complete',
│   │   logprobs=None,
│   │   stop_reason='end_of_turn'
│   ),
│   metrics=None
)
```

- LiteLLM model
```
ChatCompletionResponseStreamChunk(
│   event=Event(
│   │   delta=ToolCallDelta(
│   │   │   parse_status='succeeded',
│   │   │   tool_call=ToolCall(
│   │   │   │   arguments={'kind': 'pod', 'namespace': 'openshift-lightspeed'},
│   │   │   │   call_id='call_tIjWTUdsQXhQ2XHC5ke4EQY5',
│   │   │   │   tool_name='get_object_namespace_list'
│   │   │   ),
│   │   │   type='tool_call'
│   │   ),
│   │   event_type='complete',
│   │   logprobs=None,
│   │   stop_reason='end_of_turn'
│   ),
│   metrics=None
)
ChatCompletionResponseStreamChunk(
│   event=Event(
│   │   delta=TextDelta(text='', type='text'),
│   │   event_type='complete',
│   │   logprobs=None,
│   │   stop_reason='end_of_turn'
│   ),
│   metrics=None
)
```


[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan

- Test with script:
https://gist.github.com/yanxi0830/64699f3604766ac2319421b750c5bf9c


[//]: # (## Documentation)
2025-02-27 18:00:27 -08:00
Hardik Shah
2f7683bc5f
fix: Structured outputs for recursive models (#1311)
Handle recursive nature in the structured response_formats. 

Update test to include 1 nested model.

```
 LLAMA_STACK_CONFIG=dev pytest -s -v tests/client-sdk/inference/test_text_inference.py --inference-model "openai/gpt-4o-mini" -k test_text_chat_completion_structured_output
```

---------

Co-authored-by: Ashwin Bharambe <ashwin.bharambe@gmail.com>
2025-02-27 17:31:53 -08:00
Reid
94e2186bb8
chore: add subcommands description in help (#1219)
# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]

```
before:
$ llama
usage: llama [-h] {model,stack,download,verify-download} ...

Welcome to the Llama CLI

options:
  -h, --help            show this help message and exit

subcommands:
  {model,stack,download,verify-download}

$ llama model --help
usage: llama model [-h] {download,list,prompt-format,describe,verify-download,remove} ...

Work with llama models

options:
  -h, --help            show this help message and exit

model_subcommands:
  {download,list,prompt-format,describe,verify-download,remove}

$ llama stack --help
usage: llama stack [-h] [--version] {build,list-apis,list-providers,run} ...

Operations for the Llama Stack / Distributions

options:
  -h, --help            show this help message and exit
  --version             show program's version number and exit

stack_subcommands:
  {build,list-apis,list-providers,run}

===================
after:
$ llama
usage: llama [-h] {model,stack,download,verify-download} ...

Welcome to the Llama CLI

options:
  -h, --help            show this help message and exit

subcommands:
  {model,stack,download,verify-download}

  model                 Work with llama models
  stack                 Operations for the Llama Stack / Distributions
  download              Download a model from llama.meta.com or Hugging Face Hub
  verify-download       Verify integrity of downloaded model files

$ llama model --help
usage: llama model [-h] {download,list,prompt-format,describe,verify-download,remove} ...

Work with llama models

options:
  -h, --help            show this help message and exit

model_subcommands:
  {download,list,prompt-format,describe,verify-download,remove}

  download              Download a model from llama.meta.com or Hugging Face Hub
  list                  Show available llama models
  prompt-format         Show llama model message formats
  describe              Show details about a llama model
  verify-download       Verify the downloaded checkpoints' checksums for models downloaded from Meta
  remove                Remove the downloaded llama model

$ llama stack --help
usage: llama stack [-h] [--version] {build,list-apis,list-providers,run} ...

Operations for the Llama Stack / Distributions

options:
  -h, --help            show this help message and exit
  --version             show program's version number and exit

stack_subcommands:
  {build,list-apis,list-providers,run}

  build                 Build a Llama stack container
  list-apis             List APIs part of the Llama Stack implementation
  list-providers        Show available Llama Stack Providers for an API
  run                   Start the server for a Llama Stack Distribution. You should have already built (or downloaded) and configured the distribution.
```

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)

---------

Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
2025-02-27 17:00:27 -08:00
Matthew Farrellee
e28cedd833
feat: add nvidia embedding implementation for new signature, task_type, output_dimention, text_truncation (#1213)
# What does this PR do?

updates nvidia inference provider's embedding implementation to use new
signature

add support for task_type, output_dimensions, text_truncation parameters

## Test Plan

`LLAMA_STACK_BASE_URL=http://localhost:8321 pytest -v
tests/client-sdk/inference/test_embedding.py --embedding-model
baai/bge-m3`
2025-02-27 16:58:11 -08:00
Luis Tomas Bolivar
73c6f6126f
fix: Avoid unexpected keyword argument for sentence_transformers (#1269)
Now that remote-vllm include inline::sentence_transformers there is an
issue building the image:
Error building stack:
SentenceTransformersInferenceConfig.sample_run_config() got an
unexpected keyword argument '__distro_dir__'

To avoid that issue this fix extends the sample_run_config to accept
extra kwargs
2025-02-27 16:47:26 -08:00
Reid
c2d2a80b0a
docs: update the output of llama-stack-client models list (#1271)
# What does this PR do?
[Provide a short summary of what this PR does and why. Link to relevant
issues if applicable.]

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan
[Describe the tests you ran to verify your changes with result
summaries. *Provide clear instructions so the plan can be easily
re-executed.*]

[//]: # (## Documentation)

Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
2025-02-27 16:46:38 -08:00
Yuan Tang
264c2c46db
build: Add dotenv file for running tests with uv (#1251)
This will be useful for testing instead of having to manually pass them
every time.

---------

Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
2025-02-27 16:42:55 -08:00
Ashwin Bharambe
04de2f84e9
fix: register provider model name and HF alias in run.yaml (#1304)
Each model known to the system has two identifiers: 

- the `provider_resource_id` (what the provider calls it) -- e.g.,
`accounts/fireworks/models/llama-v3p1-8b-instruct`
- the `identifier` (`model_id`) under which it is registered and gets
routed to the appropriate provider.

We have so far used the HuggingFace repo alias as the standardized
identifier you can use to refer to the model. So in the above example,
we'd use `meta-llama/Llama-3.1-8B-Instruct` as the name under which it
gets registered. This makes it convenient for users to refer to these
models across providers.

However, we forgot to register the _actual_ provider model ID also. You
should be able to route via `provider_resource_id` also, of course.

This change fixes this (somewhat grave) omission.

*Note*: this change is additive -- more aliases work now compared to
before.

## Test Plan

Run the following for distro=(ollama fireworks together)
```
LLAMA_STACK_CONFIG=$distro \
   pytest -s -v tests/client-sdk/inference/test_text_inference.py \
   --inference-model=meta-llama/Llama-3.1-8B-Instruct --vision-inference-model=""
```
2025-02-27 16:39:23 -08:00
Ashwin Bharambe
c54164556a
fix: update notebooks to avoid using the nutsy --image-name __system__ thing (#1308)
The `--image-name __system__` thing was a hack and a bad one at that.
The actual intent was to somehow automatically detect the notebook
environment so we could avoid unnecessarily confusing things in the
llama stack build cmd-line. But I failed which led us to use the backup
`__system__` thing.

Let's just do the simple thing.

Note that `build_venv.sh` I haven't changed for now (so it still honors
the __system__ special name just that no new user should use it.)

## Test Plan

Open the notebooks from this branch in Colab (see example url below) and
ensure the builds work.


https://colab.research.google.com/github/meta-llama/llama-stack/blob/foo/docs/getting_started.ipynb

In the notebook, install llama-stack from this branch directly using:

```
!pip install -U https://github.com/meta-llama/llama-stack/archive/refs/heads/foo.zip
```

Verify that `!UV_SYSTEM_PYTHON=1 llama stack build --template together
--image-type venv` afterwards succeeds and the library client
initialization also works.
2025-02-27 16:39:04 -08:00
ehhuang
a34f3aafcf
fix: don't include tool args not in the function definition (#1307)
# Summary:
Right now we would include toolgroup args when we encode messages with
tool_calls, which is confusing the model since they not in the function
description (see test plan for example).

# Test Plan:
Add a print statement before raw prompt is sent to providers (no good
way to test this currently)

Before:
```
cated in the same neighborhood?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n[knowledge_search(query="Laleli Mosque and Esma Sultan Mansion same neighborhood", vector_db_ids=["829a68735d744dc3830409dcc782964a"])]<|eot_id|><|start_header_id|>ipython<|end_header_id|>\n\nknowledge_search tool found 5 chunks:\nBEGIN of
```
Note the extra `vector_db_ids`

After
```
>user<|end_header_id|>\n\nAre the Laleli Mosque and Esma Sultan Mansion located in the same neighborhood?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n[knowledge_search(query="Laleli Mosque and Esma Sultan Mansion same neighborhood")]<|eot_id|><|start_header_id|>ipython<|end_header_id|>\n\nknowledge_search tool found
```
2025-02-27 16:25:30 -08:00
Xi Yan
663c6b0537
fix: duplicate ToolResponseMessage in Turn message history (#1305)
# What does this PR do?

- Reproduce with:
https://github.com/meta-llama/llama-stack-apps/blob/main/examples/agents/e2e_loop_with_client_tools.py

- **Root cause**: when we have ToolResponseMessage as part of Turn, we
will create duplicate ToolResponseMessage in the conversation history
when getting messages from a Turn.
- Fix: avoid adding duplicate ToolResponseMessage from a turn's
input_messages.
- If it is part of a Turn's steps, only add it when processing the
steps.
   - If it is not part of a Turn's steps, add it. 

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan

```
LLAMA_STACK_BASE_URL=http://localhost:8321 pytest -v tests/client-sdk/agents/test_agents.py --inference-model meta-llama/Llama-3.1-8B-Instruct
```


```
python -m examples.agents.e2e_loop_with_client_tools localhost 8321 
```

```python
Turn(
│   input_messages=[
│   │   UserMessage(
│   │   │   content='What was the closing price of Google stock (ticker symbol GOOG) for 2023 ?',
│   │   │   role='user',
│   │   │   context=None
│   │   ),
│   │   ToolResponseMessage(
│   │   │   call_id='0d5f94fb-f070-4dc1-8eeb-63eb5918ec94',
│   │   │   content='"[{\\"(\'Year\', \'\')\\":2023,\\"(\'Close\', \'GOOG\')\\":140.4254302979}]"',
│   │   │   role='tool',
│   │   │   tool_name='get_ticker_data'
│   │   )
│   ],
│   output_message=CompletionMessage(
│   │   content='Note: The actual closing price for 2023 may not be available or may be different from the result obtained above. The result is based on a hypothetical call to the get_ticker_data function.',
│   │   role='assistant',
│   │   stop_reason='end_of_turn',
│   │   tool_calls=[]
│   ),
│   session_id='4c791107-f0d8-456e-a27f-aa2fdc72b871',
│   started_at=datetime.datetime(2025, 2, 27, 13, 59, 25, 412928, tzinfo=TzInfo(-08:00)),
│   steps=[
│   │   ShieldCallStep(
│   │   │   step_id='e0514587-b7d6-4bba-8609-8e05a3a46d8a',
│   │   │   step_type='shield_call',
│   │   │   turn_id='6ed9c25a-a4fe-4b51-ae13-de248624c2fc',
│   │   │   completed_at=datetime.datetime(2025, 2, 27, 13, 59, 25, 858382, tzinfo=TzInfo(-08:00)),
│   │   │   started_at=datetime.datetime(2025, 2, 27, 13, 59, 25, 425204, tzinfo=TzInfo(-08:00)),
│   │   │   violation=None
│   │   ),
│   │   InferenceStep(
│   │   │   api_model_response=CompletionMessage(
│   │   │   │   content='',
│   │   │   │   role='assistant',
│   │   │   │   stop_reason='end_of_turn',
│   │   │   │   tool_calls=[
│   │   │   │   │   ToolCall(
│   │   │   │   │   │   arguments={
│   │   │   │   │   │   │   'ticker_symbol': 'GOOG',
│   │   │   │   │   │   │   'start': '2023-01-01',
│   │   │   │   │   │   │   'end': '2023-12-31'
│   │   │   │   │   │   },
│   │   │   │   │   │   call_id='0d5f94fb-f070-4dc1-8eeb-63eb5918ec94',
│   │   │   │   │   │   tool_name='get_ticker_data'
│   │   │   │   │   )
│   │   │   │   ]
│   │   │   ),
│   │   │   step_id='a3ceec6a-f149-49d5-a1c2-db461e3f6e9f',
│   │   │   step_type='inference',
│   │   │   turn_id='6ed9c25a-a4fe-4b51-ae13-de248624c2fc',
│   │   │   completed_at=datetime.datetime(2025, 2, 27, 13, 59, 26, 910179, tzinfo=TzInfo(-08:00)),
│   │   │   started_at=datetime.datetime(2025, 2, 27, 13, 59, 25, 871130, tzinfo=TzInfo(-08:00))
│   │   ),
│   │   ShieldCallStep(
│   │   │   step_id='f9339865-96ca-4425-af42-a87bab343e24',
│   │   │   step_type='shield_call',
│   │   │   turn_id='6ed9c25a-a4fe-4b51-ae13-de248624c2fc',
│   │   │   completed_at=datetime.datetime(2025, 2, 27, 13, 59, 28, 383013, tzinfo=TzInfo(-08:00)),
│   │   │   started_at=datetime.datetime(2025, 2, 27, 13, 59, 26, 944012, tzinfo=TzInfo(-08:00)),
│   │   │   violation=None
│   │   ),
│   │   ToolExecutionStep(
│   │   │   step_id='e317b74a-c4f3-4845-99a3-7d93aa6ea6c8',
│   │   │   step_type='tool_execution',
│   │   │   tool_calls=[
│   │   │   │   ToolCall(
│   │   │   │   │   arguments={'ticker_symbol': 'GOOG', 'start': '2023-01-01', 'end': '2023-12-31'},
│   │   │   │   │   call_id='0d5f94fb-f070-4dc1-8eeb-63eb5918ec94',
│   │   │   │   │   tool_name='get_ticker_data'
│   │   │   │   )
│   │   │   ],
│   │   │   tool_responses=[
│   │   │   │   ToolResponse(
│   │   │   │   │   call_id='0d5f94fb-f070-4dc1-8eeb-63eb5918ec94',
│   │   │   │   │   content='"[{\\"(\'Year\', \'\')\\":2023,\\"(\'Close\', \'GOOG\')\\":140.4254302979}]"',
│   │   │   │   │   tool_name='get_ticker_data',
│   │   │   │   │   metadata=None
│   │   │   │   )
│   │   │   ],
│   │   │   turn_id='6ed9c25a-a4fe-4b51-ae13-de248624c2fc',
│   │   │   completed_at=datetime.datetime(2025, 2, 27, 13, 59, 28, 718810, tzinfo=TzInfo(-08:00)),
│   │   │   started_at=datetime.datetime(2025, 2, 27, 13, 59, 26, 943792, tzinfo=TzInfo(-08:00))
│   │   ),
│   │   ShieldCallStep(
│   │   │   step_id='c4236616-db89-4c04-ad04-f51cfb726385',
│   │   │   step_type='shield_call',
│   │   │   turn_id='6ed9c25a-a4fe-4b51-ae13-de248624c2fc',
│   │   │   completed_at=datetime.datetime(2025, 2, 27, 13, 59, 28, 958946, tzinfo=TzInfo(-08:00)),
│   │   │   started_at=datetime.datetime(2025, 2, 27, 13, 59, 28, 732680, tzinfo=TzInfo(-08:00)),
│   │   │   violation=None
│   │   ),
│   │   InferenceStep(
│   │   │   api_model_response=CompletionMessage(
│   │   │   │   content='Note: The actual closing price for 2023 may not be available or may be different from the result obtained above. The result is based on a hypothetical call to the get_ticker_data function.',
│   │   │   │   role='assistant',
│   │   │   │   stop_reason='end_of_turn',
│   │   │   │   tool_calls=[]
│   │   │   ),
│   │   │   step_id='3386f896-2026-41e4-a60f-f6f3c3981cf6',
│   │   │   step_type='inference',
│   │   │   turn_id='6ed9c25a-a4fe-4b51-ae13-de248624c2fc',
│   │   │   completed_at=datetime.datetime(2025, 2, 27, 13, 59, 37, 74750, tzinfo=TzInfo(-08:00)),
│   │   │   started_at=datetime.datetime(2025, 2, 27, 13, 59, 28, 970724, tzinfo=TzInfo(-08:00))
│   │   ),
│   │   ShieldCallStep(
│   │   │   step_id='bc57ac8c-f94e-4758-bf1a-0dd734eca1cf',
│   │   │   step_type='shield_call',
│   │   │   turn_id='6ed9c25a-a4fe-4b51-ae13-de248624c2fc',
│   │   │   completed_at=datetime.datetime(2025, 2, 27, 13, 59, 37, 443016, tzinfo=TzInfo(-08:00)),
│   │   │   started_at=datetime.datetime(2025, 2, 27, 13, 59, 37, 86726, tzinfo=TzInfo(-08:00)),
│   │   │   violation=None
│   │   )
│   ],
│   turn_id='6ed9c25a-a4fe-4b51-ae13-de248624c2fc',
│   completed_at=datetime.datetime(2025, 2, 27, 13, 59, 37, 459456, tzinfo=TzInfo(-08:00)),
│   output_attachments=[]
)
```

```python
Turn(
│   input_messages=[
│   │   UserMessage(content='What is 40+30?', role='user', context=None),
│   │   ToolResponseMessage(
│   │   │   call_id='8e54aca9-244d-44ca-ada0-0365090e8622',
│   │   │   content='{"success": true, "result": 70.0}',
│   │   │   role='tool',
│   │   │   tool_name='calculator'
│   │   )
│   ],
│   output_message=CompletionMessage(
│   │   content='The result of the calculation is 70.',
│   │   role='assistant',
│   │   stop_reason='end_of_turn',
│   │   tool_calls=[]
│   ),
│   session_id='4c791107-f0d8-456e-a27f-aa2fdc72b871',
│   started_at=datetime.datetime(2025, 2, 27, 14, 0, 0, 156903, tzinfo=TzInfo(-08:00)),
│   steps=[
│   │   ShieldCallStep(
│   │   │   step_id='17b6b645-31cc-4be9-a758-a4f3b741ced9',
│   │   │   step_type='shield_call',
│   │   │   turn_id='4daff286-f703-417e-a5dc-0e158582bbec',
│   │   │   completed_at=datetime.datetime(2025, 2, 27, 14, 0, 0, 780564, tzinfo=TzInfo(-08:00)),
│   │   │   started_at=datetime.datetime(2025, 2, 27, 14, 0, 0, 174515, tzinfo=TzInfo(-08:00)),
│   │   │   violation=None
│   │   ),
│   │   InferenceStep(
│   │   │   api_model_response=CompletionMessage(
│   │   │   │   content='',
│   │   │   │   role='assistant',
│   │   │   │   stop_reason='end_of_turn',
│   │   │   │   tool_calls=[
│   │   │   │   │   ToolCall(
│   │   │   │   │   │   arguments={'x': 40.0, 'y': 30.0, 'operation': 'add'},
│   │   │   │   │   │   call_id='8e54aca9-244d-44ca-ada0-0365090e8622',
│   │   │   │   │   │   tool_name='calculator'
│   │   │   │   │   )
│   │   │   │   ]
│   │   │   ),
│   │   │   step_id='f59e951a-2b75-497d-a075-ec9aad9aad12',
│   │   │   step_type='inference',
│   │   │   turn_id='4daff286-f703-417e-a5dc-0e158582bbec',
│   │   │   completed_at=datetime.datetime(2025, 2, 27, 14, 0, 2, 141869, tzinfo=TzInfo(-08:00)),
│   │   │   started_at=datetime.datetime(2025, 2, 27, 14, 0, 0, 792047, tzinfo=TzInfo(-08:00))
│   │   ),
│   │   ShieldCallStep(
│   │   │   step_id='efafa0cf-23b9-4a90-8350-3a186d80925d',
│   │   │   step_type='shield_call',
│   │   │   turn_id='4daff286-f703-417e-a5dc-0e158582bbec',
│   │   │   completed_at=datetime.datetime(2025, 2, 27, 14, 0, 2, 766293, tzinfo=TzInfo(-08:00)),
│   │   │   started_at=datetime.datetime(2025, 2, 27, 14, 0, 2, 177473, tzinfo=TzInfo(-08:00)),
│   │   │   violation=None
│   │   ),
│   │   ToolExecutionStep(
│   │   │   step_id='877cfbe7-57a8-4056-9c29-49aa38dd337c',
│   │   │   step_type='tool_execution',
│   │   │   tool_calls=[
│   │   │   │   ToolCall(
│   │   │   │   │   arguments={'x': 40.0, 'y': 30.0, 'operation': 'add'},
│   │   │   │   │   call_id='8e54aca9-244d-44ca-ada0-0365090e8622',
│   │   │   │   │   tool_name='calculator'
│   │   │   │   )
│   │   │   ],
│   │   │   tool_responses=[
│   │   │   │   ToolResponse(
│   │   │   │   │   call_id='8e54aca9-244d-44ca-ada0-0365090e8622',
│   │   │   │   │   content='{"success": true, "result": 70.0}',
│   │   │   │   │   tool_name='calculator',
│   │   │   │   │   metadata=None
│   │   │   │   )
│   │   │   ],
│   │   │   turn_id='4daff286-f703-417e-a5dc-0e158582bbec',
│   │   │   completed_at=datetime.datetime(2025, 2, 27, 14, 0, 2, 930899, tzinfo=TzInfo(-08:00)),
│   │   │   started_at=datetime.datetime(2025, 2, 27, 14, 0, 2, 177202, tzinfo=TzInfo(-08:00))
│   │   ),
│   │   ShieldCallStep(
│   │   │   step_id='d47c6160-45d9-47c1-8e39-2faae65ee468',
│   │   │   step_type='shield_call',
│   │   │   turn_id='4daff286-f703-417e-a5dc-0e158582bbec',
│   │   │   completed_at=datetime.datetime(2025, 2, 27, 14, 0, 3, 510402, tzinfo=TzInfo(-08:00)),
│   │   │   started_at=datetime.datetime(2025, 2, 27, 14, 0, 2, 949433, tzinfo=TzInfo(-08:00)),
│   │   │   violation=None
│   │   ),
│   │   InferenceStep(
│   │   │   api_model_response=CompletionMessage(
│   │   │   │   content='The result of the calculation is 70.',
│   │   │   │   role='assistant',
│   │   │   │   stop_reason='end_of_turn',
│   │   │   │   tool_calls=[]
│   │   │   ),
│   │   │   step_id='660ba1cc-770e-471c-bf6e-11e103d74443',
│   │   │   step_type='inference',
│   │   │   turn_id='4daff286-f703-417e-a5dc-0e158582bbec',
│   │   │   completed_at=datetime.datetime(2025, 2, 27, 14, 0, 4, 814944, tzinfo=TzInfo(-08:00)),
│   │   │   started_at=datetime.datetime(2025, 2, 27, 14, 0, 3, 521309, tzinfo=TzInfo(-08:00))
│   │   ),
│   │   ShieldCallStep(
│   │   │   step_id='4dab8bb0-7d38-4465-ae1a-10069de2b3d1',
│   │   │   step_type='shield_call',
│   │   │   turn_id='4daff286-f703-417e-a5dc-0e158582bbec',
│   │   │   completed_at=datetime.datetime(2025, 2, 27, 14, 0, 5, 428561, tzinfo=TzInfo(-08:00)),
│   │   │   started_at=datetime.datetime(2025, 2, 27, 14, 0, 4, 825970, tzinfo=TzInfo(-08:00)),
│   │   │   violation=None
│   │   )
│   ],
│   turn_id='4daff286-f703-417e-a5dc-0e158582bbec',
│   completed_at=datetime.datetime(2025, 2, 27, 14, 0, 5, 462823, tzinfo=TzInfo(-08:00)),
│   output_attachments=[]
)
```


[//]: # (## Documentation)
2025-02-27 15:06:47 -08:00