# What does this PR do?
`start_venv.sh` lifecycle should be:
025f615868 >> 34e3faa4e8 >> 4684fd3f8d
Finally replaced by `start_stack.sh`
---------
Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
# What does this PR do?
We want to bundle a bunch of (typically remote) providers in a distro
template and be able to configure them "on the fly" via environment
variables. So far, we have been able to do this with simple env var
replacements. However, sometimes you want to only conditionally enable
providers (because the relevant remote services may not be alive, or
relevant.) This was not possible until now.
To aid this, we add a simple (bash-like) env var replacement
enhancement: `${env.FOO+bar}` evaluates to `bar` if the variable is SET
and evaluates to empty string if it is not. On top of that, we update
our main resolver to ignore any provider whose ID is null.
This allows using the distro like this:
```bash
llama stack run dev --env CHROMADB_URL=http://localhost:6001 --env ENABLE_CHROMADB=1
```
when only Chroma is UP. This disables the other `pgvector` provider in
the run configuration.
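The conditional replacement and the null-provider filtering described above can be sketched as follows. This is a minimal illustration, not the project's resolver: the regular expression, the function names, and the `ENABLE_PGVECTOR` variable are assumptions made for the example.

```python
import re

# Hypothetical pattern for ${env.NAME+value}; the real resolver may use a different regex.
_CONDITIONAL = re.compile(r"\$\{env\.([A-Za-z_][A-Za-z0-9_]*)\+([^}]*)\}")


def replace_conditional(text: str, env: dict) -> str:
    """Expand ${env.NAME+value} to `value` if NAME is set in `env`, else to ''."""

    def _sub(match):
        name, value = match.group(1), match.group(2)
        # Like bash's ${FOO+bar}: only whether the variable is set matters, not its value.
        return value if name in env else ""

    return _CONDITIONAL.sub(_sub, text)


def active_providers(providers: list, env: dict) -> list:
    """Resolve provider IDs and drop every provider whose ID resolves to empty."""
    resolved = []
    for provider in providers:
        provider_id = replace_conditional(provider["provider_id"], env) or None
        if provider_id is None:
            continue  # treated as disabled
        resolved.append({**provider, "provider_id": provider_id})
    return resolved
```

With only `ENABLE_CHROMADB` set, a provider whose ID is `${env.ENABLE_CHROMADB+chromadb}` is kept and one whose ID depends on an unset variable is dropped.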
## Test Plan
Hard code `chromadb` as the vector io provider inside
`test_vector_io.py` and run:
```bash
LLAMA_STACK_BASE_URL=http://localhost:8321 pytest -s -v tests/client-sdk/vector_io/ --embedding-model all-MiniLM-L6-v2
```
# What does this PR do?
Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
# Summary:
This led to extremely hard-to-debug error messages.
Before:
llama_stack/distribution/library_client.py:275: in request
response = await self._call_non_streaming(
llama_stack/distribution/library_client.py:322: in _call_non_streaming
result = await matched_func(**body)
llama_stack/providers/utils/telemetry/trace_protocol.py:102: in
async_wrapper
result = await method(self, *args, **kwargs)
llama_stack/providers/inline/agents/meta_reference/agents.py:80: in
create_agent
value=agent_config.model_dump_json(),
E AttributeError: 'dict' object has no attribute 'model_dump_json'
After:
E ValueError: Failed to convert parameter {'model':
'meta-llama/Llama-3.1-8B-Instruct', 'instructions': 'You are a helpful
assistant', 'sampling_params': {'strategy': {'type': 'top_p',
'temperature': 0.0001, 'top_p': 0.9}}, 'toolgroups': [{'name':
'builtin::rag'}], 'input_shields': ['meta-llama/Llama-Guard-3-8B'],
'output_shields': ['meta-llama/Llama-Guard-3-8B'],
'enable_session_persistence': False} into <class
'llama_stack.apis.agents.agents.AgentConfig'>: 2 validation errors for
AgentConfig
E toolgroups.0.str
E Input should be a valid string [type=string_type, input_value={'name':
'builtin::rag'}, input_type=dict]
E For further information visit
https://errors.pydantic.dev/2.10/v/string_type
E toolgroups.0.AgentToolGroupWithArgs.args
E Field required [type=missing, input_value={'name': 'builtin::rag'},
input_type=dict]
E For further information visit
https://errors.pydantic.dev/2.10/v/missing
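The idea behind the "After" message is to convert parameters eagerly and fail with an error that names both the value and the target type. The sketch below illustrates that idea with a standard-library dataclass as a stand-in; the real message above comes from pydantic validation, and `convert_param` and this simplified `AgentConfig` are hypothetical names used only for the example.

```python
from dataclasses import dataclass


@dataclass
class AgentConfig:
    # Simplified stand-in for the real AgentConfig model.
    model: str
    instructions: str


def convert_param(value, expected_type):
    """Convert `value` to `expected_type`, or fail with a descriptive error."""
    if isinstance(value, expected_type):
        return value
    try:
        # Missing or unexpected keys (or a non-mapping value) raise TypeError here.
        return expected_type(**value)
    except TypeError as exc:
        raise ValueError(
            f"Failed to convert parameter {value!r} into {expected_type}: {exc}"
        ) from exc
```

Failing at the conversion boundary keeps the error next to the bad input instead of surfacing later as an `AttributeError` deep inside a provider.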
# Test Plan:
LLAMA_STACK_CONFIG=fireworks pytest -s -v tests/client-sdk/ --safety-shield meta-llama/Llama-Guard-3-8B
# What does this PR do?
21ec67356c/distributions
It was missing the `s`.
Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
Summary:
This test is no longer relevant. We updated the default system prompt in
https://github.com/meta-llama/llama-stack/pull/1310, and system override
behavior is already unit-tested in test_prompt_adapter.py
Test Plan:
read
# What does this PR do?
This PR updates the version in the
[README.md](https://github.com/meta-llama/llama-stack/blob/main/docs/zero_to_hero_guide/README.md)
to reflect the latest changes in Llama Stack setup.
Previously, using **llama-stack==0.1.0** caused an error when running:
```bash
llama stack build --template ollama --image-type conda
```
Upgrading to llama-stack==0.1.3 resolves this issue.
## Test Plan
- Verified that `llama stack build --template ollama --image-type conda`
works correctly.
---------
Signed-off-by: Surya Prakash Pathak <supathak@redhat.com>
# What does this PR do?
Use @client_tool decorator instead of ClientTool
## Test Plan
```
LLAMA_STACK_CONFIG=fireworks pytest -v tests/client-sdk/agents/test_agents.py --inference-model "meta-llama/Llama-3.3-70B-Instruct"
```
<img width="1053" alt="image"
src="https://github.com/user-attachments/assets/d3ade884-ef42-494e-8028-3b09d9ef1978"
/>
# What does this PR do?
- using `eval` is a security risk
## Test Plan
- see https://github.com/meta-llama/llama-stack/pull/1327
cc @SLR722 we will need to update the corresponding dataset via
```python
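import json

import datasets


def update_to_json_str():
    dataset = datasets.load_dataset(...)
    # One-time migration: `eval` is used here only to re-encode existing values as JSON strings.
    processed_dataset = dataset[split].map(
        lambda x: {
            "column": json.dumps(eval(x["column"]))
        }
    )
    processed_dataset.push_to_hub(...)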
```
# What does this PR do?
An API spec must talk about Error handling. This was a pretty glaring
omission so far. This PR begins to address it by adding a set of
standard error responses we can attach to all our API calls.
At a future point, we can add specific error types where necessary
(although we should not hurry to do that; it is best done very late.)
## Test Plan
Checked that Stainless SDK generation succeeds.
# What does this PR do?
- Using `eval` on server is a security risk
- Replace `eval` with `json.loads`
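A minimal sketch of the difference, assuming the column values are JSON-encoded strings; `parse_column` is a hypothetical helper name, not a function from the codebase:

```python
import json


def parse_column(raw: str):
    """Parse a JSON-encoded column value without executing it as code."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError as exc:
        # Anything that is not valid JSON is rejected instead of being evaluated.
        raise ValueError(f"column is not valid JSON: {raw!r}") from exc
```

Unlike `eval`, `json.loads` only builds plain data (dicts, lists, strings, numbers, booleans, `None`), so an input such as `__import__('os').getcwd()` is rejected rather than run.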
## Test Plan
```
pytest -v -s --nbval-lax ./llama-stack/docs/notebooks/Llama_Stack_Benchmark_Evals.ipynb
```
<img width="747" alt="image"
src="https://github.com/user-attachments/assets/7aff3d95-0b12-4394-b9d0-aeff791eee38"
/>
# What does this PR do?
The `--downloaded` option has been released, so this PR updates the
related documentation.
Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
# What does this PR do?
Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
# What does this PR do?
Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
# What does this PR do?
55eb257459/llama_stack/cli/stack/run.py (L22)
55eb257459/llama_stack/cli/stack/_build.py (L103)
Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
# What does this PR do?
- skip media tests for models that do not support media
- skip output_dimension tests for models that do not support it
- skip task_type tests for models that do not support it
- provide task_type for models that require it
## Test Plan
`LLAMA_STACK_BASE_URL=http://localhost:8321 pytest -v
tests/client-sdk/inference/test_embedding.py --embedding-model ...`
# What does this PR do?
This is to be consistent with OpenAI API and support vLLM <= v0.6.3
References:
* https://platform.openai.com/docs/api-reference/chat/create#chat-create-tool_choice
* https://github.com/vllm-project/vllm/pull/10000
This fixes the error when running older versions of vLLM:
```
00:50:19.834 [START] /v1/inference/chat-completion
INFO 2025-02-28 00:50:20,203 httpx:1025: HTTP Request: POST https://api-xeai-granite-3-1-8b-instruct.apps.int.stc.ai.preprod.us-east-1.aws.paas.redhat.com/v1/chat/completions "HTTP/1.1 400 Bad Request"
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/llama_stack/distribution/server/server.py", line 235, in endpoint
return await maybe_await(value)
File "/usr/local/lib/python3.10/site-packages/llama_stack/distribution/server/server.py", line 201, in maybe_await
return await value
File "/usr/local/lib/python3.10/site-packages/llama_stack/providers/utils/telemetry/trace_protocol.py", line 89, in async_wrapper
result = await method(self, *args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/llama_stack/distribution/routers/routers.py", line 193, in chat_completion
return await provider.chat_completion(**params)
File "/usr/local/lib/python3.10/site-packages/llama_stack/providers/utils/telemetry/trace_protocol.py", line 89, in async_wrapper
result = await method(self, *args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/llama_stack/providers/remote/inference/vllm/vllm.py", line 286, in chat_completion
return await self._nonstream_chat_completion(request, self.client)
File "/usr/local/lib/python3.10/site-packages/llama_stack/providers/remote/inference/vllm/vllm.py", line 292, in _nonstream_chat_completion
r = client.chat.completions.create(**params)
File "/usr/local/lib/python3.10/site-packages/openai/_utils/_utils.py", line 279, in wrapper
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/site-packages/openai/resources/chat/completions/completions.py", line 879, in create
return self._post(
File "/usr/local/lib/python3.10/site-packages/openai/_base_client.py", line 1290, in post
return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
File "/usr/local/lib/python3.10/site-packages/openai/_base_client.py", line 967, in request
return self._request(
File "/usr/local/lib/python3.10/site-packages/openai/_base_client.py", line 1071, in _request
raise self._make_status_error_from_response(err.response) from None
openai.BadRequestError: Error code: 400 - {'object': 'error', 'message': "[{'type': 'value_error', 'loc': ('body',), 'msg': 'Value error, When using `tool_choice`, `tools` must be set.', 'input': {'messages': [{'role': 'user', 'content': [{'type': 'text', 'text': 'What model are you?'}]}], 'model': 'granite-3-1-8b-instruct', 'max_tokens': 4096, 'stream': False, 'temperature': 0.0, 'tools': None, 'tool_choice': 'auto'}, 'ctx': {'error': ValueError('When using `tool_choice`, `tools` must be set.')}}]", 'type': 'BadRequestError', 'param': None, 'code': 400}
INFO: 2600:1700:9d20:ac0::49:59736 - "POST /v1/inference/chat-completion HTTP/1.1" 500 Internal Server Error
00:50:20.266 [END] /v1/inference/chat-completion [StatusCode.OK] (431.99ms)
```
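The 400 response above says that `tools` must be set whenever `tool_choice` is used. One way to satisfy that constraint is to send `tool_choice` only when tools are present. The sketch below shows the idea; `build_chat_params` is a hypothetical helper, not the provider's actual code.

```python
from typing import Any, Dict, List, Optional


def build_chat_params(
    model: str,
    messages: List[Dict[str, Any]],
    tools: Optional[List[Dict[str, Any]]] = None,
    tool_choice: str = "auto",
) -> Dict[str, Any]:
    """Build request parameters, omitting `tool_choice` when no tools are given."""
    params: Dict[str, Any] = {"model": model, "messages": messages}
    if tools:
        # Only send tool-related keys together, never `tool_choice` on its own.
        params["tools"] = tools
        params["tool_choice"] = tool_choice
    return params
```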
## Test Plan
All existing tests pass.
---------
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
# What does this PR do?
19ae4b35d9/llama_stack/cli/model/prompt_format.py (L47)
The code comment says `Only Llama 3.1 and 3.2 are supported`, but not
every 3.1 and 3.2 model can be shown with `prompt-format`, so users
cannot rely on `llama model list`. The valid models are currently listed
only after entering an invalid model, so it would be helpful to add a
way to list them directly:
```
llama model prompt-format -m Llama3.1-405B-Instruct:bf16-mp8
usage: llama model prompt-format [-h] [-m MODEL_NAME] [-l]
llama model prompt-format: error: Llama3.1-405B-Instruct:bf16-mp8 is not a valid Model <<<<---. Choose one from --
Llama3.1-8B
Llama3.1-70B
Llama3.1-405B
Llama3.1-8B-Instruct
Llama3.1-70B-Instruct
Llama3.1-405B-Instruct
Llama3.2-1B
Llama3.2-3B
Llama3.2-1B-Instruct
Llama3.2-3B-Instruct
Llama3.2-11B-Vision
Llama3.2-90B-Vision
Llama3.2-11B-Vision-Instruct
Llama3.2-90B-Vision-Instruct
before:
$ llama model prompt-format --help
usage: llama model prompt-format [-h] [-m MODEL_NAME]
Show llama model message formats
options:
-h, --help show this help message and exit
-m MODEL_NAME, --model-name MODEL_NAME
Model Family (llama3_1, llama3_X, etc.)
Example:
llama model prompt-format <options>
after:
$ llama model prompt-format --help
usage: llama model prompt-format [-h] [-m MODEL_NAME] [-l]
Show llama model message formats
options:
-h, --help show this help message and exit
-m MODEL_NAME, --model-name MODEL_NAME
Model Family (llama3_1, llama3_X, etc.)
-l, --list List the valid supported models
Example:
llama model prompt-format <options>
$ llama model prompt-format -l
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Model ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Llama3.1-8B │
├──────────────────────────────┤
│ Llama3.1-70B │
├──────────────────────────────┤
│ Llama3.1-405B │
├──────────────────────────────┤
│ Llama3.1-8B-Instruct │
├──────────────────────────────┤
│ Llama3.1-70B-Instruct │
├──────────────────────────────┤
│ Llama3.1-405B-Instruct │
├──────────────────────────────┤
│ Llama3.2-1B │
├──────────────────────────────┤
│ Llama3.2-3B │
├──────────────────────────────┤
│ Llama3.2-1B-Instruct │
├──────────────────────────────┤
│ Llama3.2-3B-Instruct │
├──────────────────────────────┤
│ Llama3.2-11B-Vision │
├──────────────────────────────┤
│ Llama3.2-90B-Vision │
├──────────────────────────────┤
│ Llama3.2-11B-Vision-Instruct │
├──────────────────────────────┤
│ Llama3.2-90B-Vision-Instruct │
└──────────────────────────────┘
```
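A list flag like the one shown above can be declared with `argparse` as a `store_true` option. This is a minimal sketch under assumptions: the model list is a hypothetical two-entry subset, and `run` returns the lines to print instead of rendering a table.

```python
import argparse

# Hypothetical subset of the supported models, for illustration only.
SUPPORTED_MODELS = ["Llama3.1-8B", "Llama3.2-3B-Instruct"]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="llama model prompt-format")
    parser.add_argument(
        "-m", "--model-name", help="Model Family (llama3_1, llama3_X, etc.)"
    )
    parser.add_argument(
        "-l", "--list", action="store_true", help="List the valid supported models"
    )
    return parser


def run(argv: list) -> list:
    """Return the lines the command would print for the given arguments."""
    args = build_parser().parse_args(argv)
    if args.list:
        return list(SUPPORTED_MODELS)
    if args.model_name not in SUPPORTED_MODELS:
        raise ValueError(f"{args.model_name} is not a valid Model")
    return [args.model_name]
```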
---------
Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
# What does this PR do?
check conda env name using basepath in exec.py
The current logic for finding the conda prefix does an `endswith` check
with just the conda env name, but this matches incorrectly if there is a
different conda env whose name ends with the same suffix. In my case, I
had `stack` and `llama-stack` as the two conda envs.
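The difference between the suffix check and a basename check can be sketched as below. The function name and the example prefixes are hypothetical; the actual change lives in `exec.py`.

```python
from pathlib import PurePosixPath
from typing import List, Optional


def find_env_prefix(env_name: str, prefixes: List[str]) -> Optional[str]:
    """Return the prefix whose last path component is exactly `env_name`."""
    for prefix in prefixes:
        # Compare the basename, not a string suffix, so "llama-stack" != "stack".
        if PurePosixPath(prefix).name == env_name:
            return prefix
    return None
```

With prefixes ending in `llama-stack` and `stack`, a plain `endswith("stack")` check matches both, whereas the basename comparison selects only the env named `stack`.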
## Test Plan
llama stack run ~/.llama/distributions/fireworks/fireworks-run.yaml
Original telemetry outputs for agent turns look like this.
Note how the output was a `str(message)`, making it difficult to read
back for downstream tasks (e.g. building eval datasets).
```
{
│ │ 'input': [
│ │ │ '{"role":"system","content":"You are a helpful assistant. Use search tool to answer the questions. "}',
│ │ │ '{"role":"user","content":"Which teams played in the NBA western conference finals of 2024","context":null}'
│ │ ],
│ │ 'output': "content: tool_calls: [ToolCall(call_id='8b7294ec-a83f-4798-ad8f-6bed662f08b6', tool_name=<BuiltinTool.brave_search: 'brave_search'>, arguments={'query': 'NBA Western Conference Finals 2024 teams'})]"
│ },
```
Updated the outputs to be structured.
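The gist of the change is to log a JSON document instead of `str(message)`, so that consumers can parse it back. A minimal sketch, assuming a simplified message with only `content` and `tool_calls` fields (the `Message` class and `serialize_output` are hypothetical names):

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Message:
    # Simplified stand-in for a completion message.
    content: str
    tool_calls: list = field(default_factory=list)


def serialize_output(message: Message) -> str:
    """Serialize the message as JSON so it can be read back with json.loads."""
    return json.dumps(asdict(message))
```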
## Test
```python
import uuid

from llama_stack_client.lib.agents.agent import Agent
from llama_stack_client.lib.agents.event_logger import EventLogger
from llama_stack_client.types.agent_create_params import AgentConfig

# `client` and `pprint` are assumed to be defined earlier in the session.
model_id = "meta-llama/Llama-3.1-8B-Instruct"
agent_config = AgentConfig(
    model=model_id,
    instructions="You are a helpful assistant who will use the web search tools to help with answering questions.\nOnly provide final answer in short without writing full sentences. Use web search",
    toolgroups=["builtin::websearch"],
    enable_session_persistence=True,
)
agent = Agent(client, agent_config)
session_id = agent.create_session(uuid.uuid4().hex)
response = agent.create_turn(
    messages=[
        {
            "role": "user",
            "content": "latest news about llama stack",
        }
    ],
    session_id=session_id,
    stream=False,
)
pprint(response)
```
Output:
```
Turn(
│ input_messages=[UserMessage(content='latest news about llama stack', role='user', context=None)],
│ output_message=CompletionMessage(
│ │ content="The latest news about Llama Stack is that Meta has released Llama 3.2, which includes small and medium-sized vision LLMs (11B and 90B) and lightweight, text-only models (1B and 3B) that fit onto select edge and mobile devices. Additionally, Llama Stack distributions have been released to simplify the way developers work with Llama models in different environments. However, a critical vulnerability has been discovered in Meta's Llama-Stack, which puts AI applications at risk.",
│ │ role='assistant',
│ │ stop_reason='end_of_turn',
│ │ tool_calls=[]
│ ),
│ session_id='77379546-4598-485a-b4f4-84e5da28c513',
│ started_at=datetime.datetime(2025, 2, 27, 11, 2, 43, 915243, tzinfo=TzInfo(-08:00)),
│ steps=[
│ │ InferenceStep(
│ │ │ api_model_response=CompletionMessage(
│ │ │ │ content='',
│ │ │ │ role='assistant',
│ │ │ │ stop_reason='end_of_turn',
│ │ │ │ tool_calls=[
│ │ │ │ │ ToolCall(
│ │ │ │ │ │ arguments={'query': 'latest news llama stack'},
│ │ │ │ │ │ call_id='84c0fa10-e24a-4f91-a9ff-415a9ec0bb0b',
│ │ │ │ │ │ tool_name='brave_search'
│ │ │ │ │ )
│ │ │ │ ]
│ │ │ ),
│ │ │ step_id='81c16bd3-eb00-4721-8edc-f386e07391a3',
│ │ │ step_type='inference',
│ │ │ turn_id='2c6b5273-4b16-404f-bed2-c0025fd63b45',
│ │ │ completed_at=datetime.datetime(2025, 2, 27, 11, 2, 44, 637149, tzinfo=TzInfo(-08:00)),
│ │ │ started_at=datetime.datetime(2025, 2, 27, 11, 2, 43, 915831, tzinfo=TzInfo(-08:00))
│ │ ),
│ │ ToolExecutionStep(
│ │ │ step_id='4782d609-a62e-45f5-8d2a-25a43db46288',
│ │ │ step_type='tool_execution',
│ │ │ tool_calls=[
│ │ │ │ ToolCall(
│ │ │ │ │ arguments={'query': 'latest news llama stack'},
│ │ │ │ │ call_id='84c0fa10-e24a-4f91-a9ff-415a9ec0bb0b',
│ │ │ │ │ tool_name='brave_search'
│ │ │ │ )
│ │ │ ],
│ │ │ tool_responses=[
│ │ │ │ ToolResponse(
│ │ │ │ │ call_id='84c0fa10-e24a-4f91-a9ff-415a9ec0bb0b',
│ │ │ │ │ content='{"query": "latest news llama stack", "top_k": [{"title": "Llama 3.2: Revol. ....... Hacker News.", "score": 0.6186197, "raw_content": null}]}',
│ │ │ │ │ tool_name='brave_search',
│ │ │ │ │ metadata=None
│ │ │ │ )
│ │ │ ],
│ │ │ turn_id='2c6b5273-4b16-404f-bed2-c0025fd63b45',
│ │ │ completed_at=datetime.datetime(2025, 2, 27, 11, 2, 46, 272176, tzinfo=TzInfo(-08:00)),
│ │ │ started_at=datetime.datetime(2025, 2, 27, 11, 2, 44, 640743, tzinfo=TzInfo(-08:00))
│ │ ),
│ │ InferenceStep(
│ │ │ api_model_response=CompletionMessage(
│ │ │ │ content="The latest news about Llama Stack is that Meta has released Llama 3.2, which includes small and medium-sized vision LLMs (11B and 90B) and lightweight, text-only models (1B and 3B) that fit onto select edge and mobile devices. Additionally, Llama Stack distributions have been released to simplify the way developers work with Llama models in different environments. However, a critical vulnerability has been discovered in Meta's Llama-Stack, which puts AI applications at risk.",
│ │ │ │ role='assistant',
│ │ │ │ stop_reason='end_of_turn',
│ │ │ │ tool_calls=[]
│ │ │ ),
│ │ │ step_id='37994419-5da3-4e84-a010-8d9b85366262',
│ │ │ step_type='inference',
│ │ │ turn_id='2c6b5273-4b16-404f-bed2-c0025fd63b45',
│ │ │ completed_at=datetime.datetime(2025, 2, 27, 11, 2, 48, 961275, tzinfo=TzInfo(-08:00)),
│ │ │ started_at=datetime.datetime(2025, 2, 27, 11, 2, 46, 273168, tzinfo=TzInfo(-08:00))
│ │ )
│ ],
│ turn_id='2c6b5273-4b16-404f-bed2-c0025fd63b45',
│ completed_at=datetime.datetime(2025, 2, 27, 11, 2, 48, 962318, tzinfo=TzInfo(-08:00)),
│ output_attachments=[]
)
```
## Check for Telemetry
```python
import json

agent_logs = []
for span in client.telemetry.query_spans(
    attribute_filters=[
        {"key": "session_id", "op": "eq", "value": session_id},
    ],
    attributes_to_return=['input', 'output'],
):
    agent_logs.append(span.attributes)

# The output attribute is now a JSON string, so it can be parsed back.
pprint(json.loads(agent_logs[-1]['output']))
```
```
{
│ 'content': "The latest news about Llama Stack is that Meta has released Llama 3.2, which includes small and medium-sized vision LLMs (11B and 90B) and lightweight, text-only models (1B and 3B) that fit onto select edge and mobile devices. Additionally, Llama Stack distributions have been released to simplify the way developers work with Llama models in different environments. However, a critical vulnerability has been discovered in Meta's Llama-Stack, which puts AI applications at risk.",
│ 'tool_calls': []
}
```
# Summary:
The current prompt doesn't work well and tends to over-index on tool
calling. This PR is not perfect, but should be an improvement over the
current prompt. We can keep iterating.
# Test Plan:
Ran on a (small) eval with 20 HotpotQA examples.
With current prompt:
https://gist.github.com/ehhuang/9f967e62751907165eb13781ea968f5c
{
│ 'basic::equality': {'accuracy': {'accuracy': 0.2, 'num_correct': 4.0, 'num_total': 20}},
│ 'F1ScoringFn': {
│ │ 'f1_average': 0.25333333333333335,
│ │ 'precision_average': 0.23301767676767676,
│ │ 'recall_average': 0.375
│ }
}
num_tool_calls=[5, 5, 5, 5, 5, 5, 2, 5, 5, 5, 5, 5, 2, 2, 1, 1, 2, 1, 2, 2]
num_examples_with_tool_call=20
num_examples_with_pythontag=0
#########################################################
With new prompt:
https://gist.github.com/ehhuang/6e4a8ecf54db68922c2be8700056f962
{
│ 'basic::equality': {'accuracy': {'accuracy': 0.25, 'num_correct': 5.0, 'num_total': 20}},
│ 'F1ScoringFn': {
│ │ 'f1_average': 0.35579260478321006,
│ │ 'precision_average': 0.32030238933180105,
│ │ 'recall_average': 0.6091666666666666
│ }
}
num_tool_calls=[2, 1, 1, 5, 5, 5, 1, 2, 1, 1, 2, 1, 1, 1, 1, 1, 2, 1, 3, 2]
num_examples_with_tool_call=20
num_examples_with_pythontag=0
With the new prompt, the answers have higher recall and the model makes
fewer tool calls. Note that these were run with max_infer_iter=5, so the
current prompt hits this limit more often and, without the limit,
sometimes goes into an infinite tool-calling loop.
The data here is with 3.3-70B. With 3.2-3B, results are equally poor
with either prompt (~30 recall).
`ChatCompletionResponseEventType: start` is ignored and not yielded in
the agent_instance because we expect it to have no content.
However, litellm sends its first event as
`ChatCompletionResponseEventType: start` with content (the first token,
which we were skipping).
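The fix amounts to not dropping text that arrives on the `start` event. A simplified sketch of that behavior, using hypothetical `EventType`, `Event`, and `stream_text` names rather than the real agent classes:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Iterator


class EventType(Enum):
    start = "start"
    progress = "progress"
    complete = "complete"


@dataclass
class Event:
    event_type: EventType
    text: str = ""


def stream_text(events: Iterable[Event]) -> Iterator[str]:
    """Yield text from every event that carries any, including `start`."""
    for event in events:
        # Do not special-case `start`: some providers put the first token there.
        if event.text:
            yield event.text
```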
```
LLAMA_STACK_CONFIG=dev pytest -s -v tests/client-sdk/agents/test_agents.py --inference-model "openai/gpt-4o-mini" -k test_agent_simple
```
This was failing before (since the word hello was not in the final
response).
# What does this PR do?
- See 3796667776
- Together's structured decoding API is flaky, add skip to cell
- Enable cell 21 to pass cell 21-23
## Test Plan
<img width="652" alt="image"
src="https://github.com/user-attachments/assets/a1e4b94b-c1ce-4869-ba0d-0860bfe33460"
/>
This fixes release build failure 3796497240:
```
=================================== FAILURES ===================================
______ test_embedding_truncation_error[txt=8B:emb=MiniLM-long-text-None] _______
llama-stack/tests/client-sdk/inference/test_embedding.py:166: in test_embedding_truncation_error
with pytest.raises(BadRequestError) as excinfo:
E Failed: DID NOT RAISE <class 'llama_stack_client.BadRequestError'>
______ test_embedding_truncation_error[txt=8B:emb=MiniLM-long-text-none] _______
llama-stack/tests/client-sdk/inference/test_embedding.py:166: in test_embedding_truncation_error
with pytest.raises(BadRequestError) as excinfo:
E Failed: DID NOT RAISE <class 'llama_stack_client.BadRequestError'>
_______ test_embedding_truncation_error[txt=8B:emb=MiniLM-long-str-None] _______
llama-stack/tests/client-sdk/inference/test_embedding.py:166: in test_embedding_truncation_error
with pytest.raises(BadRequestError) as excinfo:
E Failed: DID NOT RAISE <class 'llama_stack_client.BadRequestError'>
_______ test_embedding_truncation_error[txt=8B:emb=MiniLM-long-str-none] _______
llama-stack/tests/client-sdk/inference/test_embedding.py:166: in test_embedding_truncation_error
with pytest.raises(BadRequestError) as excinfo:
E Failed: DID NOT RAISE <class 'llama_stack_client.BadRequestError'>
_________ test_embedding_text_truncation_error[txt=8B:emb=MiniLM-NONE] _________
llama-stack/tests/client-sdk/inference/test_embedding.py:223: in test_embedding_text_truncation_error
with pytest.raises(BadRequestError) as excinfo:
E Failed: DID NOT RAISE <class 'llama_stack_client.BadRequestError'>
_________ test_embedding_text_truncation_error[txt=8B:emb=MiniLM-END] __________
llama-stack/tests/client-sdk/inference/test_embedding.py:223: in test_embedding_text_truncation_error
with pytest.raises(BadRequestError) as excinfo:
E Failed: DID NOT RAISE <class 'llama_stack_client.BadRequestError'>
________ test_embedding_text_truncation_error[txt=8B:emb=MiniLM-START] _________
llama-stack/tests/client-sdk/inference/test_embedding.py:223: in test_embedding_text_truncation_error
with pytest.raises(BadRequestError) as excinfo:
E Failed: DID NOT RAISE <class 'llama_stack_client.BadRequestError'>
_________ test_embedding_text_truncation_error[txt=8B:emb=MiniLM-left] _________
llama-stack/tests/client-sdk/inference/test_embedding.py:223: in test_embedding_text_truncation_error
with pytest.raises(BadRequestError) as excinfo:
E Failed: DID NOT RAISE <class 'llama_stack_client.BadRequestError'>
________ test_embedding_text_truncation_error[txt=8B:emb=MiniLM-right] _________
llama-stack/tests/client-sdk/inference/test_embedding.py:223: in test_embedding_text_truncation_error
with pytest.raises(BadRequestError) as excinfo:
E Failed: DID NOT RAISE <class 'llama_stack_client.BadRequestError'>
=========================== short test summary info ============================
FAILED llama-stack/tests/client-sdk/inference/test_embedding.py::test_embedding_truncation_error[txt=8B:emb=MiniLM-long-text-None] - Failed: DID NOT RAISE <class 'llama_stack_client.BadRequestError'>
FAILED llama-stack/tests/client-sdk/inference/test_embedding.py::test_embedding_truncation_error[txt=8B:emb=MiniLM-long-text-none] - Failed: DID NOT RAISE <class 'llama_stack_client.BadRequestError'>
FAILED llama-stack/tests/client-sdk/inference/test_embedding.py::test_embedding_truncation_error[txt=8B:emb=MiniLM-long-str-None] - Failed: DID NOT RAISE <class 'llama_stack_client.BadRequestError'>
FAILED llama-stack/tests/client-sdk/inference/test_embedding.py::test_embedding_truncation_error[txt=8B:emb=MiniLM-long-str-none] - Failed: DID NOT RAISE <class 'llama_stack_client.BadRequestError'>
FAILED llama-stack/tests/client-sdk/inference/test_embedding.py::test_embedding_text_truncation_error[txt=8B:emb=MiniLM-NONE] - Failed: DID NOT RAISE <class 'llama_stack_client.BadRequestError'>
FAILED llama-stack/tests/client-sdk/inference/test_embedding.py::test_embedding_text_truncation_error[txt=8B:emb=MiniLM-END] - Failed: DID NOT RAISE <class 'llama_stack_client.BadRequestError'>
FAILED llama-stack/tests/client-sdk/inference/test_embedding.py::test_embedding_text_truncation_error[txt=8B:emb=MiniLM-START] - Failed: DID NOT RAISE <class 'llama_stack_client.BadRequestError'>
FAILED llama-stack/tests/client-sdk/inference/test_embedding.py::test_embedding_text_truncation_error[txt=8B:emb=MiniLM-left] - Failed: DID NOT RAISE <class 'llama_stack_client.BadRequestError'>
FAILED llama-stack/tests/client-sdk/inference/test_embedding.py::test_embedding_text_truncation_error[txt=8B:emb=MiniLM-right] - Failed: DID NOT RAISE <class 'llama_stack_client.BadRequestError'>
= 9 failed, 48 passed, 2 skipped, 3 deselected, 3 xfailed, 1 xpassed, 121 warnings in 90.16s (0:01:30) =
Error: Process completed with exit code 1.
```
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
# What does this PR do?
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
# What does this PR do?
Missed this one additional import in
https://github.com/meta-llama/llama-stack/pull/1313
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
# What does this PR do?
This fixes release build failure 3796356500:
```
+ llama model prompt-format -m Llama3.2-11B-Vision-Instruct
Traceback (most recent call last):
File "/tmp/tmp.PXMDlmD0x5/.venv/bin/llama", line 4, in <module>
from llama_stack.cli.llama import main
File "/tmp/tmp.PXMDlmD0x5/.venv/lib/python3.10/site-packages/llama_stack/cli/llama.py", line 10, in <module>
from .model import ModelParser
File "/tmp/tmp.PXMDlmD0x5/.venv/lib/python3.10/site-packages/llama_stack/cli/model/__init__.py", line 7, in <module>
from .model import ModelParser # noqa
File "/tmp/tmp.PXMDlmD0x5/.venv/lib/python3.10/site-packages/llama_stack/cli/model/model.py", line 16, in <module>
from llama_stack.cli.utils import print_subcommand_description
ModuleNotFoundError: No module named 'llama_stack.cli.utils'
```
Signed-off-by: Yuan Tang <terrytangyuan@gmail.com>
# What does this PR do?
```
before:
$ llama
usage: llama [-h] {model,stack,download,verify-download} ...

Welcome to the Llama CLI

options:
  -h, --help            show this help message and exit

subcommands:
  {model,stack,download,verify-download}

$ llama model --help
usage: llama model [-h] {download,list,prompt-format,describe,verify-download,remove} ...

Work with llama models

options:
  -h, --help            show this help message and exit

model_subcommands:
  {download,list,prompt-format,describe,verify-download,remove}

$ llama stack --help
usage: llama stack [-h] [--version] {build,list-apis,list-providers,run} ...

Operations for the Llama Stack / Distributions

options:
  -h, --help            show this help message and exit
  --version             show program's version number and exit

stack_subcommands:
  {build,list-apis,list-providers,run}

===================

after:
$ llama
usage: llama [-h] {model,stack,download,verify-download} ...

Welcome to the Llama CLI

options:
  -h, --help            show this help message and exit

subcommands:
  {model,stack,download,verify-download}
    model               Work with llama models
    stack               Operations for the Llama Stack / Distributions
    download            Download a model from llama.meta.com or Hugging Face Hub
    verify-download     Verify integrity of downloaded model files

$ llama model --help
usage: llama model [-h] {download,list,prompt-format,describe,verify-download,remove} ...

Work with llama models

options:
  -h, --help            show this help message and exit

model_subcommands:
  {download,list,prompt-format,describe,verify-download,remove}
    download            Download a model from llama.meta.com or Hugging Face Hub
    list                Show available llama models
    prompt-format       Show llama model message formats
    describe            Show details about a llama model
    verify-download     Verify the downloaded checkpoints' checksums for models downloaded from Meta
    remove              Remove the downloaded llama model

$ llama stack --help
usage: llama stack [-h] [--version] {build,list-apis,list-providers,run} ...

Operations for the Llama Stack / Distributions

options:
  -h, --help            show this help message and exit
  --version             show program's version number and exit

stack_subcommands:
  {build,list-apis,list-providers,run}
    build               Build a Llama stack container
    list-apis           List APIs part of the Llama Stack implementation
    list-providers      Show available Llama Stack Providers for an API
    run                 Start the server for a Llama Stack Distribution. You should have already built (or downloaded) and configured the distribution.
```
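For readers unfamiliar with how `argparse` produces the extra lines in the "after" output, here is a minimal sketch. It assumes the CLI is built on `argparse` subparsers and is not the actual llama-stack code; `build_parser` is a hypothetical helper. A subcommand only gets its own description line when `help=` is passed to `add_parser`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical stand-in for the top-level `llama` parser."""
    parser = argparse.ArgumentParser(prog="llama", description="Welcome to the Llama CLI")
    subparsers = parser.add_subparsers(title="subcommands")
    # Without `help=`, a subcommand shows up only in the {...} choices line.
    # With `help=`, argparse also prints a "name  description" row for it.
    subparsers.add_parser("model", help="Work with llama models")
    subparsers.add_parser("stack", help="Operations for the Llama Stack / Distributions")
    return parser


help_text = build_parser().format_help()
print(help_text)
```

Leaving `help=` off reproduces the "before" behaviour, where only the choices line is shown.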
---------
Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
# What does this PR do?
Updates the NVIDIA inference provider's embedding implementation to use
the new signature.
Adds support for the `task_type`, `output_dimensions`, and
`text_truncation` parameters.
## Test Plan
`LLAMA_STACK_BASE_URL=http://localhost:8321 pytest -v tests/client-sdk/inference/test_embedding.py --embedding-model baai/bge-m3`
Now that remote-vllm includes `inline::sentence_transformers`, building
the image fails with:
`Error building stack: SentenceTransformersInferenceConfig.sample_run_config() got an unexpected keyword argument '__distro_dir__'`
To avoid this, the fix extends `sample_run_config` to accept extra
kwargs.
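
The failure mode and the fix can be sketched as follows. This is an illustration under assumptions, not the real provider code: both classes are hypothetical stand-ins, the empty return value is a placeholder, and the `__distro_dir__` path is made up.

```python
from typing import Any, Dict


class StrictConfig:
    """Stand-in for the old behaviour: no extra keyword arguments allowed."""

    @classmethod
    def sample_run_config(cls) -> Dict[str, Any]:
        return {}


class SentenceTransformersInferenceConfig:
    """Stand-in for the fixed config class; the body is illustrative only."""

    @classmethod
    def sample_run_config(cls, **kwargs: Any) -> Dict[str, Any]:
        # Extra keyword arguments such as __distro_dir__ are accepted and ignored.
        return {}


try:
    StrictConfig.sample_run_config(__distro_dir__="distributions/remote-vllm")
    strict_failed = False
except TypeError as exc:
    # Same class of error as the one reported during the build.
    strict_failed = True
    print(exc)

config = SentenceTransformersInferenceConfig.sample_run_config(
    __distro_dir__="distributions/remote-vllm"  # hypothetical path
)
```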
# What does this PR do?
Signed-off-by: reidliu <reid201711@gmail.com>
Co-authored-by: reidliu <reid201711@gmail.com>
Each model known to the system has two identifiers:
- the `provider_resource_id` (what the provider calls it) -- e.g.,
`accounts/fireworks/models/llama-v3p1-8b-instruct`
- the `identifier` (`model_id`) under which it is registered and gets
routed to the appropriate provider.
We have so far used the HuggingFace repo alias as the standardized
identifier you can use to refer to the model. So in the above example,
we'd use `meta-llama/Llama-3.1-8B-Instruct` as the name under which it
gets registered. This makes it convenient for users to refer to these
models across providers.
However, we forgot to also register the _actual_ provider model ID. You
should, of course, be able to route via `provider_resource_id` as well.
This change fixes this (somewhat grave) omission.
*Note*: this change is additive -- more aliases work now compared to
before.
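
As a rough illustration of the additive aliasing described above, the sketch below registers one model under both names. `Model` and `ModelRegistry` are hypothetical types invented for this example; the real routing table may be structured differently.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Model:
    identifier: str            # standardized name, e.g. the HuggingFace repo alias
    provider_resource_id: str  # what the provider calls the model
    provider_id: str


class ModelRegistry:
    """Hypothetical alias table mapping every known name to its model."""

    def __init__(self) -> None:
        self._by_alias: Dict[str, Model] = {}

    def register(self, model: Model) -> None:
        # Register under the standardized identifier AND the provider's own ID.
        for alias in (model.identifier, model.provider_resource_id):
            self._by_alias[alias] = model

    def lookup(self, name: str) -> Model:
        return self._by_alias[name]


registry = ModelRegistry()
registry.register(
    Model(
        identifier="meta-llama/Llama-3.1-8B-Instruct",
        provider_resource_id="accounts/fireworks/models/llama-v3p1-8b-instruct",
        provider_id="fireworks",
    )
)

by_alias = registry.lookup("meta-llama/Llama-3.1-8B-Instruct")
by_provider_id = registry.lookup("accounts/fireworks/models/llama-v3p1-8b-instruct")
```

Both lookups resolve to the same entry, so existing callers that use the HuggingFace alias are unaffected.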
## Test Plan
Run the following for each of `distro=(ollama fireworks together)`:
```
LLAMA_STACK_CONFIG=$distro \
pytest -s -v tests/client-sdk/inference/test_text_inference.py \
--inference-model=meta-llama/Llama-3.1-8B-Instruct --vision-inference-model=""
```
The `--image-name __system__` thing was a hack, and a bad one at that.
The actual intent was to automatically detect the notebook environment
so we could avoid unnecessarily confusing things in the
`llama stack build` command line. But that attempt failed, which led us
to use the `__system__` fallback.
Let's just do the simple thing.
Note that I haven't changed `build_venv.sh` for now, so it still honors
the special `__system__` name; it's just that no new user should use it.
## Test Plan
Open the notebooks from this branch in Colab (see the example URL below)
and ensure the builds work.
https://colab.research.google.com/github/meta-llama/llama-stack/blob/foo/docs/getting_started.ipynb
In the notebook, install llama-stack from this branch directly using:
```
!pip install -U https://github.com/meta-llama/llama-stack/archive/refs/heads/foo.zip
```
Afterwards, verify that
`!UV_SYSTEM_PYTHON=1 llama stack build --template together --image-type venv`
succeeds and that the library client initialization also works.
# Summary:
Right now we include toolgroup args when we encode messages with
tool_calls, which confuses the model since they are not in the function
description (see the test plan for an example).
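
One way to picture the intended behaviour is to keep only the arguments that the tool's function description declares before encoding the call. The helper below is a hypothetical sketch of that idea, not the actual change; `strip_undeclared_args` and `declared_params` are names made up for this example.

```python
from typing import Any, Dict, Set


def strip_undeclared_args(arguments: Dict[str, Any], declared_params: Set[str]) -> Dict[str, Any]:
    """Drop arguments the model never saw in the function description."""
    return {name: value for name, value in arguments.items() if name in declared_params}


# Arguments as they appear in the "Before" prompt of the test plan.
call_args = {
    "query": "Laleli Mosque and Esma Sultan Mansion same neighborhood",
    "vector_db_ids": ["829a68735d744dc3830409dcc782964a"],
}
# Assumption: the function description for knowledge_search declares only `query`.
declared_params = {"query"}

encoded_args = strip_undeclared_args(call_args, declared_params)
```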
# Test Plan:
Add a print statement before the raw prompt is sent to providers (there
is no good way to test this currently).
Before:
```
cated in the same neighborhood?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n[knowledge_search(query="Laleli Mosque and Esma Sultan Mansion same neighborhood", vector_db_ids=["829a68735d744dc3830409dcc782964a"])]<|eot_id|><|start_header_id|>ipython<|end_header_id|>\n\nknowledge_search tool found 5 chunks:\nBEGIN of
```
Note the extra `vector_db_ids`.
After:
```
>user<|end_header_id|>\n\nAre the Laleli Mosque and Esma Sultan Mansion located in the same neighborhood?<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n[knowledge_search(query="Laleli Mosque and Esma Sultan Mansion same neighborhood")]<|eot_id|><|start_header_id|>ipython<|end_header_id|>\n\nknowledge_search tool found
```