chore: move all Llama Stack types from llama-models to llama-stack (#1098)

llama-models should have extremely minimal cruft. Its sole purpose
should be didactic -- show the simplest implementation of the llama
models and document the prompt formats, etc.

This PR is the complement to
https://github.com/meta-llama/llama-models/pull/279

## Test Plan

Ensure all `llama` CLI `model` sub-commands work:

```bash
llama model list
llama model download --model-id ...
llama model prompt-format -m ...
```

Ran tests:
```bash
cd tests/client-sdk
LLAMA_STACK_CONFIG=fireworks pytest -s -v inference/
LLAMA_STACK_CONFIG=fireworks pytest -s -v vector_io/
LLAMA_STACK_CONFIG=fireworks pytest -s -v agents/
```

Created a fresh venv (`uv venv && source .venv/bin/activate`), ran `llama stack build --template fireworks --image-type venv` followed by `llama stack run together --image-type venv`, and verified that the server starts.

Also checked that the OpenAPI generator can run and there is no change
in the generated files as a result.

```bash
cd docs/openapi_generator
sh run_openapi_generator.sh
```
Commit 314ee09ae3 (parent c0ee512980), authored by Ashwin Bharambe on 2025-02-14 09:10:59 -08:00, committed by GitHub.
138 changed files with 8491 additions and 465 deletions


@@ -0,0 +1,12 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the terms described in the LICENSE file in
# the root directory of this source tree.
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the terms described in the LICENSE file in
# top-level folder for each specific model found within the models/ directory at
# the top-level of this source tree.


@@ -0,0 +1,235 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the terms described in the LICENSE file in
# the root directory of this source tree.
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the terms described in the LICENSE file in
# top-level folder for each specific model found within the models/ directory at
# the top-level of this source tree.
import json
import textwrap
from llama_models.datatypes import (
RawMessage,
StopReason,
ToolCall,
ToolPromptFormat,
)
from ..prompt_format import (
TextCompletionContent,
UseCase,
llama3_1_builtin_code_interpreter_dialog,
)
def user_tool_call():
content = textwrap.dedent(
"""
Questions: Can you retrieve the details for the user with the ID 7890, who has black as their special request?
Here is a list of functions in JSON format that you can invoke:
[
{
"name": "get_user_info",
"description": "Retrieve details for a specific user by their unique identifier. Note that the provided function is in Python 3 syntax.",
"parameters": {
"type": "dict",
"required": [
"user_id"
],
"properties": {
"user_id": {
"type": "integer",
"description": "The unique identifier of the user. It is used to fetch the specific user details from the database."
},
"special": {
"type": "string",
"description": "Any special information or parameters that need to be considered while fetching user details.",
"default": "none"
}
}
}
}
]
Should you decide to return the function call(s), put it in the format of [func1(params_name=params_value, params_name2=params_value2...), func2(params)]
NO other text MUST be included.
"""
)
return content.strip()
def system_tool_call():
content = textwrap.dedent(
"""
You are an expert in composing functions. You are given a question and a set of possible functions.
Based on the question, you will need to make one or more function/tool calls to achieve the purpose.
If none of the functions can be used, point it out. If the given question lacks the parameters required by the function,
also point it out. You should only return the function call in tools call sections.
If you decide to invoke any of the function(s), you MUST put it in the format of [func_name1(params_name1=params_value1, params_name2=params_value2...), func_name2(params)]
You SHOULD NOT include any other text in the response.
Here is a list of functions in JSON format that you can invoke.
[
{
"name": "get_weather",
"description": "Get weather info for places",
"parameters": {
"type": "dict",
"required": [
"city"
],
"properties": {
"city": {
"type": "string",
"description": "The name of the city to get the weather for"
},
"metric": {
"type": "string",
"description": "The metric for weather. Options are: celsius, fahrenheit",
"default": "celsius"
}
}
}
}
]
"""
)
return content.strip()
def usecases():
return [
UseCase(
title="User and assistant conversation",
description="Here is a regular multi-turn user assistant conversation and how its formatted.",
dialogs=[
[
RawMessage(role="system", content="You are a helpful assistant"),
RawMessage(role="user", content="Who are you?"),
]
],
notes="This format is unchanged from Llama3.1",
),
UseCase(
title="Zero shot function calling",
description=textwrap.dedent(
"""
For Llama3.2 1B and 3B instruct models, we are introducing a new format for zero shot function calling.
This new format is designed to be more flexible and powerful than the previous format.
All available functions can be provided in the system message. A key difference is in the format of how the assistant responds with function calls.
It is pythonic in the form of `[func1(params_name=params_value, params_name2=params_value2...), func2(params)]` instead of the `json` or `<function>` tag formats that were defined in Llama3.1.
Here is an example,
"""
),
dialogs=[
# Zero shot tool calls as system message
[
RawMessage(role="system", content=system_tool_call()),
RawMessage(role="user", content="What is the weather in SF and Seattle?"),
],
],
notes=textwrap.dedent(
"""
- The output supports multiple tool calls natively
- JSON format for defining the functions in the system prompt is similar to Llama3.1
"""
),
),
UseCase(
title="Zero shot function calling with user message",
description=textwrap.dedent(
"""
While the default is to provide all function calls in a system message, in Llama3.2 text models you can also provide information for all the available tools in a user message.
"""
),
dialogs=[
# Zero shot tool call as user message
[
RawMessage(role="user", content=user_tool_call()),
],
],
notes=textwrap.dedent(
"""
- The tool call format for the model is the same whether your function calls are provided in the system or user message.
- While builtin tool calls end with a <|eom_id|>, notice the <|eot_id|> for zero shot tool calls.
"""
),
),
UseCase(
title="Code Interpreter",
description=textwrap.dedent(
"""
Code Interpreter continues to work in 3.2 text models similarly to the Llama 3.1 model family.
Here is an example,
"""
),
dialogs=[llama3_1_builtin_code_interpreter_dialog()],
notes=textwrap.dedent(
"""
- Note `Environment: ipython` in the system prompt.
- Note that the response starts with `<|python_tag|>` and ends with `<|eom_id|>`
"""
),
),
UseCase(
title="Zero shot function calling E2E format",
description=textwrap.dedent(
"""
Here is an example of the e2e cycle of tool calls with the model in a multi-step way.
"""
),
dialogs=[
[
RawMessage(role="system", content=system_tool_call()),
RawMessage(role="user", content="What is the weather in SF?"),
RawMessage(
role="assistant",
content="",
stop_reason=StopReason.end_of_turn,
tool_calls=[
ToolCall(
call_id="cc",
tool_name="get_weather",
arguments={
"city": "San Francisco",
"metric": "celsius",
},
)
],
),
RawMessage(
role="tool",
content=json.dumps("25 C"),
),
],
],
notes=textwrap.dedent(
"""
- The output of the function call is provided back to the model as a tool response (in JSON format).
- Notice `<|start_header_id|>ipython<|end_header_id|>` as the header message preceding the tool response.
- The model finally summarizes the information from the tool response and returns the result to the user.
"""
),
tool_prompt_format=ToolPromptFormat.python_list,
),
UseCase(
title="Prompt format for base models",
description=textwrap.dedent(
"""
For base models (Llama3.2-1B and Llama3.2-3B), the prompt format for a simple completion is as follows
"""
),
dialogs=[
TextCompletionContent(content="The color of the sky is blue but sometimes it can also be"),
],
notes="Same as Llama3.1",
),
]
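
As an illustrative aside (this snippet is not part of the changed files), the pythonic tool-call format documented above, `[func1(params_name=params_value, ...), func2(params)]`, can be parsed back into structured calls with nothing more than the standard-library `ast` module. The response text below is a hypothetical model output based on the `get_weather` example; the parsing approach is a sketch, not what llama-stack itself does.

```python
import ast

# Hypothetical assistant output in the pythonic tool-call format described above.
response_text = '[get_weather(city="San Francisco", metric="celsius"), get_weather(city="Seattle", metric="celsius")]'

# Parse the bracketed call list; each element is an ast.Call with keyword arguments.
parsed = ast.parse(response_text, mode="eval").body
tool_calls = [
    (call.func.id, {kw.arg: ast.literal_eval(kw.value) for kw in call.keywords})
    for call in parsed.elts
]
print(tool_calls)
# [('get_weather', {'city': 'San Francisco', 'metric': 'celsius'}),
#  ('get_weather', {'city': 'Seattle', 'metric': 'celsius'})]
```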


@@ -0,0 +1,133 @@
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the terms described in the LICENSE file in
# the root directory of this source tree.
# Copyright (c) Meta Platforms, Inc. and affiliates.
# All rights reserved.
#
# This source code is licensed under the terms described in the LICENSE file in
# top-level folder for each specific model found within the models/ directory at
# the top-level of this source tree.
import textwrap
from pathlib import Path
from llama_models.datatypes import (
RawMediaItem,
RawMessage,
RawTextItem,
)
from ..prompt_format import (
TextCompletionContent,
UseCase,
llama3_1_builtin_tool_call_dialog,
# llama3_1_builtin_tool_call_with_image_dialog,
llama3_2_user_assistant_conversation,
)
def usecases():
this_dir = Path(__file__).parent.parent.resolve()
with open(this_dir / "scripts/resources/dog.jpg", "rb") as f:
img = f.read()
return [
llama3_2_user_assistant_conversation(),
UseCase(
title="User and assistant conversation with Images",
description="This example shows how to pass and image to the model as part of the messages.",
dialogs=[
[
RawMessage(
role="user",
content=[
RawMediaItem(data=img),
RawTextItem(text="Describe this image in two sentences"),
],
)
],
],
notes=textwrap.dedent(
"""
- The `<|image|>` tag is used to indicate the presence of an image
- The model isn't an early fusion model, so it doesn't actually translate an image into several tokens. Instead, the cross-attention layers take input "on the side" from a vision encoder
![Image](mm-model.png)
- It's important to position the <|image|> tag appropriately in the prompt, since the image only influences the text tokens that come after it
- The <|image|> tag is part of the user message body, implying that it should only come after the header `<|start_header_id|>{role}<|end_header_id|>` in the message body
- We recommend using a single image in one prompt
"""
),
),
UseCase(
title="Builtin and Zero Shot Tool Calling",
description=textwrap.dedent(
"""
Llama3.2 vision models follow the same tool calling format as Llama3.1 models when inputs are text only.
Use `Environment: ipython` to enable tools.
Add `Tools: {{tool_name1}},{{tool_name2}}` for each of the builtin tools.
The same builtin tools as Llama3.1 are available,
- code_interpreter (for executing python code)
- brave_search (to search the web)
- wolfram_alpha (for querying wolfram alpha for mathematical questions)
""",
),
dialogs=[llama3_1_builtin_tool_call_dialog()],
notes=textwrap.dedent(
"""
- Note the `<|python_tag|>` before `brave_search` function call.
- The `<|eom_id|>` tag is used to indicate the end of the message.
- Similar to Llama3.1, code_interpreter is not explicitly mentioned but is enabled via `Environment: ipython`.
- Tool Calling does NOT work with images in the prompt as of now.
"""
),
),
# UseCase(
# title="Tool Calling for vision models",
# description=textwrap.dedent(
# """
# While Llama3.2 vision models follow the same tool calling format as Llama3.1 models when inputs are text only,
# they are not able to do tool calling when prompt contains image inputs (along with text).
# The recommended way would be to separate out the image understanding from the tool calling in successive prompts.
# Here is an example of how that could be done,
# """,
# ),
# dialogs=[llama3_1_builtin_tool_call_with_image_dialog()],
# notes=textwrap.dedent(
# """
# - Instead of a single prompt (image understanding + tool call), we split into two prompts to achieve the same result.
# """
# ),
# ),
UseCase(
title="Prompt format for base models",
description=textwrap.dedent(
"""
For base models (Llama3.2-11B-Vision and Llama3.2-90B-Vision), the prompt format for a simple completion is as follows
"""
),
dialogs=[
TextCompletionContent(content="The color of the sky is blue but sometimes it can also be"),
],
notes="- Same as Llama3.1",
),
UseCase(
title="Prompt format for base models with Image",
description=textwrap.dedent(
"""
For base models (Llama3.2-11B-Vision and Llama3.2-90B-Vision), here is an example of how the text completion format looks with an image,
"""
),
dialogs=[
TextCompletionContent(
content=[
RawMediaItem(data=img),
RawTextItem(text="If I had to write a haiku for this one"),
]
),
],
notes="- Note the placement of the special tags <|begin_of_text|> and <|image|>",
),
]
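
As a second illustrative aside (again not from the changed files), here is a rough sketch of how the special tags called out in the notes above line up. The exact whitespace and rendering are handled by the llama_models chat-format utilities, so treat the literal strings as an approximation rather than the canonical format.

```python
# Hedged sketch only: approximate placement of the special tags described in the
# notes above; real prompts are rendered by the tokenizer/chat-format code.

# Base-model text completion with an image ("Prompt format for base models with Image"):
base_completion = (
    "<|begin_of_text|>"
    "<|image|>"  # the image tag precedes the text it should inform
    "If I had to write a haiku for this one"
)

# One user turn in the instruct chat format: the <|image|> tag sits inside the
# message body, after the role header, per the notes above.
user_turn = (
    "<|start_header_id|>user<|end_header_id|>\n\n"
    "<|image|>Describe this image in two sentences"
    "<|eot_id|>"
)
```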