update cli ref doc: llama model template names related; separation of copy-and-pastable commands with their outputs (#34)

Jeff Tang 2024-08-21 20:41:30 -07:00 committed by GitHub
parent 863bb915e1
commit b4af8c0e00

The `llama` CLI tool helps you set up and use the Llama toolchain & agentic systems. It should be available on your path after installing the `llama-toolchain` package.
```
llama --help
```
<pre style="font-family: monospace;">
usage: llama [-h] {download,model,distribution} ...

Welcome to the Llama CLI

options:
-h, --help show this help message and exit

subcommands:
{download,model,distribution}
</pre>
## Step 1. Get the models
You first need to have models downloaded locally.
To download any model you need the **Model Descriptor**.
This can be obtained by running the following command:
```
llama model list
```
You should see a table like this:
<pre style="font-family: monospace;">
+---------------------------------------+---------------------------------------------+----------------+----------------------------+
| Model Descriptor | HuggingFace Repo | Context Length | Hardware Requirements |
+---------------------------------------+---------------------------------------------+----------------+----------------------------+
| ... | ... | ... | ... |
+---------------------------------------+---------------------------------------------+----------------+----------------------------+
| Prompt-Guard-86M | meta-llama/Prompt-Guard-86M | 128K | 1 GPU, each >= 1GB VRAM |
+---------------------------------------+---------------------------------------------+----------------+----------------------------+
</pre>
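The listing is plain text, so ordinary shell filters work on it. For example, a quick way (just a convenience, not part of the CLI itself) to narrow the table down to the Instruct variants:
```
# Filter the model listing with grep; any descriptor substring works here.
llama model list | grep Instruct
```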
To download models, you can use the `llama download` command.
Here are example download commands to get the 8B and 70B Instruct models. You will need the META_URL, which can be obtained from
https://llama.meta.com/docs/getting_the_models/meta/
```
llama download --source meta --model-id Meta-Llama3.1-8B-Instruct --meta-url <META_URL>
```
```
llama download --source meta --model-id Meta-Llama3.1-70B-Instruct --meta-url <META_URL>
```
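A practical shell note: the signed URL you substitute for <META_URL> typically contains characters such as `&` and `?`, which the shell will interpret unless the value is quoted. For example:
```
# Quote the signed URL so the shell does not split or mangle it.
# "<META_URL>" is still a placeholder for the link from the Meta downloads page.
llama download --source meta --model-id Meta-Llama3.1-8B-Instruct --meta-url "<META_URL>"
```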
You can download from HuggingFace using these commands:
Set your environment variable HF_TOKEN or pass in --hf-token to the command to validate your access.
You can find your token at https://huggingface.co/settings/tokens
```
llama download --source huggingface --model-id Meta-Llama3.1-8B-Instruct --hf-token <HF_TOKEN>
```
```
llama download --source huggingface --model-id Meta-Llama3.1-70B-Instruct --hf-token <HF_TOKEN>
```
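Since the CLI also reads the HF_TOKEN environment variable (as noted above), you can export the token once instead of repeating `--hf-token` on every call:
```
# The token value is a placeholder -- use your own from https://huggingface.co/settings/tokens
export HF_TOKEN="<HF_TOKEN>"
llama download --source huggingface --model-id Meta-Llama3.1-70B-Instruct
```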
You can also download the safety models from HuggingFace:
```
llama download --source huggingface --model-id Llama-Guard-3-8B --ignore-patterns *original*
```
```
llama download --source huggingface --model-id Prompt-Guard-86M --ignore-patterns *original*
```
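If you want both safety models in one go, a small loop with the same flags works; this is just a convenience sketch, and quoting "*original*" keeps your local shell from expanding the glob before the CLI sees it:
```
# Download both safety models listed above.
for model in Llama-Guard-3-8B Prompt-Guard-86M; do
  llama download --source huggingface --model-id "$model" --ignore-patterns "*original*"
done
```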
The `llama model` command helps you explore the model's interface.
```
llama model --help
```
<pre style="font-family: monospace;">
usage: llama model [-h] {download,list,template,describe} ...

Work with llama models

options:
-h, --help show this help message and exit

model_subcommands:
{download,list,template,describe}
</pre>
You can use the `describe` command to learn more about a model:
```
llama model describe -m Meta-Llama3.1-8B-Instruct
```
<pre style="font-family: monospace;">
+-----------------------------+---------------------------------------+
| Model | Meta-Llama3.1-8B-Instruct |
+-----------------------------+---------------------------------------+
| HuggingFace ID | meta-llama/Meta-Llama-3.1-8B-Instruct |
+-----------------------------+---------------------------------------+
| ... | ... |
| | "top_k": 0 |
| | } |
+-----------------------------+---------------------------------------+
</pre>
You can even run `llama model template` to see all of the templates and their tokens:
```
llama model template
```
<pre style="font-family: monospace;">
+-----------+---------------------------------+
| Role | Template Name |
+-----------+---------------------------------+
| user | user-default |
| assistant | assistant-builtin-tool-call |
| assistant | assistant-custom-tool-call |
| assistant | assistant-default |
| system | system-builtin-and-custom-tools |
| system | system-builtin-tools-only |
| system | system-custom-tools-only |
| system | system-default |
| tool | tool-success |
| tool | tool-failure |
+-----------+---------------------------------+
</pre>
And fetch the contents of a template by passing its name to `--name`:
```
llama model template --name tool-success
```
<pre style="font-family: monospace;">
+----------+----------------------------------------------------------------+
| Name | tool-success |
+----------+----------------------------------------------------------------+
| Template | <|start_header_id|>ipython<|end_header_id|> |
| | |
| | completed |
| | [stdout]{"results":["something |
| | something"]}[/stdout]<|eot_id|> |
| | |
+----------+----------------------------------------------------------------+
| Notes | Note ipython header and [stdout] |
+----------+----------------------------------------------------------------+
</pre>
Or:
```
llama model template --name system-builtin-tools-only
```
<pre style="font-family: monospace;">
+----------+--------------------------------------------+
| Name | system-builtin-tools-only |
+----------+--------------------------------------------+
| Template | <|start_header_id|>system<|end_header_id|> |
| | |
| | Environment: ipython |
| | Tools: brave_search, wolfram_alpha |
| | |
| | Cutting Knowledge Date: December 2023 |
| | Today Date: 21 August 2024 |
| | <|eot_id|> |
| | |
+----------+--------------------------------------------+
| Notes | |
+----------+--------------------------------------------+
</pre>
These commands can help you understand the model interface and how prompts/messages are formatted for various scenarios.
**NOTE**: Outputs in the terminal are color-printed to show special tokens.
## Step 3: Installing and Configuring Distributions
Let's install, configure, and start a distribution to understand more!
Let's start by listing the available distributions:
```
llama distribution list
```
<pre style="font-family: monospace;">
+--------------+---------------------------------------------+----------------------------------------------------------------------+
| Spec ID | ProviderSpecs | Description |
+--------------+---------------------------------------------+----------------------------------------------------------------------+
| local | { | Use code from `llama_toolchain` itself to serve all llama stack APIs |
| | "inference": "meta-reference", | |
| | "safety": "meta-reference", | |
| | "agentic_system": "meta-reference" | |
| | } | |
+--------------+---------------------------------------------+----------------------------------------------------------------------+
| remote | { | Point to remote services for all llama stack APIs |
| | "inference": "inference-remote", | |
| | "safety": "safety-remote", | |
| | "agentic_system": "agentic_system-remote" | |
| | } | |
+--------------+---------------------------------------------+----------------------------------------------------------------------+
| local-ollama | { | Like local, but use ollama for running LLM inference |
| | "inference": "meta-ollama", | |
| | "safety": "meta-reference", | |
| | "agentic_system": "meta-reference" | |
| | } | |
+--------------+---------------------------------------------+----------------------------------------------------------------------+
</pre>
As you can see above, each “spec” details the “providers” that make up that spec. For example, the `local` spec uses the “meta-reference” provider for inference, while the `local-ollama` spec relies on a different provider (ollama) for inference.
Let's install the fully local implementation of the llama-stack, named `local` above.
This will create a new conda environment (name can be passed optionally) and install the distribution dependencies.
Once the following command runs successfully, you should see output of the form:
```
llama distribution install --spec local --name local_llama_8b
```
<pre style="font-family: monospace;">
Successfully installed cfgv-3.4.0 distlib-0.3.8 identify-2.6.0 libcst-1.4.0 llama_toolchain-0.0.2 moreorless-0.4.0 nodeenv-1.9.1 pre-commit-3.8.0 stdlibs-2024.5.15 toml-0.10.2 tomlkit-0.13.0 trailrunner-1.4.0 ufmt-2.7.0 usort-1.0.8 virtualenv-20.26.3
Distribution `local_llama_8b` (with spec local) has been installed successfully!
</pre>
The next step is to configure the distribution that you just installed. We provide a simple CLI tool for this.
This command will walk you through the configuration process, asking for details such as the model name and paths to the downloaded models.
**NOTE**: You will have to download the models first if you have not already done so. Follow the instructions in Step 1 above to download them using the llama CLI.
```
llama distribution configure --name local_llama_8b
```
Here is an example of how the CLI will guide you through filling in the configuration:
<pre style="font-family: monospace;">
Configuring API surface: inference
Enter value for model (required): Meta-Llama3.1-8B-Instruct
Enter value for quantization (optional):
...
Do you want to configure prompt_guard_shield? (y/n): n
Configuring API surface: agentic_system
YAML configuration has been written to ~/.llama/distributions/local0/config.yaml
</pre>
As you can see, we did a basic configuration above and configured inference to run on the Meta-Llama3.1-8B-Instruct model (the descriptor obtained from the `llama model list` command).
For this initial setup, we did not set up safety.
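If you want to review or adjust these values later, the generated file can be opened directly; the path below is the one reported by the configure step above.
```
# Inspect (or edit) the YAML written by `llama distribution configure`.
cat ~/.llama/distributions/local0/config.yaml
```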
Now let's start the distribution using the CLI:
```
llama distribution start --name local_llama_8b --port 5000
```
You should see the distribution start and print the APIs that it supports:
<pre style="font-family: monospace;">
> initializing model parallel with size 1
> initializing ddp with size 1
> initializing pipeline with size 1
...
INFO: Started server process [453333]
INFO: Waiting for application startup.
INFO: Application startup complete.
INFO: Uvicorn running on http://[::]:5000 (Press CTRL+C to quit)
</pre>
Let's test it with a client:
```
python -m llama_toolchain.inference.client localhost 5000
```
This will run the chat completion client and query the distribution's /inference/chat_completion API.
Here is an example output:
<pre style="font-family: monospace;">
Initializing client for http://localhost:5000
User>hello world, troll me in two-paragraphs about 42
Assistant> You think you're so smart, don't you? You think you can just waltz in here and ask about 42, like it's some kind of trivial matter. Well, let me tell you, 42 is not just a number, it's a way of life. It's the answer to the ultimate question of life, the universe, and everything, according to Douglas Adams' magnum opus, "The Hitchhiker's Guide to the Galaxy". But do you know what's even more interesting about 42? It's that it's not actually the answer to anything, it's just a number that some guy made up to sound profound.
You know what's even more hilarious? People like you who think they can just Google "42" and suddenly become experts on the subject. Newsflash: you're not a supercomputer, you're just a human being with a fragile ego and a penchant for thinking you're smarter than you actually are. 42 is just a number, a meaningless collection of digits that holds no significance whatsoever. So go ahead, keep thinking you're so clever, but deep down, you're just a pawn in the grand game of life, and 42 is just a silly little number that's been used to make you feel like you're part of something bigger than yourself. Ha!
</pre>
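If you prefer to hit the endpoint without the Python client, a raw HTTP call against the same /inference/chat_completion route should also work. The JSON body below is only an assumption about the request shape (field names like `model` and `messages` are guesses); check the llama_toolchain source for the exact schema.
```
# Hypothetical request body -- the exact schema is defined by llama_toolchain,
# so treat these field names as illustrative rather than authoritative.
curl -X POST http://localhost:5000/inference/chat_completion \
  -H 'Content-Type: application/json' \
  -d '{"model": "Meta-Llama3.1-8B-Instruct", "messages": [{"role": "user", "content": "hello world"}]}'
```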
Similarly, you can test safety (if you configured the llama-guard and/or prompt-guard shields) by: