Add CLI reference doc

dltn 2024-07-25 12:37:05 -07:00
parent b8aa99b034
commit 142b36c7c5
3 changed files with 149 additions and 4 deletions


@@ -1,11 +1,12 @@
 # llama-toolchain
 This repo contains the API specifications for various components of the Llama Stack as well implementations for some of those APIs like model inference.
-The Stack consists of toolchain-apis and agentic-apis. This repo contains the toolchain-apis
+The Llama Stack consists of toolchain-apis and agentic-apis. This repo contains the toolchain-apis.
 ## Installation
-You can install this repository as a [package](https://pypi.org/project/llama-toolchain/) by just doing `pip install llama-toolchain`
+You can install this repository as a [package](https://pypi.org/project/llama-toolchain/) with `pip install llama-toolchain`
 If you want to install from source:

@@ -29,7 +30,7 @@ llama --help
 usage: llama [-h] {download,inference,model,agentic_system} ...
-Welcome to the LLama cli
+Welcome to the llama CLI
 options:
   -h, --help  show this help message and exit

docs/cli_reference.md (new file, 144 lines)

# Llama CLI Reference
The `llama` CLI tool helps you set up and use the Llama toolchain & agentic systems. It should be available on your PATH after installing the `llama-toolchain` package.
```
$ llama --help
Welcome to the Llama Command Line Interface
Usage: llama [-h] {download,inference,model} ...
Options:
-h, --help Show this help message and exit
Subcommands:
{download,inference,model}
```
## Step 1. Get the models
First, you need the models locally. You can get them from [HuggingFace](https://huggingface.co/meta-llama) or [directly from Meta](https://llama.meta.com/llama-downloads/); the `llama download` command streamlines the process.
1. Create a Hugging Face access token [here](https://huggingface.co/settings/tokens).
2. Set the `HF_TOKEN` environment variable and run the download command:
```
export HF_TOKEN=YOUR_TOKEN_HERE
llama download meta-llama/Meta-Llama-3.1-70B-Instruct
```
Run `llama download --help` for more information.
## Step 2. Understand the models
The `llama model` command helps you explore a model's interface.
```
$ llama model --help
usage: llama model [-h] {template} ...
Describe llama model interfaces
options:
-h, --help show this help message and exit
model_subcommands:
{template}
Example: llama model <subcommand> <options>
```
You can run `llama model template` to see all of the templates and their tokens:
```
$ llama model template
system-message-builtin-and-custom-tools
system-message-builtin-tools-only
system-message-custom-tools-only
system-message-default
assistant-message-builtin-tool-call
assistant-message-custom-tool-call
assistant-message-default
tool-message-failure
tool-message-success
user-message-default
```
You can fetch an example by passing a template name to `--template`:
```
$ llama model template --template tool-message-success
<|start_header_id|>ipython<|end_header_id|>
completed
[stdout]{"results":["something something"]}[/stdout]<|eot_id|>
```
## Step 3. Start the inference server
Once you have a model, the magic begins with inference. The `llama inference` command can help you configure and launch the Llama Stack inference server.
```
$ llama inference --help
usage: llama inference [-h] {start,configure} ...
Run inference on a llama model
options:
-h, --help show this help message and exit
inference_subcommands:
{start,configure}
Example: llama inference start <options>
```
Run `llama inference configure` to set up your configuration at `~/.llama/configs/inference.yaml`. You'll set values like the following (a rough sketch of the resulting file is shown after this list):
* the directory where you stored the models you downloaded in Step 1
* the model parallel size (1 for 8B models, 8 for 70B/405B)
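A minimal sketch of what the generated file might contain, assuming illustrative key names (the actual schema is whatever `llama inference configure` writes, so don't copy these field names verbatim):
```
# Hypothetical sketch of ~/.llama/configs/inference.yaml.
# Key names here are illustrative assumptions, not the real schema.
inference:
  checkpoint_dir: /path/to/checkpoints/Meta-Llama-3.1-70B-Instruct  # where Step 1 put the weights
  model_parallel_size: 8  # 1 for 8B models, 8 for 70B/405B
```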
Once you've configured the inference server, run `llama inference start`. The model will load onto the GPU, and you'll be able to send requests once the server reports it is ready.
If you want to use a different model, re-run `llama inference configure` to update the model path, then run `llama inference start` again, as in the sketch below.
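For example, switching to a different checkpoint is just the same two commands again (a sketch; the configure step will prompt for the new values):
```
llama inference configure   # point the config at the new checkpoint directory
llama inference start       # relaunch the server with the new model
```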
Run `llama inference --help` for more information.
## Step 4. Start the agentic system
The `llama agentic_system` command helps you configure and launch agentic systems. The `llama agentic_system configure` command sets up the configuration file the agentic code expects, and the `llama agentic_system start_app` command streamlines launching.
For example, let's run the included chat app:
```
llama agentic_system configure
llama agentic_system start_app chat
```
For more information, run `llama agentic_system --help`.


@@ -17,7 +17,7 @@ class LlamaCLIParser:
     def __init__(self):
         self.parser = argparse.ArgumentParser(
             prog="llama",
-            description="Welcome to the LLama cli",
+            description="Welcome to the Llama CLI",
             add_help=True,
         )