forked from phoenix-oss/llama-stack-mirror
Add CLI reference docs (#14)
* Add CLI reference doc
* touchups
* add helptext for download
parent b8aa99b034 · commit ec433448f2
3 changed files with 174 additions and 38 deletions
README.md (44 changes)
````diff
@@ -1,11 +1,12 @@
 # llama-toolchain

 This repo contains the API specifications for various components of the Llama Stack as well implementations for some of those APIs like model inference.
-The Stack consists of toolchain-apis and agentic-apis. This repo contains the toolchain-apis
+The Llama Stack consists of toolchain-apis and agentic-apis. This repo contains the toolchain-apis.

 ## Installation

-You can install this repository as a [package](https://pypi.org/project/llama-toolchain/) by just doing `pip install llama-toolchain`
+You can install this repository as a [package](https://pypi.org/project/llama-toolchain/) with `pip install llama-toolchain`

 If you want to install from source:
@@ -21,44 +22,13 @@ cd llama-toolchain
 pip install -e .
 ```

-## Test with cli
+## The Llama CLI

-We have built a llama cli to make it easy to configure / run parts of the toolchain
-```
-llama --help
+The `llama` CLI makes it easy to configure and run the Llama toolchain. Read the [CLI reference](docs/cli_reference.md) for details.

-usage: llama [-h] {download,inference,model,agentic_system} ...
+## Appendix: Running FP8

-Welcome to the LLama cli
-
-options:
-  -h, --help            show this help message and exit
-
-subcommands:
-  {download,inference,model,agentic_system}
-```
-There are several subcommands to help get you started
-
-## Start inference server that can run the llama models
-```bash
-llama inference configure
-llama inference start
-```
-
-## Test client
-```bash
-python -m llama_toolchain.inference.client localhost 5000
-
-Initializing client for http://localhost:5000
-User>hello world, help me out here
-Assistant> Hello! I'd be delighted to help you out. What's on your mind? Do you have a question, a problem, or just need someone to chat with? I'm all ears!
-```
-
-## Running FP8
-
-You need `fbgemm-gpu` package which requires torch >= 2.4.0 (currently only in nightly, but releasing shortly...).
+If you want to run FP8, you need the `fbgemm-gpu` package which requires `torch >= 2.4.0` (currently only in nightly, but releasing shortly...)

 ```bash
 ENV=fp8_env
````
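The second hunk's context shows only the tail of the from-source install (`cd llama-toolchain`, `pip install -e .`); the clone step sits outside the hunk. A minimal sketch of the full sequence, where the clone URL is an assumption rather than something shown in this diff:

```bash
# Sketch only: the clone URL is assumed, not taken from the diff above.
git clone https://github.com/meta-llama/llama-toolchain.git
cd llama-toolchain
pip install -e .
```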
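The removed help text lists a `download` subcommand, and the commit message mentions adding help text for it. Since the top-level usage line shows the standard argparse `[-h]` flag, each subcommand presumably exposes its own help, for example:

```bash
# Print the download subcommand's help; its exact flags are covered in
# docs/cli_reference.md rather than in the README after this change.
llama download --help
```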
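The `ENV=fp8_env` block above is truncated. As a rough, hypothetical sketch of what such an environment setup could look like (the conda commands, the nightly PyTorch wheel index, and the `fbgemm-gpu` pip package are all assumptions, not content from this commit):

```bash
# Hypothetical FP8 environment setup; none of these exact commands appear in the diff.
ENV=fp8_env
conda create -n "$ENV" python=3.10 -y
conda activate "$ENV"

# torch >= 2.4.0 was nightly-only at the time of this commit; a matching
# (possibly nightly) fbgemm-gpu build may be required.
pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu121
pip install fbgemm-gpu
```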