# llama-toolchain

This repo contains the API specifications for various components of the Llama Stack, as well as implementations for some of those APIs, such as model inference.

The Stack consists of toolchain-apis and agentic-apis. This repo contains the toolchain-apis.

## Installation and Setup

```bash
mkdir -p ~/local
cd ~/local
git clone git@github.com:meta-llama/llama-toolchain.git

conda create -n toolchain python=3.10
conda activate toolchain

cd llama-toolchain
pip install -e .
```

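To confirm the editable install succeeded, you can check that the package imports and that the CLI entry point landed on your `PATH` (a quick sanity check, not part of the official setup):

```bash
# Verify the package imports and the `llama` entry point is on PATH.
python -c "import llama_toolchain"
which llama
```
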
## Test with CLI

We have built a `llama` CLI to make it easy to configure and run parts of the toolchain.

```
llama --help

usage: llama [-h] {download,inference,model,agentic_system} ...

Welcome to the LLama cli

options:
  -h, --help            show this help message and exit

subcommands:
  {download,inference,model,agentic_system}
```

There are several subcommands to help get you started; the sketch below shows how to explore them.

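Since the usage line above follows the standard argparse subcommand pattern, each subcommand should accept `--help` for its own usage (exact output depends on your installed version; `inference` and `model` are just examples):

```bash
# Each subcommand prints its own usage via the standard --help flag.
llama inference --help
llama model --help
```
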
## Start the inference server that can run the llama models

```bash
llama inference configure
llama inference start
```

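To sanity-check that the server came up, you can probe the address the test client below uses (this assumes the default localhost:5000 from the client example; any HTTP response means the server is listening):

```bash
# Probe the server; assumes the localhost:5000 default shown in the
# client example below. Any HTTP response means it is listening.
curl -i http://localhost:5000
```
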
## Test client

```bash
python -m llama_toolchain.inference.client localhost 5000

Initializing client for http://localhost:5000
User>hello world, help me out here
Assistant> Hello! I'd be delighted to help you out. What's on your mind? Do you have a question, a problem, or just need someone to chat with? I'm all ears!
```

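As the example shows, the client takes the server's host and port as positional arguments, so you can also point it at a server running on another machine (the hostname below is hypothetical):

```bash
# Hypothetical remote host; the two positional arguments are the
# server's host and port, matching the localhost example above.
python -m llama_toolchain.inference.client my-gpu-box.example.com 5000
```
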
## Running FP8

You need the `fbgemm-gpu` package, which requires torch >= 2.4.0 (currently only in nightly, but releasing shortly).

```bash
ENV=fp8_env
conda create -n $ENV python=3.10
conda activate $ENV

pip3 install -r fp8_requirements.txt
```
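
A quick way to confirm the FP8 environment is ready (a minimal sketch; it assumes `fp8_requirements.txt` pulled a torch >= 2.4.0 nightly and `fbgemm-gpu` into this environment):

```bash
# Minimal sanity check: confirm the torch version and that the
# fbgemm_gpu extension imports cleanly in this environment.
python3 -c "import torch; print(torch.__version__)"
python3 -c "import fbgemm_gpu"
```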