# llama-toolchain
This repo contains the API specifications for various components of the Llama Stack, as well as implementations for some of those APIs (such as model inference).
The Llama Stack consists of toolchain-apis and agentic-apis. This repo contains the toolchain-apis.
## Installation
You can install this repository as a [package](https://pypi.org/project/llama-toolchain/) with `pip install llama-toolchain`.
If you want to install from source:
```bash
mkdir -p ~/local
cd ~/local
git clone git@github.com:meta-llama/llama-toolchain.git
conda create -n toolchain python=3.10
conda activate toolchain
cd llama-toolchain
pip install -e .
```
## Test with the CLI
We have built a `llama` CLI to make it easy to configure and run parts of the toolchain.
```
llama --help
usage: llama [-h] {download,inference,model,agentic_system} ...

Welcome to the llama CLI

options:
  -h, --help            show this help message and exit

subcommands:
  {download,inference,model,agentic_system}
```
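The subcommand layout shown in the help output can be sketched with `argparse` subparsers. The sketch below is a hypothetical reconstruction for illustration only: the subcommand names come from the help text above, but the parser wiring and help strings are assumptions, not the repo's actual code.

```python
# Hypothetical sketch of a CLI with the subcommands shown above,
# built on argparse subparsers. Not the actual llama_toolchain code.
import argparse

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="llama", description="Welcome to the llama CLI"
    )
    sub = parser.add_subparsers(
        dest="command", metavar="{download,inference,model,agentic_system}"
    )
    # One subparser per subcommand; real handlers would be attached here.
    for name in ("download", "inference", "model", "agentic_system"):
        sub.add_parser(name, help=f"Run the {name} subcommand")
    return parser

args = build_parser().parse_args(["inference"])
print(args.command)  # inference
```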
There are several subcommands to help get you started.
## Start an inference server that can run the Llama models
```bash
llama inference configure
llama inference start
```
## Test client
```bash
python -m llama_toolchain.inference.client localhost 5000
Initializing client for http://localhost:5000
User>hello world, help me out here
Assistant> Hello! I'd be delighted to help you out. What's on your mind? Do you have a question, a problem, or just need someone to chat with? I'm all ears!
```
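For a rough idea of what such a client does under the hood, here is a minimal, hypothetical sketch of building a chat request over HTTP. The `/chat` endpoint path and the JSON payload shape are illustrative assumptions; the README does not show the actual wire protocol of the bundled client.

```python
# Hypothetical sketch of a chat request builder. The endpoint path and
# payload fields below are assumptions, not llama_toolchain's protocol.
import json
from urllib import request

def build_chat_request(host: str, port: int, message: str) -> request.Request:
    url = f"http://{host}:{port}/chat"  # assumed endpoint path
    payload = json.dumps({"role": "user", "content": message}).encode("utf-8")
    return request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )

req = build_chat_request("localhost", 5000, "hello world, help me out here")
print(req.full_url)  # http://localhost:5000/chat
```

The real client would send this request (e.g. with `urllib.request.urlopen`) and print the assistant's reply, as in the transcript above.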
## Running FP8
You need the `fbgemm-gpu` package, which requires torch >= 2.4.0 (currently only available in nightly builds, but releasing shortly).
```bash
ENV=fp8_env
conda create -n $ENV python=3.10
conda activate $ENV
pip3 install -r fp8_requirements.txt
```
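Before installing, you can sanity-check that your torch build meets the >= 2.4.0 requirement stated above. The helper below is a standalone illustration that compares plain version strings; it is not part of this repo.

```python
# Sketch: check a version string against the torch >= 2.4.0 requirement
# mentioned above. Illustrative helper only, not llama_toolchain code.
def meets_requirement(installed: str, required: str = "2.4.0") -> bool:
    def key(version: str) -> tuple:
        # Drop any local suffix (e.g. "2.4.0.dev20240701+cu121") and
        # compare only the leading numeric components.
        nums = []
        for part in version.split("+")[0].split("."):
            if part.isdigit():
                nums.append(int(part))
            else:
                break
        return tuple(nums)
    return key(installed) >= key(required)

print(meets_requirement("2.4.0"))  # True
print(meets_requirement("2.3.1"))  # False
```

In practice you would pass `torch.__version__` as the `installed` argument.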