llama-toolchain

Composable building blocks to build Llama Apps. Docs: https://llama-stack.readthedocs.io

This repo contains the API specifications for various components of the Llama Stack, as well as implementations for some of those APIs, such as model inference.

The Llama Stack consists of toolchain-apis and agentic-apis. This repo contains the toolchain-apis.

Installation

You can install this repository as a package with pip install llama-toolchain.

If you want to install from source:

mkdir -p ~/local
cd ~/local
git clone git@github.com:meta-llama/llama-toolchain.git

conda create -n toolchain python=3.10
conda activate toolchain

cd llama-toolchain
pip install -e .
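
To confirm the editable install is importable, you can run a quick sanity check (not part of the original instructions; it only verifies that Python can find the package):

python -c "import llama_toolchain; print(llama_toolchain.__file__)"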

Test with CLI

We have built a llama CLI to make it easy to configure and run parts of the toolchain.

llama --help

usage: llama [-h] {download,inference,model,agentic_system} ...

Welcome to the llama CLI

options:
  -h, --help            show this help message and exit

subcommands:
  {download,inference,model,agentic_system}

There are several subcommands to help get you started.

Start an inference server that can run the Llama models:

llama inference configure
llama inference start
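
Once the server is up, you can sanity-check that it is accepting connections. A minimal sketch in Python, assuming the default localhost:5000 address used by the test client below:

# Quick connectivity check; assumes the server listens on localhost:5000,
# the same address the test client below uses.
import socket

conn = socket.create_connection(("localhost", 5000), timeout=5)
print("inference server is accepting connections")
conn.close()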

Test client

python -m llama_toolchain.inference.client localhost 5000

Initializing client for http://localhost:5000
User>hello world, help me out here
Assistant> Hello! I'd be delighted to help you out. What's on your mind? Do you have a question, a problem, or just need someone to chat with? I'm all ears!
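
If you want to script against the server instead of using the bundled client, you can POST to it directly. This is only a sketch: the endpoint path, payload shape, and model name below are assumptions for illustration; check llama_toolchain/inference/client.py for the actual request format.

# Hypothetical raw-HTTP call to the inference server. The route and
# payload shape are assumptions, not a documented API -- see
# llama_toolchain/inference/client.py for the real request format.
import requests

response = requests.post(
    "http://localhost:5000/inference/chat_completion",  # assumed route
    json={
        "model": "Meta-Llama-3.1-8B-Instruct",  # placeholder model name
        "messages": [{"role": "user", "content": "hello world"}],
        "stream": False,
    },
    timeout=60,
)
response.raise_for_status()
print(response.json())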

Running FP8

You need the fbgemm-gpu package, which requires torch >= 2.4.0 (currently only available in nightly builds, but releasing shortly).

ENV=fp8_env
conda create -n $ENV python=3.10
conda activate $ENV

pip3 install -r fp8_requirements.txt
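
To verify the FP8 environment, you can check that both prerequisites import cleanly (a sanity check, not part of the original instructions):

# Confirms the FP8 prerequisites are installed in this environment.
import torch
print("torch", torch.__version__)  # fbgemm-gpu needs torch >= 2.4.0

import fbgemm_gpu  # noqa: F401 -- raises ImportError if the package is missing
print("fbgemm_gpu imported OK")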