Composable building blocks to build Llama Apps https://llama-stack.readthedocs.io

This repo contains the API specifications for various parts of the Llama Stack. The Stack consists of toolchain APIs and agentic APIs.

The toolchain APIs covered so far:

  • inference / batch inference
  • post training
  • reward model scoring
  • synthetic data generation

Running FP8

You need the fbgemm-gpu package, which requires torch >= 2.4.0 (currently only available in nightly builds, but releasing shortly).

ENV=fp8_env
conda create -n $ENV python=3.10
conda activate $ENV

pip3 install -r fp8_requirements.txt
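
Before (or after) installing, you may want to confirm that the torch in your environment actually satisfies the >= 2.4.0 constraint mentioned above. A minimal sketch of such a check; the helper name and the suffix-stripping behavior are illustrative, not part of the toolchain:

```python
# Check whether the installed torch meets the >= 2.4.0 requirement.
# The comparison is purely numeric and stops at non-numeric suffixes,
# so nightly versions such as "2.4.0.dev20240722+cu121" count as 2.4.0.

def meets_minimum(version: str, minimum: str = "2.4.0") -> bool:
    def parse(v: str) -> tuple:
        parts = []
        for piece in v.split("+")[0].split("."):
            if piece.isdigit():
                parts.append(int(piece))
            else:
                break  # stop at suffixes like "dev20240722"
        return tuple(parts)
    return parse(version) >= parse(minimum)

try:
    import torch  # may be absent until fp8_requirements.txt is installed
    print(f"torch {torch.__version__} new enough: "
          f"{meets_minimum(torch.__version__)}")
except ImportError:
    print("torch is not installed yet")
```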

Generate OpenAPI specs

Set up virtual environment

python3 -m venv ~/.venv/toolchain/ 
source ~/.venv/toolchain/bin/activate

pip3 install -r requirements.txt

Run the generate.sh script

cd source && sh generate.sh
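
After generation, a quick structural sanity check on a produced spec can catch a broken run early. A minimal sketch, assuming a JSON-format spec; the stub document and function names below are illustrative and not tied to the script's actual output paths:

```python
import json

def looks_like_openapi(spec: dict) -> bool:
    """Rough structural check: an OpenAPI document declares its spec
    version and describes at least one path or component."""
    return (
        isinstance(spec.get("openapi"), str)
        and spec["openapi"].startswith("3.")
        and ("paths" in spec or "components" in spec)
    )

def check_spec_file(path: str) -> bool:
    # Load a generated spec from disk and apply the structural check.
    with open(path) as f:
        return looks_like_openapi(json.load(f))

# Example with an in-memory stub document:
stub = {"openapi": "3.1.0", "paths": {"/inference": {}}}
print(looks_like_openapi(stub))  # True
```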