This repo contains the API specifications for various parts of the Llama Stack.
The Stack consists of toolchain APIs and agentic APIs.
The toolchain APIs covered are:
- inference / batch inference
- post training
- reward model scoring
- synthetic data generation
## Running FP8
You need the `fbgemm-gpu` package, which requires torch >= 2.4.0 (currently available only in nightly builds, with a stable release coming shortly).
```bash
ENV=fp8_env
conda create -n $ENV python=3.10
conda activate $ENV
pip3 install -r fp8_requirements.txt
```
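Since `fbgemm-gpu` needs torch >= 2.4.0, it can be useful to confirm the environment before installing the FP8 requirements. The sketch below is a minimal, hypothetical check (not part of this repo); the version-parsing helper is an assumption for illustration.

```python
def version_tuple(v: str) -> tuple:
    # Parse the leading numeric components of a version string,
    # e.g. "2.4.0.dev20240601" -> (2, 4, 0); non-numeric suffixes are ignored.
    parts = []
    for p in v.split("."):
        if p.isdigit():
            parts.append(int(p))
        else:
            break
    return tuple(parts)


def meets_requirement(installed: str, required: str = "2.4.0") -> bool:
    # Compare version components as tuples, so "2.10.0" > "2.4.0".
    return version_tuple(installed) >= version_tuple(required)


if __name__ == "__main__":
    try:
        import torch  # only available inside the activated fp8_env environment

        print("torch", torch.__version__, "ok:", meets_requirement(torch.__version__))
    except ImportError:
        print("torch is not installed; activate the fp8_env environment first")
```

Running this inside the activated environment reports whether the installed torch build is new enough for `fbgemm-gpu`.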
### Generate OpenAPI specs
Set up a virtual environment:
```bash
python3 -m venv ~/.venv/toolchain/
source ~/.venv/toolchain/bin/activate
with-proxy pip3 install -r requirements.txt
```
Run the `generate.sh` script:
```bash
cd source && sh generate.sh
```