llama-toolchain
This repo contains the API specifications for various components of the Llama Stack, as well as implementations for some of those APIs, such as model inference.
The Llama Stack consists of toolchain-apis and agentic-apis. This repo contains the toolchain-apis.
Installation
You can install this repository as a package with `pip install llama-toolchain`.
If you want to install from source:
```bash
mkdir -p ~/local
cd ~/local
git clone git@github.com:meta-llama/llama-toolchain.git

conda create -n toolchain python=3.10
conda activate toolchain

cd llama-toolchain
pip install -e .
```
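After either install path, a quick sanity check is to import the package; the module name `llama_toolchain` matches the package directory in this repo. This is a minimal sketch, not an official verification step:

```bash
# Import the package to confirm the install; prints nothing on success.
python -c "import llama_toolchain"
```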
The Llama CLI
The `llama` CLI makes it easy to configure and run the Llama toolchain. Read the CLI reference for details.
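Once the package is installed, the `llama` entry point should be on your PATH. A hedged first step, assuming the CLI follows the usual convention of a top-level help flag, is:

```bash
# List available subcommands and options; see the CLI reference for full docs.
llama --help
```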
Appendix: Running FP8
If you want to run FP8, you need the `fbgemm-gpu` package, which requires `torch >= 2.4.0` (currently only available in nightly builds, but releasing shortly).
```bash
ENV=fp8_env
conda create -n $ENV python=3.10
conda activate $ENV

pip3 install -r fp8_requirements.txt
```
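Since the FP8 path depends on a sufficiently new torch, it is worth confirming the installed version after the requirements install; torch exposes `torch.__version__`, and the 2.4.0 floor comes from the note above:

```bash
# Print the installed torch version; it should be >= 2.4.0 for fbgemm-gpu.
python -c "import torch; print(torch.__version__)"
```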