
Meta Reference Distribution

The llamastack/distribution-meta-reference-gpu distribution consists of the following provider configurations.

| API         | Inference      | Agents         | Memory                                           | Safety         | Telemetry      |
|-------------|----------------|----------------|--------------------------------------------------|----------------|----------------|
| Provider(s) | meta-reference | meta-reference | meta-reference, remote::pgvector, remote::chroma | meta-reference | meta-reference |

Start the Distribution (Single Node GPU)

Note

This assumes you have access to a GPU, since the meta-reference distribution runs inference locally on your GPU.

Note

For GPU inference, set the following environment variable to point to the local directory containing your model checkpoints, and pass your GPUs through to the docker container when starting it.

export LLAMA_CHECKPOINT_DIR=~/.llama

Note

~/.llama should be the path to the directory containing the downloaded Llama model weights.
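
If you have not downloaded the checkpoints yet, the llama CLI that ships with llama-stack can fetch them into ~/.llama. The model ID and Meta URL below are placeholders for illustration:

# Example: download weights from Meta (replace the model ID and URL with your own)
llama download --source meta --model-id Llama3.1-8B-Instruct --meta-url <META_URL>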

To download and start a pre-built docker container, you may use the following command:

docker run -it -p 5000:5000 -v ~/.llama:/root/.llama --gpus=all llamastack/llamastack-local-gpu
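
Once the container is up, you can sanity-check that it started correctly (assuming the default image and port mapping above) by inspecting the container and its logs:

# Find the running container and follow its logs (replace <container-id> with the ID shown by docker ps)
docker ps --filter ancestor=llamastack/llamastack-local-gpu
docker logs -f <container-id>

With the -p 5000:5000 mapping above, the server should then be reachable on http://localhost:5000.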

Alternative (Build and start distribution locally via conda)

  • You may check out the Getting Started guide for more details on starting up a meta-reference distribution; a minimal sketch of the conda workflow follows below.
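
As a rough sketch (assuming the build.yaml and run.yaml in this directory, and that your llama-stack version accepts a --config path for builds), the conda workflow looks like:

# Build the distribution into a conda environment from this directory's build.yaml
llama stack build --config ./build.yaml

# Start the server using the bundled run.yaml
llama stack run ./run.yaml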