# Meta Reference Distribution
The `llamastack/distribution-meta-reference-gpu` distribution consists of the following provider configurations.
| API | Inference | Agents | Memory | Safety | Telemetry |
|---|---|---|---|---|---|
| Provider(s) | meta-reference | meta-reference | meta-reference, remote::pgvector, remote::chroma | meta-reference | meta-reference |
## Start the Distribution (Single Node GPU)
> [!NOTE]
> This assumes you have access to a machine with a GPU so the distribution can run local (meta-reference) inference against it.
> [!NOTE]
> For GPU inference, set the environment variable below to the local directory containing your model checkpoints, and enable GPU access when starting the Docker container.

```
export LLAMA_CHECKPOINT_DIR=~/.llama
```
> [!NOTE]
> `~/.llama` should be the path containing the downloaded weights of your Llama models.
To download and start a pre-built Docker container, you may use the following command:

```
docker run -it -p 5000:5000 -v ~/.llama:/root/.llama --gpus=all llamastack/llamastack-local-gpu
```
## Alternative (Build and start distribution locally via conda)
- You may check out the Getting Started guide for more details on starting up a meta-reference distribution; a condensed sketch of the conda flow is shown below.