docs: update container deployment guides for distributions

This commit is contained in:
r3v5 2025-07-21 10:33:12 +01:00
parent ecdcfb28ca
commit f009c0b534
No known key found for this signature in database
GPG key ID: 7758B9F272DE67D9
9 changed files with 19 additions and 19 deletions

View file

@@ -278,7 +278,7 @@ After this step is successful, you should be able to find the built container im
```
docker run -d \
-p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
-v ~/.llama:/root/.llama \
-v ~/.llama:/.llama \
localhost/distribution-ollama:dev \
--port $LLAMA_STACK_PORT \
--env INFERENCE_MODEL=$INFERENCE_MODEL \
@@ -291,7 +291,7 @@ Here are the docker flags and their uses:
* `-p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT`: Maps the container port to the host port for accessing the server
* `-v ~/.llama:/root/.llama`: Mounts the local .llama directory to persist configurations and data
* `-v ~/.llama:/.llama`: Mounts the local .llama directory to persist configurations and data
* `localhost/distribution-ollama:dev`: The name and tag of the container image to run
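With the container started in detached mode, a quick sanity check can confirm it is running and that data is persisting through the new `/.llama` mount point. A minimal sketch (the container ID is a placeholder for whatever `docker run -d` printed):

```
# List running containers started from the locally built image
docker ps --filter ancestor=localhost/distribution-ollama:dev

# Tail the startup logs; replace <container-id> with the ID printed by docker run -d
docker logs -f <container-id>

# Inside the container the data lives at /.llama, but it is still backed by
# ~/.llama on the host, so configuration and downloads persist across restarts
ls ~/.llama
```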

View file

@@ -68,9 +68,9 @@ LLAMA_STACK_PORT=5001
docker run \
-it \
-p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
-v ./run.yaml:/root/my-run.yaml \
-v ./run.yaml:/.llama/my-run.yaml \
llamastack/distribution-watsonx \
--config /root/my-run.yaml \
--config /.llama/my-run.yaml \
--port $LLAMA_STACK_PORT \
--env WATSONX_API_KEY=$WATSONX_API_KEY \
--env WATSONX_PROJECT_ID=$WATSONX_PROJECT_ID \

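The watsonx invocation above assumes `./run.yaml` exists in the directory you run it from and that the credentials passed via `--env` are already exported. A small pre-flight sketch with placeholder values:

```
# Placeholders only; substitute your real watsonx credentials
export WATSONX_API_KEY=<your-api-key>
export WATSONX_PROJECT_ID=<your-project-id>
export LLAMA_STACK_PORT=5001

# The bind mount uses a relative path, so run docker from the directory containing run.yaml
ls ./run.yaml
```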
View file

@@ -65,7 +65,7 @@ registry.dell.huggingface.co/enterprise-dell-inference-meta-llama-meta-llama-3.1
#### Start Llama Stack server pointing to TGI server
```
docker run --pull always --network host -it -p 8321:8321 -v ./run.yaml:/root/my-run.yaml --gpus=all llamastack/distribution-tgi --yaml_config /root/my-run.yaml
docker run --pull always --network host -it -p 8321:8321 -v ./run.yaml:/.llama/my-run.yaml --gpus=all llamastack/distribution-tgi --yaml_config /.llama/my-run.yaml
```
Make sure that in your `run.yaml` file, your inference provider is pointing to the correct TGI server endpoint. E.g.

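Before starting the stack it can also help to confirm that the TGI endpoint referenced in `run.yaml` is reachable at all. A rough check, assuming TGI's usual `/info` route and with host and port as placeholders:

```
# Replace host and port with the endpoint configured for the inference provider in run.yaml
curl http://<tgi-host>:<tgi-port>/info
```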
View file

@@ -125,7 +125,7 @@ docker run -it \
--pull always \
--network host \
-p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
-v $HOME/.llama:/root/.llama \
-v $HOME/.llama:/.llama \
# NOTE: mount the llama-stack / llama-models directories only if you are testing local changes; otherwise this is not needed
-v /home/hjshah/git/llama-stack:/app/llama-stack-source -v /home/hjshah/git/llama-models:/app/llama-models-source \
# localhost/distribution-dell:dev if building / testing locally
@@ -152,10 +152,10 @@ docker run \
-it \
--pull always \
-p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
-v $HOME/.llama:/root/.llama \
-v ./llama_stack/templates/tgi/run-with-safety.yaml:/root/my-run.yaml \
-v $HOME/.llama:/.llama \
-v ./llama_stack/templates/tgi/run-with-safety.yaml:/.llama/my-run.yaml \
llamastack/distribution-dell \
--config /root/my-run.yaml \
--config /.llama/my-run.yaml \
--port $LLAMA_STACK_PORT \
--env INFERENCE_MODEL=$INFERENCE_MODEL \
--env DEH_URL=$DEH_URL \

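The Dell commands above depend on several environment variables being exported first. The values below are illustrative placeholders only and must be adjusted to match your own TGI/DEH deployment:

```
# Illustrative placeholders; set these to match your deployment
export LLAMA_STACK_PORT=8321
export INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct
export DEH_URL=http://<deh-host>:<deh-port>
```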
View file

@@ -83,7 +83,7 @@ docker run \
--pull always \
--gpus all \
-p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
-v ~/.llama:/root/.llama \
-v ~/.llama:/.llama \
llamastack/distribution-meta-reference-gpu \
--port $LLAMA_STACK_PORT \
--env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct
@@ -97,7 +97,7 @@ docker run \
--pull always \
--gpus all \
-p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
-v ~/.llama:/root/.llama \
-v ~/.llama:/.llama \
llamastack/distribution-meta-reference-gpu \
--port $LLAMA_STACK_PORT \
--env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct \

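Because this distribution passes the host GPUs straight into the container via `--gpus all`, it is worth confirming the container runtime can actually see them before launching. A common sanity check, assuming the NVIDIA Container Toolkit is installed (the CUDA image tag is only an example):

```
# Should print the same nvidia-smi table you see on the host
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
```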
View file

@@ -142,9 +142,9 @@ docker run \
-it \
--pull always \
-p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
-v ./run.yaml:/root/my-run.yaml \
-v ./run.yaml:/.llama/my-run.yaml \
llamastack/distribution-nvidia \
--config /root/my-run.yaml \
--config /.llama/my-run.yaml \
--port $LLAMA_STACK_PORT \
--env NVIDIA_API_KEY=$NVIDIA_API_KEY
```
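Once the NVIDIA distribution container is up, a quick liveness check can be run from the host. This assumes the `/v1/health` route exposed by recent Llama Stack releases:

```
# Expect a small JSON status payload if the server is healthy
curl -s http://localhost:$LLAMA_STACK_PORT/v1/health
```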

View file

@@ -91,7 +91,7 @@ following command:
docker run -it \
--pull always \
-p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
-v ~/.llama:/root/.llama \
-v ~/.llama:/.llama \
llamastack/distribution-starter \
--port $LLAMA_STACK_PORT \
--env INFERENCE_MODEL=$INFERENCE_MODEL \
@@ -112,7 +112,7 @@ Linux users having issues running the above command should instead try the follo
docker run -it \
--pull always \
-p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
-v ~/.llama:/root/.llama \
-v ~/.llama:/.llama \
--network=host \
llamastack/distribution-starter \
--port $LLAMA_STACK_PORT \

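Whichever starter variant you use, you can then point a client at the server from the host. A brief sketch, assuming the `llama-stack-client` CLI is installed (for example via `pip install llama-stack-client`):

```
# Register the local endpoint, then list the models the starter exposes
llama-stack-client configure --endpoint http://localhost:$LLAMA_STACK_PORT
llama-stack-client models list
```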
View file

@@ -71,7 +71,7 @@ docker run \
--pull always \
--gpus all \
-p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
-v ~/.llama:/root/.llama \
-v ~/.llama:/.llama \
llamastack/distribution-{{ name }} \
--port $LLAMA_STACK_PORT \
--env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct
@@ -85,7 +85,7 @@ docker run \
--pull always \
--gpus all \
-p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
-v ~/.llama:/root/.llama \
-v ~/.llama:/.llama \
llamastack/distribution-{{ name }} \
--port $LLAMA_STACK_PORT \
--env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct \

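The meta-reference template expects model checkpoints to already be present under `~/.llama` on the host, since that directory is what gets mounted at `/.llama`. A quick check (the exact layout can vary by release; `checkpoints` is the usual subdirectory used by the `llama` download CLI):

```
# Checkpoints downloaded with the llama CLI normally land here
ls ~/.llama/checkpoints
```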
View file

@@ -114,9 +114,9 @@ docker run \
-it \
--pull always \
-p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
-v ./run.yaml:/root/my-run.yaml \
-v ./run.yaml:/.llama/my-run.yaml \
llamastack/distribution-{{ name }} \
--config /root/my-run.yaml \
--config /.llama/my-run.yaml \
--port $LLAMA_STACK_PORT \
--env NVIDIA_API_KEY=$NVIDIA_API_KEY
```
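As with the non-template NVIDIA instructions earlier, this command silently depends on `NVIDIA_API_KEY` being set in the calling shell. A purely illustrative guard that catches the omission before the container starts with a broken provider:

```
# Warn early if the key is missing rather than letting the container start without it
[ -n "$NVIDIA_API_KEY" ] || echo "NVIDIA_API_KEY is not set" >&2
```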