Update more distribution docs to be simpler and partially codegen'ed

2025-10-04 04:04:14 +00:00 · 2024-11-20 14:44:04 -08:00
commit 2411a44833
parent e84d4436b5
51 changed files with 1188 additions and 291 deletions
--- a/docs/source/distributions/self_hosted_distro/bedrock.md
+++ b/docs/source/distributions/self_hosted_distro/bedrock.md
@@ -6,59 +6,58 @@
 self
 ```

-### Connect to a Llama Stack Bedrock Endpoint
- You may connect to Amazon Bedrock APIs for running LLM inference
+The `llamastack/distribution-bedrock` distribution consists of the following provider configurations:

-The `llamastack/distribution-bedrock` distribution consists of the following provider configurations.
+| API | Provider(s) |
+|-----|-------------|
+| agents | `inline::meta-reference` |
+| inference | `remote::bedrock` |
+| memory | `inline::faiss`, `remote::chromadb`, `remote::pgvector` |
+| safety | `remote::bedrock` |
+| telemetry | `inline::meta-reference` |


-| **API**         	| **Inference** 	| **Agents**     	| **Memory**     	| **Safety**     	| **Telemetry**  	|
-|-----------------	|---------------	|----------------	|----------------	|----------------	|----------------	|
-| **Provider(s)** 	| remote::bedrock | meta-reference 	| meta-reference 	| remote::bedrock | meta-reference 	|
+
+### Environment Variables
+
+The following environment variables can be configured:
+
+- `LLAMASTACK_PORT`: Port for the Llama Stack distribution server (default: `5001`)


-### Docker: Start the Distribution (Single Node CPU)

-> [!NOTE]
-> This assumes you have valid AWS credentials configured with access to Amazon Bedrock.
+### Prerequisite: API Keys

-```
-$ cd distributions/bedrock && docker compose up
+Make sure you have access to an AWS Bedrock API key. You can get one by visiting [AWS Bedrock](https://aws.amazon.com/bedrock/).
+
+
+## Running Llama Stack with AWS Bedrock
+
+You can do this via Conda (which builds the distribution code) or Docker (which uses a pre-built image).
+
+### Via Docker
+
+This method allows you to get started quickly without having to build the distribution code.
+
+```bash
+LLAMA_STACK_PORT=5001
+docker run \
+  -it \
+  -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
+  llamastack/distribution-bedrock \
+  --port $LLAMA_STACK_PORT \
+  --env AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID \
+  --env AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY \
+  --env AWS_SESSION_TOKEN=$AWS_SESSION_TOKEN
 ```

-Make sure in your `run.yaml` file, your inference provider is pointing to the correct AWS configuration. E.g.
-```
-inference:
-  - provider_id: bedrock0
-    provider_type: remote::bedrock
-    config:
-      aws_access_key_id: <AWS_ACCESS_KEY_ID>
-      aws_secret_access_key: <AWS_SECRET_ACCESS_KEY>
-      aws_session_token: <AWS_SESSION_TOKEN>
-      region_name: <AWS_REGION>
-```
-
-### Conda llama stack run (Single Node CPU)
+### Via Conda

 ```bash
 llama stack build --template bedrock --image-type conda
-# -- modify run.yaml with valid AWS credentials
-llama stack run ./run.yaml
-```
-
-### (Optional) Update Model Serving Configuration
-
-Use `llama-stack-client models list` to check the available models served by Amazon Bedrock.
-
-```
-$ llama-stack-client models list
-+------------------------------+------------------------------+---------------+------------+
-| identifier                   | llama_model                  | provider_id   | metadata   |
-+==============================+==============================+===============+============+
-| Llama3.1-8B-Instruct         | meta.llama3-1-8b-instruct-v1:0 | bedrock0     | {}         |
-+------------------------------+------------------------------+---------------+------------+
-| Llama3.1-70B-Instruct        | meta.llama3-1-70b-instruct-v1:0 | bedrock0     | {}         |
-+------------------------------+------------------------------+---------------+------------+
-| Llama3.1-405B-Instruct       | meta.llama3-1-405b-instruct-v1:0 | bedrock0     | {}         |
-+------------------------------+------------------------------+---------------+------------+
+llama stack run ./run.yaml \
+  --port $LLAMA_STACK_PORT \
+  --env AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID \
+  --env AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY \
+  --env AWS_SESSION_TOKEN=$AWS_SESSION_TOKEN
 ```
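A minimal sketch of a pre-flight check for the Bedrock commands above: it reports which of the AWS variables passed via `--env` are unset and falls back to the documented default port. The helper name `missing_aws_vars` is hypothetical, and treating `AWS_SESSION_TOKEN` as required is an assumption; it is typically only needed for temporary credentials.

```bash
# Hypothetical helper (not part of the Llama Stack CLI): print the name of
# each AWS variable used by the commands above that is unset or empty.
missing_aws_vars() {
  for name in AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY AWS_SESSION_TOKEN; do
    # Look up the variable whose name is held in $name. The names come from
    # the fixed list above, so eval does not see untrusted input.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "$name"
    fi
  done
}

# Use the documented default port when LLAMA_STACK_PORT is not already set.
: "${LLAMA_STACK_PORT:=5001}"

missing="$(missing_aws_vars)"
if [ -n "$missing" ]; then
  # Unquoted on purpose so the names are joined onto one line.
  echo "Unset AWS variables:" $missing >&2
else
  echo "Credentials present; server port will be $LLAMA_STACK_PORT"
fi
```

Running this before `docker run` or `llama stack run` surfaces an empty credential early, rather than as a provider error after the server has started.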
--- a/docs/source/distributions/self_hosted_distro/fireworks.md
+++ b/docs/source/distributions/self_hosted_distro/fireworks.md
@@ -58,9 +58,7 @@ LLAMA_STACK_PORT=5001
 docker run \
  -it \
  -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
-  -v ./run.yaml:/root/my-run.yaml \
  llamastack/distribution-fireworks \
-  --yaml-config /root/my-run.yaml \
  --port $LLAMA_STACK_PORT \
  --env FIREWORKS_API_KEY=$FIREWORKS_API_KEY
 ```
@@ -70,6 +68,6 @@ docker run \
 ```bash
 llama stack build --template fireworks --image-type conda
 llama stack run ./run.yaml \
-  --port 5001 \
+  --port $LLAMA_STACK_PORT \
  --env FIREWORKS_API_KEY=$FIREWORKS_API_KEY
 ```
--- a/docs/source/distributions/self_hosted_distro/meta-reference-gpu.md
+++ b/docs/source/distributions/self_hosted_distro/meta-reference-gpu.md
@@ -54,9 +54,7 @@ LLAMA_STACK_PORT=5001
 docker run \
  -it \
  -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
-  -v ./run.yaml:/root/my-run.yaml \
  llamastack/distribution-meta-reference-gpu \
-  /root/my-run.yaml \
  --port $LLAMA_STACK_PORT \
  --env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct
 ```
@@ -67,9 +65,7 @@ If you are using Llama Stack Safety / Shield APIs, use:
 docker run \
  -it \
  -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
-  -v ./run-with-safety.yaml:/root/my-run.yaml \
  llamastack/distribution-meta-reference-gpu \
-  /root/my-run.yaml \
  --port $LLAMA_STACK_PORT \
  --env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct \
  --env SAFETY_MODEL=meta-llama/Llama-Guard-3-1B
@@ -81,7 +77,7 @@ Make sure you have done `pip install llama-stack` and have the Llama Stack CLI a

 ```bash
 llama stack build --template meta-reference-gpu --image-type conda
-llama stack run ./run.yaml \
+llama stack run distributions/meta-reference-gpu/run.yaml \
  --port 5001 \
  --env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct
 ```
@@ -89,7 +85,7 @@ llama stack run ./run.yaml \
 If you are using Llama Stack Safety / Shield APIs, use:

 ```bash
-llama stack run ./run-with-safety.yaml \
+llama stack run distributions/meta-reference-gpu/run-with-safety.yaml \
  --port 5001 \
  --env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct \
  --env SAFETY_MODEL=meta-llama/Llama-Guard-3-1B
--- a/docs/source/distributions/self_hosted_distro/ollama.md
+++ b/docs/source/distributions/self_hosted_distro/ollama.md
@@ -66,9 +66,7 @@ docker run \
  -it \
  -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
  -v ~/.llama:/root/.llama \
-  -v ./run.yaml:/root/my-run.yaml \
  llamastack/distribution-ollama \
-  --yaml-config /root/my-run.yaml \
  --port $LLAMA_STACK_PORT \
  --env INFERENCE_MODEL=$INFERENCE_MODEL \
  --env OLLAMA_URL=http://host.docker.internal:11434
--- a/docs/source/distributions/self_hosted_distro/tgi.md
+++ b/docs/source/distributions/self_hosted_distro/tgi.md
@@ -85,9 +85,7 @@ LLAMA_STACK_PORT=5001
 docker run \
  -it \
  -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
-  -v ./run.yaml:/root/my-run.yaml \
  llamastack/distribution-tgi \
-  --yaml-config /root/my-run.yaml \
  --port $LLAMA_STACK_PORT \
  --env INFERENCE_MODEL=$INFERENCE_MODEL \
  --env TGI_URL=http://host.docker.internal:$INFERENCE_PORT
@@ -116,18 +114,18 @@ Make sure you have done `pip install llama-stack` and have the Llama Stack CLI a
 ```bash
 llama stack build --template tgi --image-type conda
 llama stack run ./run.yaml \
-  --port 5001
-  --env INFERENCE_MODEL=$INFERENCE_MODEL
+  --port $LLAMA_STACK_PORT \
+  --env INFERENCE_MODEL=$INFERENCE_MODEL \
  --env TGI_URL=http://127.0.0.1:$INFERENCE_PORT
 ```

 If you are using Llama Stack Safety / Shield APIs, use:

 ```bash
-llama stack run ./run-with-safety.yaml
-  --port 5001
-  --env INFERENCE_MODEL=$INFERENCE_MODEL
-  --env TGI_URL=http://127.0.0.1:$INFERENCE_PORT
-  --env SAFETY_MODEL=$SAFETY_MODEL
+llama stack run ./run-with-safety.yaml \
+  --port $LLAMA_STACK_PORT \
+  --env INFERENCE_MODEL=$INFERENCE_MODEL \
+  --env TGI_URL=http://127.0.0.1:$INFERENCE_PORT \
+  --env SAFETY_MODEL=$SAFETY_MODEL \
  --env TGI_SAFETY_URL=http://127.0.0.1:$SAFETY_PORT
 ```
--- a/docs/source/distributions/self_hosted_distro/together.md
+++ b/docs/source/distributions/self_hosted_distro/together.md
@@ -57,9 +57,7 @@ LLAMA_STACK_PORT=5001
 docker run \
  -it \
  -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
-  -v ./run.yaml:/root/my-run.yaml \
  llamastack/distribution-together \
-  --yaml-config /root/my-run.yaml \
  --port $LLAMA_STACK_PORT \
  --env TOGETHER_API_KEY=$TOGETHER_API_KEY
 ```
@@ -69,6 +67,6 @@ docker run \
 ```bash
 llama stack build --template together --image-type conda
 llama stack run ./run.yaml \
-  --port 5001 \
+  --port $LLAMA_STACK_PORT \
  --env TOGETHER_API_KEY=$TOGETHER_API_KEY
 ```
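The commands in this diff pass configuration as `--env NAME=value` pairs instead of mounting a custom `run.yaml`. As a rough sketch of that pattern, the hypothetical helper `env_flags` below builds those pairs from a list of variable names and skips any that are unset. The key value is a placeholder, and the command is printed rather than executed.

```bash
# Hypothetical helper (not part of the Llama Stack CLI): emit one
# " --env NAME=value" pair for each named variable that is set and non-empty.
# Values containing whitespace are not handled by this sketch.
env_flags() {
  for name in "$@"; do
    # Resolve the variable whose name is held in $name.
    eval "value=\${$name:-}"
    if [ -n "$value" ]; then
      printf ' --env %s=%s' "$name" "$value"
    fi
  done
}

TOGETHER_API_KEY=example-key   # placeholder value for illustration only
LLAMA_STACK_PORT=5001

# Print the assembled command instead of running it, so the sketch has no side effects.
echo "llama stack run ./run.yaml --port $LLAMA_STACK_PORT$(env_flags TOGETHER_API_KEY)"
```

The same helper could take `FIREWORKS_API_KEY` or the `INFERENCE_MODEL` and `TGI_URL` pair, since each distribution in this diff differs only in which variables it forwards.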