diff --git a/docs/source/distributions/self_hosted_distro/meta-reference-gpu.md b/docs/source/distributions/self_hosted_distro/meta-reference-gpu.md
index f532ce7ec..d46039318 100644
--- a/docs/source/distributions/self_hosted_distro/meta-reference-gpu.md
+++ b/docs/source/distributions/self_hosted_distro/meta-reference-gpu.md
@@ -63,7 +63,7 @@ docker run \
   -v ~/.llama:/root/.llama \
   llamastack/distribution-meta-reference-gpu \
   --port $LLAMA_STACK_PORT \
-  --env INFERENCE_MODEL=Llama3.2-3B-Instruct
+  --env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct
 ```
 
 If you are using Llama Stack Safety / Shield APIs, use:
@@ -75,8 +75,8 @@ docker run \
   -v ~/.llama:/root/.llama \
   llamastack/distribution-meta-reference-gpu \
   --port $LLAMA_STACK_PORT \
-  --env INFERENCE_MODEL=Llama3.2-3B-Instruct \
-  --env SAFETY_MODEL=Llama-Guard-3-1B
+  --env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct \
+  --env SAFETY_MODEL=meta-llama/Llama-Guard-3-1B
 ```
 
 ### Via Conda
@@ -87,7 +87,7 @@ Make sure you have done `pip install llama-stack` and have the Llama Stack CLI a
 llama stack build --template meta-reference-gpu --image-type conda
 llama stack run distributions/meta-reference-gpu/run.yaml \
   --port 5001 \
-  --env INFERENCE_MODEL=Llama3.2-3B-Instruct
+  --env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct
 ```
 
 If you are using Llama Stack Safety / Shield APIs, use:
@@ -95,6 +95,6 @@ If you are using Llama Stack Safety / Shield APIs, use:
 ```bash
 llama stack run distributions/meta-reference-gpu/run-with-safety.yaml \
   --port 5001 \
-  --env INFERENCE_MODEL=Llama3.2-3B-Instruct \
-  --env SAFETY_MODEL=Llama-Guard-3-1B
+  --env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct \
+  --env SAFETY_MODEL=meta-llama/Llama-Guard-3-1B
 ```
diff --git a/docs/source/distributions/self_hosted_distro/meta-reference-quantized-gpu.md b/docs/source/distributions/self_hosted_distro/meta-reference-quantized-gpu.md
index 23302a3ab..837be744a 100644
--- a/docs/source/distributions/self_hosted_distro/meta-reference-quantized-gpu.md
+++ b/docs/source/distributions/self_hosted_distro/meta-reference-quantized-gpu.md
@@ -33,7 +33,7 @@ Note that you need access to nvidia GPUs to run this distribution. This distribu
 The following environment variables can be configured:
 
 - `LLAMASTACK_PORT`: Port for the Llama Stack distribution server (default: `5001`)
-- `INFERENCE_MODEL`: Inference model loaded into the Meta Reference server (default: `Llama3.2-3B-Instruct`)
+- `INFERENCE_MODEL`: Inference model loaded into the Meta Reference server (default: `meta-llama/Llama-3.2-3B-Instruct`)
 - `INFERENCE_CHECKPOINT_DIR`: Directory containing the Meta Reference model checkpoint (default: `null`)
 
@@ -63,7 +63,7 @@ docker run \
   -v ~/.llama:/root/.llama \
   llamastack/distribution-meta-reference-quantized-gpu \
   --port $LLAMA_STACK_PORT \
-  --env INFERENCE_MODEL=Llama3.2-3B-Instruct
+  --env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct
 ```
 
 If you are using Llama Stack Safety / Shield APIs, use:
@@ -75,8 +75,8 @@ docker run \
   -v ~/.llama:/root/.llama \
   llamastack/distribution-meta-reference-quantized-gpu \
   --port $LLAMA_STACK_PORT \
-  --env INFERENCE_MODEL=Llama3.2-3B-Instruct \
-  --env SAFETY_MODEL=Llama-Guard-3-1B
+  --env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct \
+  --env SAFETY_MODEL=meta-llama/Llama-Guard-3-1B
 ```
 
 ### Via Conda
@@ -87,7 +87,7 @@ Make sure you have done `pip install llama-stack` and have the Llama Stack CLI a
 llama stack build --template meta-reference-quantized-gpu --image-type conda
 llama stack run distributions/meta-reference-quantized-gpu/run.yaml \
   --port $LLAMA_STACK_PORT \
-  --env INFERENCE_MODEL=Llama3.2-3B-Instruct
+  --env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct
 ```
 
 If you are using Llama Stack Safety / Shield APIs, use:
@@ -95,6 +95,6 @@ If you are using Llama Stack Safety / Shield APIs, use:
 ```bash
 llama stack run distributions/meta-reference-quantized-gpu/run-with-safety.yaml \
   --port $LLAMA_STACK_PORT \
-  --env INFERENCE_MODEL=Llama3.2-3B-Instruct \
-  --env SAFETY_MODEL=Llama-Guard-3-1B
+  --env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct \
+  --env SAFETY_MODEL=meta-llama/Llama-Guard-3-1B
 ```
diff --git a/llama_stack/templates/meta-reference-gpu/doc_template.md b/llama_stack/templates/meta-reference-gpu/doc_template.md
index 71653cfc1..421812dbc 100644
--- a/llama_stack/templates/meta-reference-gpu/doc_template.md
+++ b/llama_stack/templates/meta-reference-gpu/doc_template.md
@@ -53,7 +53,7 @@ docker run \
   -v ~/.llama:/root/.llama \
   llamastack/distribution-{{ name }} \
   --port $LLAMA_STACK_PORT \
-  --env INFERENCE_MODEL=Llama3.2-3B-Instruct
+  --env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct
 ```
 
 If you are using Llama Stack Safety / Shield APIs, use:
@@ -65,8 +65,8 @@ docker run \
   -v ~/.llama:/root/.llama \
   llamastack/distribution-{{ name }} \
   --port $LLAMA_STACK_PORT \
-  --env INFERENCE_MODEL=Llama3.2-3B-Instruct \
-  --env SAFETY_MODEL=Llama-Guard-3-1B
+  --env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct \
+  --env SAFETY_MODEL=meta-llama/Llama-Guard-3-1B
 ```
 
 ### Via Conda
@@ -77,7 +77,7 @@ Make sure you have done `pip install llama-stack` and have the Llama Stack CLI a
 llama stack build --template {{ name }} --image-type conda
 llama stack run distributions/{{ name }}/run.yaml \
   --port 5001 \
-  --env INFERENCE_MODEL=Llama3.2-3B-Instruct
+  --env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct
 ```
 
 If you are using Llama Stack Safety / Shield APIs, use:
@@ -85,6 +85,6 @@ If you are using Llama Stack Safety / Shield APIs, use:
 ```bash
 llama stack run distributions/{{ name }}/run-with-safety.yaml \
   --port 5001 \
-  --env INFERENCE_MODEL=Llama3.2-3B-Instruct \
-  --env SAFETY_MODEL=Llama-Guard-3-1B
+  --env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct \
+  --env SAFETY_MODEL=meta-llama/Llama-Guard-3-1B
 ```
diff --git a/llama_stack/templates/meta-reference-quantized-gpu/doc_template.md b/llama_stack/templates/meta-reference-quantized-gpu/doc_template.md
index 897a5faf7..daa380d20 100644
--- a/llama_stack/templates/meta-reference-quantized-gpu/doc_template.md
+++ b/llama_stack/templates/meta-reference-quantized-gpu/doc_template.md
@@ -55,7 +55,7 @@ docker run \
   -v ~/.llama:/root/.llama \
   llamastack/distribution-{{ name }} \
   --port $LLAMA_STACK_PORT \
-  --env INFERENCE_MODEL=Llama3.2-3B-Instruct
+  --env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct
 ```
 
 If you are using Llama Stack Safety / Shield APIs, use:
@@ -67,8 +67,8 @@ docker run \
   -v ~/.llama:/root/.llama \
   llamastack/distribution-{{ name }} \
   --port $LLAMA_STACK_PORT \
-  --env INFERENCE_MODEL=Llama3.2-3B-Instruct \
-  --env SAFETY_MODEL=Llama-Guard-3-1B
+  --env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct \
+  --env SAFETY_MODEL=meta-llama/Llama-Guard-3-1B
 ```
 
 ### Via Conda
@@ -79,7 +79,7 @@ Make sure you have done `pip install llama-stack` and have the Llama Stack CLI a
 llama stack build --template {{ name }} --image-type conda
 llama stack run distributions/{{ name }}/run.yaml \
   --port $LLAMA_STACK_PORT \
-  --env INFERENCE_MODEL=Llama3.2-3B-Instruct
+  --env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct
 ```
 
 If you are using Llama Stack Safety / Shield APIs, use:
@@ -87,6 +87,6 @@ If you are using Llama Stack Safety / Shield APIs, use:
 ```bash
 llama stack run distributions/{{ name }}/run-with-safety.yaml \
   --port $LLAMA_STACK_PORT \
-  --env INFERENCE_MODEL=Llama3.2-3B-Instruct \
-  --env SAFETY_MODEL=Llama-Guard-3-1B
+  --env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct \
+  --env SAFETY_MODEL=meta-llama/Llama-Guard-3-1B
 ```
diff --git a/llama_stack/templates/meta-reference-quantized-gpu/meta_reference.py b/llama_stack/templates/meta-reference-quantized-gpu/meta_reference.py
index 68d84ba67..c460860c5 100644
--- a/llama_stack/templates/meta-reference-quantized-gpu/meta_reference.py
+++ b/llama_stack/templates/meta-reference-quantized-gpu/meta_reference.py
@@ -84,7 +84,7 @@ def get_distribution_template() -> DistributionTemplate:
                 "Port for the Llama Stack distribution server",
             ),
             "INFERENCE_MODEL": (
-                "Llama3.2-3B-Instruct",
+                "meta-llama/Llama-3.2-3B-Instruct",
                 "Inference model loaded into the Meta Reference server",
            ),
            "INFERENCE_CHECKPOINT_DIR": (
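
Net effect for users: wherever the docs previously passed a bare model name (`Llama3.2-3B-Instruct`, `Llama-Guard-3-1B`), they now pass the fully qualified `meta-llama/...` identifier. As a quick reference, here is a minimal launch sketch lifted from the updated conda instructions above; port 5001 is just the documented default and should be adjusted to your setup:

```bash
# Start the meta-reference-gpu distribution with the new fully-qualified
# model identifiers (same flags as the updated run-with-safety example).
llama stack run distributions/meta-reference-gpu/run-with-safety.yaml \
  --port 5001 \
  --env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct \
  --env SAFETY_MODEL=meta-llama/Llama-Guard-3-1B
```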