# What does this PR do?

Mirrors https://github.com/meta-llama/llama-models/pull/324 with some cleanup.

## Test Plan

```
with-proxy pip install -e .
export INFERENCE_MODEL=meta-llama/Llama-4-Scout-17B-16E-Instruct
export INFERENCE_CHECKPOINT_DIR=../checkpoints/Llama-4-Scout-17B-16E-Instruct
export QUANTIZATION_TYPE=int4_mixed
with-proxy llama stack build --run --template meta-reference-gpu
```
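Once the stack is up, a quick request against the local server can confirm that the quantized Llama 4 checkpoint was registered and is serving. The snippet below is a minimal smoke-test sketch, not part of this PR: the port (8321) and the `llama-stack-client` calls shown are assumptions, so adjust them to your deployment.

```python
# Minimal smoke-test sketch (assumptions: server listening on localhost:8321,
# llama-stack-client installed; adjust both to your deployment).
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:8321")

# Confirm the Llama 4 Scout checkpoint was registered by the meta-reference-gpu provider.
for model in client.models.list():
    print(model.identifier)

# Send a single chat completion to the int4_mixed-quantized model.
response = client.inference.chat_completion(
    model_id="meta-llama/Llama-4-Scout-17B-16E-Instruct",
    messages=[{"role": "user", "content": "Reply with one short sentence."}],
)
print(response.completion_message.content)
```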
Directory listing:

- llama3
- llama3_1
- llama3_2
- llama3_3
- llama4
- resources
- __init__.py
- checkpoint.py
- datatypes.py
- hadamard_utils.py
- prompt_format.py
- quantize_impls.py
- sku_list.py
- sku_types.py
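For orientation, `sku_list.py` in the listing above is where registered model SKUs are enumerated. A hedged sketch of using it to check that Llama 4 entries are present follows; the import path and the `all_registered_models` / `descriptor` names are assumptions inferred from the file names, not confirmed by this PR.

```python
# Hypothetical sketch: enumerate registered model SKUs and print the Llama 4
# entries. The module path, all_registered_models(), and descriptor() are
# assumptions based on the file listing above.
from llama_stack.models.llama.sku_list import all_registered_models

for model in all_registered_models():
    if "Llama-4" in model.descriptor():
        print(model.descriptor())
```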