mirror of
https://github.com/meta-llama/llama-stack.git
synced 2025-07-04 05:12:35 +00:00
# What does this PR do? Simple approach to get some provider pages in the docs. Add or update description fields in the provider configuration class using Pydantic’s Field, ensuring these descriptions are clear and complete, as they will be used to auto-generate provider documentation via ./scripts/distro_codegen.py instead of editing the docs manually. Signed-off-by: Sébastien Han <seb@redhat.com>
1.1 KiB
1.1 KiB
inline::meta-reference
Description
Meta's reference implementation of inference with support for various model formats and optimization techniques.
Configuration
Field | Type | Required | Default | Description |
---|---|---|---|---|
model |
str | None |
No | ||
torch_seed |
int | None |
No | ||
max_seq_len |
<class 'int'> |
No | 4096 | |
max_batch_size |
<class 'int'> |
No | 1 | |
model_parallel_size |
int | None |
No | ||
create_distributed_process_group |
<class 'bool'> |
No | True | |
checkpoint_dir |
str | None |
No | ||
quantization |
Bf16QuantizationConfig | Fp8QuantizationConfig | Int4QuantizationConfig, annotation=NoneType, required=True, discriminator='type' |
No |
Sample Configuration
model: Llama3.2-3B-Instruct
checkpoint_dir: ${env.CHECKPOINT_DIR:=null}
quantization:
type: ${env.QUANTIZATION_TYPE:=bf16}
model_parallel_size: ${env.MODEL_PARALLEL_SIZE:=0}
max_batch_size: ${env.MAX_BATCH_SIZE:=1}
max_seq_len: ${env.MAX_SEQ_LEN:=4096}