Mirror of https://github.com/meta-llama/llama-stack.git (synced 2025-10-26 01:12:59 +00:00)

# What does this PR do?

Simple approach to get some provider pages in the docs. Add or update the `description` fields in the provider configuration class using Pydantic’s `Field`, and make sure these descriptions are clear and complete: they are used to auto-generate the provider documentation via `./scripts/distro_codegen.py`, so the docs are no longer edited manually.

Signed-off-by: Sébastien Han <seb@redhat.com>
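
As a rough illustration of the idea in the PR description, the sketch below attaches descriptions to fields with Pydantic's `Field` and then reads them back to build Markdown table rows. It assumes Pydantic v2. `ExampleProviderConfig`, `render_config_table`, and the description strings are hypothetical names and wording chosen for this example; this is not the code in `./scripts/distro_codegen.py`.

```python
from typing import Optional

from pydantic import BaseModel, Field


class ExampleProviderConfig(BaseModel):
    """Hypothetical provider config; names and descriptions are illustrative only."""

    max_seq_len: int = Field(
        default=4096,
        description="Maximum sequence length, in tokens, accepted per request.",
    )
    checkpoint_dir: Optional[str] = Field(
        default=None,
        description="Directory holding model checkpoints.",
    )


def render_config_table(config_cls: type) -> str:
    """Build a Markdown table from the metadata attached to each model field."""
    rows = [
        "| Field | Required | Default | Description |",
        "|-------|----------|---------|-------------|",
    ]
    # model_fields maps each field name to its FieldInfo (Pydantic v2).
    for name, info in config_cls.model_fields.items():
        required = "Yes" if info.is_required() else "No"
        # Show a default only when one exists and it is not None.
        default = "" if info.is_required() or info.default is None else str(info.default)
        rows.append(f"| `{name}` | {required} | {default} | {info.description or ''} |")
    return "\n".join(rows)


if __name__ == "__main__":
    print(render_config_table(ExampleProviderConfig))
```

A field declared without a `description` simply produces an empty Description cell, which is why filling in the descriptions matters for the generated pages.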
		
			
				
	
	
		
32 lines · 1.1 KiB · Markdown

# inline::meta-reference

## Description

Meta's reference implementation of inference with support for various model formats and optimization techniques.

## Configuration

| Field | Type | Required | Default | Description |
|-------|------|----------|---------|-------------|
| `model` | `str \| None` | No |  |  |
| `torch_seed` | `int \| None` | No |  |  |
| `max_seq_len` | `int` | No | 4096 |  |
| `max_batch_size` | `int` | No | 1 |  |
| `model_parallel_size` | `int \| None` | No |  |  |
| `create_distributed_process_group` | `bool` | No | True |  |
| `checkpoint_dir` | `str \| None` | No |  |  |
| `quantization` | `Bf16QuantizationConfig \| Fp8QuantizationConfig \| Int4QuantizationConfig` | No |  | Union discriminated on `type`. |

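
The `quantization` row describes a union of config classes selected by a `type` discriminator. The sketch below shows how such a discriminated union can be declared and validated with Pydantic v2. The `Sketch*` classes are hypothetical stand-ins, not the real llama-stack classes, and the `fp8` and `int4` tag values are assumptions; only `bf16` appears in this document.

```python
from typing import Annotated, Literal, Union

from pydantic import BaseModel, Field, TypeAdapter, ValidationError


class SketchBf16Config(BaseModel):
    type: Literal["bf16"] = "bf16"  # tag taken from the sample configuration


class SketchFp8Config(BaseModel):
    type: Literal["fp8"] = "fp8"  # assumed tag value, for illustration only


class SketchInt4Config(BaseModel):
    type: Literal["int4"] = "int4"  # assumed tag value, for illustration only


# The discriminator tells Pydantic to pick the class by looking at `type`
# instead of trying each union member in turn.
SketchQuantization = Annotated[
    Union[SketchBf16Config, SketchFp8Config, SketchInt4Config],
    Field(discriminator="type"),
]

_adapter = TypeAdapter(SketchQuantization)


def parse_quantization(raw: dict) -> BaseModel:
    """Validate a raw mapping and return the matching config instance."""
    return _adapter.validate_python(raw)


if __name__ == "__main__":
    print(type(parse_quantization({"type": "bf16"})).__name__)
    try:
        parse_quantization({"type": "unknown"})
    except ValidationError as exc:
        print("rejected:", exc.error_count(), "error(s)")
```

With a discriminator, an unrecognized `type` value fails with a single targeted error rather than one error per union member.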
## Sample Configuration

```yaml
model: Llama3.2-3B-Instruct
checkpoint_dir: ${env.CHECKPOINT_DIR:=null}
quantization:
  type: ${env.QUANTIZATION_TYPE:=bf16}
model_parallel_size: ${env.MODEL_PARALLEL_SIZE:=0}
max_batch_size: ${env.MAX_BATCH_SIZE:=1}
max_seq_len: ${env.MAX_SEQ_LEN:=4096}
```
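
The sample configuration uses `${env.NAME:=default}` placeholders. As a minimal sketch of how such a placeholder could be expanded, the function below substitutes the environment value when the variable is set and the default otherwise. `resolve_env_placeholders` is a hypothetical helper written for this example; the actual llama-stack resolver may differ, for instance in how it treats empty values or `null`.

```python
import os
import re
from typing import Mapping, Optional

# Matches ${env.NAME:=default}; the default may be empty but cannot contain "}".
_ENV_PATTERN = re.compile(
    r"\$\{env\.(?P<name>[A-Za-z_][A-Za-z0-9_]*):=(?P<default>[^}]*)\}"
)


def resolve_env_placeholders(text: str, environ: Optional[Mapping[str, str]] = None) -> str:
    """Replace each placeholder with the variable's value, or its default if unset."""
    env = os.environ if environ is None else environ

    def _substitute(match: "re.Match[str]") -> str:
        return env.get(match.group("name"), match.group("default"))

    return _ENV_PATTERN.sub(_substitute, text)


if __name__ == "__main__":
    line = "max_seq_len: ${env.MAX_SEQ_LEN:=4096}"
    print(resolve_env_placeholders(line, {}))
    print(resolve_env_placeholders(line, {"MAX_SEQ_LEN": "8192"}))
```

Passing an explicit mapping instead of reading `os.environ` directly keeps the helper easy to test.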