llama-stack-mirror/toolchain/cli/inference/default_configuration.yaml

inference_config:
  impl_type: "inline"  # run the model in-process rather than against a remote server
  inline_config:
    checkpoint_type: "pytorch"                        # format of the model checkpoint on disk
    checkpoint_dir: {checkpoint_dir}/                 # placeholder, substituted when the config is generated
    tokenizer_path: {checkpoint_dir}/tokenizer.model  # tokenizer shipped alongside the checkpoint
    model_parallel_size: {model_parallel_size}        # number of model-parallel shards (one per GPU)
    max_seq_len: 2048                                 # maximum context length in tokens
    max_batch_size: 1                                 # maximum number of sequences per batch
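
The `{checkpoint_dir}` and `{model_parallel_size}` placeholders are presumably filled in by the CLI when a concrete configuration is written out. Below is a minimal sketch, not the repo's actual loader, of how that substitution and parsing could work with PyYAML; the function name `render_config` and the example checkpoint path are illustrative assumptions.

from pathlib import Path

import yaml

# Path to the template shown above.
TEMPLATE = Path("toolchain/cli/inference/default_configuration.yaml")


def render_config(checkpoint_dir: str, model_parallel_size: int) -> dict:
    """Fill the template placeholders and parse the resulting YAML."""
    rendered = TEMPLATE.read_text().format(
        # The template already appends a trailing "/", so strip any here.
        checkpoint_dir=checkpoint_dir.rstrip("/"),
        model_parallel_size=model_parallel_size,
    )
    return yaml.safe_load(rendered)


if __name__ == "__main__":
    # Hypothetical checkpoint location, for illustration only.
    cfg = render_config("/home/user/.llama/checkpoints/Meta-Llama-3-8B", 1)
    print(cfg["inference_config"]["inline_config"]["tokenizer_path"])

Using str.format keeps the template trivially simple; the trade-off is that any literal braces added to the YAML later would need escaping, which is why real config generators often prefer a templating library instead.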