llama-stack/llama_stack/templates
Botao Chen 123fb9eb24
feat: [post training] support save hf safetensor format checkpoint (#845)
## context

Today, llama stack only supports inference / eval of a finetuned
checkpoint with meta-reference as the inference provider. This is
sub-optimal since meta-reference is pretty slow.

Our vision is that developers can run inference / eval on a finetuned
checkpoint produced by the post training APIs with any of the inference
providers on the stack. To achieve this, we'd like to define a unified
output checkpoint format for post training providers, so that every
inference provider can respect that format for customized model inference.

By spot-checking how
[ollama](https://github.com/ollama/ollama/blob/main/docs/import.md) and
[fireworks](https://docs.fireworks.ai/models/uploading-custom-models) run
inference on a customized model, we defined the output checkpoint format
as `/adapter/adapter_config.json` and `/adapter/adapter_model.safetensors`
(since we only support LoRA post training for now, we start with an
adapter-only checkpoint).
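
For concreteness, here is a minimal sketch of writing that layout, assuming `torch` and `safetensors` are available; the config fields and tensor names are illustrative placeholders, not the exact ones the post training provider emits:

```python
import json
from pathlib import Path

import torch
from safetensors.torch import save_file


def save_adapter_checkpoint(
    adapter_state: dict[str, torch.Tensor],
    adapter_config: dict,
    output_dir: str,
) -> None:
    """Write a LoRA adapter in the layout described above:

    <output_dir>/adapter/adapter_config.json
    <output_dir>/adapter/adapter_model.safetensors
    """
    adapter_dir = Path(output_dir) / "adapter"
    adapter_dir.mkdir(parents=True, exist_ok=True)

    # peft-style adapter configs carry fields such as "r", "lora_alpha",
    # and "target_modules" (illustrative, not exhaustive).
    with open(adapter_dir / "adapter_config.json", "w") as f:
        json.dump(adapter_config, f, indent=2)

    # safetensors requires contiguous tensors.
    save_file(
        {name: t.contiguous() for name, t in adapter_state.items()},
        str(adapter_dir / "adapter_model.safetensors"),
    )
```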

## test
We kicked off a post training job with the checkpoint format configured
as 'huggingface'. The output files:

![Screenshot 2025-02-24 at 11 54 33 PM](https://github.com/user-attachments/assets/fb45a5d7-f288-4d30-82f8-b7a8da2859be)
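
Since this is the same adapter layout the HF ecosystem uses, one quick sanity check is to load the output files back with `peft` (a sketch; the base-model name and checkpoint path below are placeholders):

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

# The base model must match the one that was finetuned (placeholder name here).
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-3B-Instruct")

# peft picks up adapter_config.json and adapter_model.safetensors
# from the checkpoint's adapter/ directory.
model = PeftModel.from_pretrained(base, "/path/to/checkpoint/adapter")
```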



We did a proof of concept with ollama to see whether it can run inference
on our finetuned checkpoint:
1. Create a Modelfile like the one in the screenshot (a text sketch follows after this list):

<img width="799" alt="Screenshot 2025-01-22 at 5 04 18 PM"
src="https://github.com/user-attachments/assets/7fca9ac3-a294-44f8-aab1-83852c600609"
/>

2. Create a customized model with `ollama create llama_3_2_finetuned`
and run inference successfully:

![Screenshot 2025-02-24 at 11 55 17 PM](https://github.com/user-attachments/assets/1abe7c52-c6a7-491a-b07c-b7a8e3fd1ddd)
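
For reference, the Modelfile in the screenshot boils down to something like the following (a sketch; the base-model name and adapter path are placeholders), using ollama's documented `ADAPTER` directive to layer the safetensors adapter onto a base model:

```
FROM llama3.2
ADAPTER /path/to/checkpoint/adapter
```

then `ollama create llama_3_2_finetuned -f Modelfile` followed by `ollama run llama_3_2_finetuned`.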


This is just a proof of concept with the ollama command line. As a next
step, we'd like to wrap the customized-model loading / inference logic
inside the inference provider implementations.
Committed 2025-02-25 23:29:08 -08:00
| Name | Last commit | Last updated |
|------|-------------|--------------|
| bedrock | ModelAlias -> ProviderModelEntry | 2025-02-20 14:02:36 -08:00 |
| cerebras | chore: move embedding deps to RAG tool where they are needed (#1210) | 2025-02-21 11:33:41 -08:00 |
| ci-tests | feat: add (openai, anthropic, gemini) providers via litellm (#1267) | 2025-02-25 22:07:33 -08:00 |
| dell | chore: move embedding deps to RAG tool where they are needed (#1210) | 2025-02-21 11:33:41 -08:00 |
| dev | feat: add (openai, anthropic, gemini) providers via litellm (#1267) | 2025-02-25 22:07:33 -08:00 |
| experimental-post-training | feat: [post training] support save hf safetensor format checkpoint (#845) | 2025-02-25 23:29:08 -08:00 |
| fireworks | test: add a ci-tests distro template for running e2e tests (#1237) | 2025-02-24 14:43:21 -08:00 |
| groq | feat: Add Groq distribution template (#1173) | 2025-02-25 14:16:56 -08:00 |
| hf-endpoint | fix!: update eval-tasks -> benchmarks (#1032) | 2025-02-13 16:40:58 -08:00 |
| hf-serverless | chore: move embedding deps to RAG tool where they are needed (#1210) | 2025-02-21 11:33:41 -08:00 |
| meta-reference-gpu | fix!: update eval-tasks -> benchmarks (#1032) | 2025-02-13 16:40:58 -08:00 |
| meta-reference-quantized-gpu | fix!: update eval-tasks -> benchmarks (#1032) | 2025-02-13 16:40:58 -08:00 |
| nvidia | test(client-sdk): Update embedding test types to use latest imports (#1203) | 2025-02-21 08:09:17 -08:00 |
| ollama | chore: move embedding deps to RAG tool where they are needed (#1210) | 2025-02-21 11:33:41 -08:00 |
| passthrough | feat: inference passthrough provider (#1166) | 2025-02-19 21:47:00 -08:00 |
| remote-vllm | chore: move embedding deps to RAG tool where they are needed (#1210) | 2025-02-21 11:33:41 -08:00 |
| sambanova | ModelAlias -> ProviderModelEntry | 2025-02-20 14:02:36 -08:00 |
| tgi | chore: move embedding deps to RAG tool where they are needed (#1210) | 2025-02-21 11:33:41 -08:00 |
| together | chore: move embedding deps to RAG tool where they are needed (#1210) | 2025-02-21 11:33:41 -08:00 |
| vllm-gpu | chore: move embedding deps to RAG tool where they are needed (#1210) | 2025-02-21 11:33:41 -08:00 |
| `__init__.py` | Auto-generate distro yamls + docs (#468) | 2024-11-18 14:57:06 -08:00 |
| `template.py` | build: format codebase imports using ruff linter (#1028) | 2025-02-13 10:06:21 -08:00 |