llama-stack/llama_stack/templates
Botao Chen 123fb9eb24
feat: [post training] support save hf safetensor format checkpoint (#845)
## context

Today, llama stack only supports inference / eval of a finetuned
checkpoint with meta-reference as the inference provider. This is
sub-optimal since meta-reference is pretty slow.

Our vision is that developers can run inference / eval on a finetuned
checkpoint produced by the post training APIs with any of the inference
providers on the stack. To achieve this, we'd like to define a unified
output checkpoint format for post training providers, so that every
inference provider can respect that format for customized model inference.

By spot-checking how
[ollama](https://github.com/ollama/ollama/blob/main/docs/import.md) and
[fireworks](https://docs.fireworks.ai/models/uploading-custom-models) run
inference on a customized model, we defined the output checkpoint format
as `/adapter/adapter_config.json` and `/adapter/adapter_model.safetensors`
(since we only support LoRA post training for now, we start with an
adapter-only checkpoint).
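
For concreteness, here is a minimal sketch of writing that layout, assuming `torch` and `safetensors` are available; the config fields and tensor names are illustrative placeholders, not the exact ones the post training provider emits:

```python
import json
from pathlib import Path

import torch
from safetensors.torch import save_file


def save_adapter_checkpoint(
    adapter_state: dict[str, torch.Tensor],
    adapter_config: dict,
    output_dir: str,
) -> None:
    """Write a LoRA adapter in the layout described above:

    <output_dir>/adapter/adapter_config.json
    <output_dir>/adapter/adapter_model.safetensors
    """
    adapter_dir = Path(output_dir) / "adapter"
    adapter_dir.mkdir(parents=True, exist_ok=True)

    # peft-style adapter configs carry fields such as "r", "lora_alpha",
    # and "target_modules" (illustrative, not exhaustive).
    with open(adapter_dir / "adapter_config.json", "w") as f:
        json.dump(adapter_config, f, indent=2)

    # safetensors requires contiguous tensors.
    save_file(
        {name: t.contiguous() for name, t in adapter_state.items()},
        str(adapter_dir / "adapter_model.safetensors"),
    )
```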

## test
We kicked off a post training job with the checkpoint format configured
as 'huggingface'. The output files:

![Screenshot 2025-02-24 at 11 54 33 PM](https://github.com/user-attachments/assets/fb45a5d7-f288-4d30-82f8-b7a8da2859be)
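
Since this is the same adapter layout the HF ecosystem uses, one quick sanity check is to load the output files back with `peft` (a sketch; the base-model name and checkpoint path below are placeholders):

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

# The base model must match the one that was finetuned (placeholder name here).
base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-3B-Instruct")

# peft picks up adapter_config.json and adapter_model.safetensors
# from the checkpoint's adapter/ directory.
model = PeftModel.from_pretrained(base, "/path/to/checkpoint/adapter")
```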



We did a proof of concept with ollama to see whether it can run inference
on our finetuned checkpoint:
1. Create a Modelfile like the one in the screenshot (a text sketch follows after this list):

<img width="799" alt="Screenshot 2025-01-22 at 5 04 18 PM"
src="https://github.com/user-attachments/assets/7fca9ac3-a294-44f8-aab1-83852c600609"
/>

2. Create a customized model with `ollama create llama_3_2_finetuned`
and run inference successfully:

![Screenshot 2025-02-24 at 11 55 17 PM](https://github.com/user-attachments/assets/1abe7c52-c6a7-491a-b07c-b7a8e3fd1ddd)
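
For reference, the Modelfile in the screenshot boils down to something like the following (a sketch; the base-model name and adapter path are placeholders), using ollama's documented `ADAPTER` directive to layer the safetensors adapter onto a base model:

```
FROM llama3.2
ADAPTER /path/to/checkpoint/adapter
```

then `ollama create llama_3_2_finetuned -f Modelfile` followed by `ollama run llama_3_2_finetuned`.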


This is just a proof of concept with the ollama command line. As a next
step, we'd like to wrap the customized-model loading / inference logic
inside the inference provider implementations.
Committed 2025-02-25 23:29:08 -08:00
| Name | Last commit | Last updated |
|------|-------------|--------------|
| bedrock | ModelAlias -> ProviderModelEntry | 2025-02-20 14:02:36 -08:00 |
| cerebras | chore: move embedding deps to RAG tool where they are needed (#1210) | 2025-02-21 11:33:41 -08:00 |
| ci-tests | feat: add (openai, anthropic, gemini) providers via litellm (#1267) | 2025-02-25 22:07:33 -08:00 |
| dell | chore: move embedding deps to RAG tool where they are needed (#1210) | 2025-02-21 11:33:41 -08:00 |
| dev | feat: add (openai, anthropic, gemini) providers via litellm (#1267) | 2025-02-25 22:07:33 -08:00 |
| experimental-post-training | feat: [post training] support save hf safetensor format checkpoint (#845) | 2025-02-25 23:29:08 -08:00 |
| fireworks | test: add a ci-tests distro template for running e2e tests (#1237) | 2025-02-24 14:43:21 -08:00 |
| groq | feat: Add Groq distribution template (#1173) | 2025-02-25 14:16:56 -08:00 |
| hf-endpoint | fix!: update eval-tasks -> benchmarks (#1032) | 2025-02-13 16:40:58 -08:00 |
| hf-serverless | chore: move embedding deps to RAG tool where they are needed (#1210) | 2025-02-21 11:33:41 -08:00 |
| meta-reference-gpu | fix!: update eval-tasks -> benchmarks (#1032) | 2025-02-13 16:40:58 -08:00 |
| meta-reference-quantized-gpu | fix!: update eval-tasks -> benchmarks (#1032) | 2025-02-13 16:40:58 -08:00 |
| nvidia | test(client-sdk): Update embedding test types to use latest imports (#1203) | 2025-02-21 08:09:17 -08:00 |
| ollama | chore: move embedding deps to RAG tool where they are needed (#1210) | 2025-02-21 11:33:41 -08:00 |
| passthrough | feat: inference passthrough provider (#1166) | 2025-02-19 21:47:00 -08:00 |
| remote-vllm | chore: move embedding deps to RAG tool where they are needed (#1210) | 2025-02-21 11:33:41 -08:00 |
| sambanova | ModelAlias -> ProviderModelEntry | 2025-02-20 14:02:36 -08:00 |
| tgi | chore: move embedding deps to RAG tool where they are needed (#1210) | 2025-02-21 11:33:41 -08:00 |
| together | chore: move embedding deps to RAG tool where they are needed (#1210) | 2025-02-21 11:33:41 -08:00 |
| vllm-gpu | chore: move embedding deps to RAG tool where they are needed (#1210) | 2025-02-21 11:33:41 -08:00 |
| `__init__.py` | Auto-generate distro yamls + docs (#468) | 2024-11-18 14:57:06 -08:00 |
| `template.py` | build: format codebase imports using ruff linter (#1028) | 2025-02-13 10:06:21 -08:00 |