forked from phoenix-oss/llama-stack-mirror


Botao Chen 123fb9eb24 feat: [post training] support save hf safetensor format checkpoint (#845 )

## context
Currently, Llama Stack only supports inference / eval of a finetuned checkpoint with meta-reference as the inference provider. This is sub-optimal because meta-reference is quite slow. Our vision is that developers can run inference / eval on a finetuned checkpoint produced by the post training APIs with any inference provider on the stack.

To achieve this, we'd like to define a unified output checkpoint format for post training providers, so that all inference providers can respect that format for customized model inference. After spot-checking how [ollama](https://github.com/ollama/ollama/blob/main/docs/import.md) and [fireworks](https://docs.fireworks.ai/models/uploading-custom-models) run inference on a customized model, we defined the output checkpoint format as /adapter/adapter_config.json and /adapter/adapter_model.safetensors (since we only support LoRA post training for now, we start with an adapter-only checkpoint).

## test
We kicked off a post training job with the checkpoint format configured as 'huggingface'. Output files:
![Screenshot 2025-02-24 at 11 54 33 PM](https://github.com/user-attachments/assets/fb45a5d7-f288-4d30-82f8-b7a8da2859be)

We did a proof of concept with ollama to check whether ollama can run inference on our finetuned checkpoint:
1. Create a Modelfile like ![Screenshot 2025-01-22 at 5 04 18 PM](https://github.com/user-attachments/assets/7fca9ac3-a294-44f8-aab1-83852c600609)
2. Create a customized model with `ollama create llama_3_2_finetuned` and run inference successfully ![Screenshot 2025-02-24 at 11 55 17 PM](https://github.com/user-attachments/assets/1abe7c52-c6a7-491a-b07c-b7a8e3fd1ddd)

This is only a proof of concept with the ollama command line. As a next step, we'd like to wrap the logic for loading and running inference on a customized model in the inference provider implementation.

2025-02-25 23:29:08 -08:00
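The adapter-only checkpoint layout described in the commit message can be checked before handing it to an inference provider. The following is a minimal, standard-library-only sketch, not part of the Llama Stack API: the function names `read_safetensors_header` and `check_adapter_checkpoint` are hypothetical. It assumes only the two file names from the commit message and the published safetensors file format (an 8-byte little-endian header length, followed by a JSON header that maps tensor names to dtype, shape and byte offsets, plus an optional `__metadata__` entry). It does not validate the schema of `adapter_config.json` beyond requiring valid JSON.

```python
import json
import struct
from pathlib import Path

# Adapter-only layout from the commit message, relative to the checkpoint root.
ADAPTER_CONFIG = Path("adapter") / "adapter_config.json"
ADAPTER_WEIGHTS = Path("adapter") / "adapter_model.safetensors"


def read_safetensors_header(path: Path) -> dict:
    """Return the JSON header of a .safetensors file without loading any tensor data.

    The file begins with an unsigned 64-bit little-endian integer N, followed by
    N bytes of UTF-8 JSON describing each tensor (dtype, shape, data_offsets).
    """
    with path.open("rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(header_len).decode("utf-8"))


def check_adapter_checkpoint(root: str) -> list[str]:
    """Hypothetical helper: verify the adapter-only layout and return sorted tensor names."""
    root_path = Path(root)
    for rel in (ADAPTER_CONFIG, ADAPTER_WEIGHTS):
        if not (root_path / rel).is_file():
            raise FileNotFoundError(f"missing {rel} under {root_path}")
    # Only check that the config parses as JSON; its exact fields are not assumed here.
    json.loads((root_path / ADAPTER_CONFIG).read_text(encoding="utf-8"))
    header = read_safetensors_header(root_path / ADAPTER_WEIGHTS)
    # "__metadata__" is an optional string-to-string map, not a tensor entry.
    return sorted(name for name in header if name != "__metadata__")
```

Reading only the header keeps the check cheap even for large adapters, since no tensor bytes are loaded; a real provider would still need a framework such as PyTorch to load the weights themselves.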
_static	feat: tool outputs metadata (#1155 )	2025-02-21 13:15:31 -08:00
notebooks	feat: [post training] support save hf safetensor format checkpoint (#845 )	2025-02-25 23:29:08 -08:00
openapi_generator	fix: some telemetry APIs don't currently work (#1188 )	2025-02-20 14:09:25 -08:00
resources	Several documentation fixes and fix link to API reference	2025-02-04 14:00:43 -08:00
source	feat: Add Groq distribution template (#1173 )	2025-02-25 14:16:56 -08:00
zero_to_hero_guide	chore: update the zero_to_hero_guide doc link (#1220 )	2025-02-25 17:16:02 -08:00
conftest.py	No spaces in ipynb tests	2025-02-07 11:56:22 -08:00
contbuild.sh	Fix broken links with docs	2024-11-22 20:42:17 -08:00
dog.jpg	Support for Llama3.2 models and Swift SDK (#98 )	2024-09-25 10:29:58 -07:00
getting_started.ipynb	fix: Update getting_started.ipynb (#1245 )	2025-02-24 18:22:32 -08:00
license_header.txt	Initial commit	2024-07-23 08:32:33 -07:00
make.bat	first version of readthedocs (#278 )	2024-10-22 10:15:58 +05:30
Makefile	first version of readthedocs (#278 )	2024-10-22 10:15:58 +05:30
readme.md	Fix README.md notebook links (#976 )	2025-02-05 14:33:46 -08:00
requirements.txt	Pin sphinx	2025-02-19 20:20:46 -08:00

readme.md

Llama Stack Documentation

Here's a collection of comprehensive guides, examples, and resources for building AI applications with Llama Stack. For the complete documentation, visit our ReadTheDocs page.

Content

Try out Llama Stack's capabilities through our detailed Jupyter notebooks:

- Building AI Applications Notebook: a comprehensive guide to building production-ready AI applications using Llama Stack
- Benchmark Evaluations Notebook: detailed performance evaluations and benchmarking results
- Zero-to-Hero Guide: a step-by-step guide to getting started with Llama Stack