forked from phoenix-oss/llama-stack-mirror
feat: Add nemo customizer (#1448)
# What does this PR do?

This PR adds support for NVIDIA's NeMo Customizer API to the Llama Stack post-training module. The integration enables users to fine-tune models using NVIDIA's cloud-based customization service through a consistent Llama Stack interface.

[//]: # (If resolving an issue, uncomment and update the line below)
[//]: # (Closes #[issue-number])

## Test Plan

[Describe the tests you ran to verify your changes with result summaries. *Provide clear instructions so the plan can be easily re-executed.*]

Yet to be done

Things pending under this PR:

- [x] Integration of fine-tuned model (new checkpoint) for inference with nvidia llm distribution
- [x] distribution integration of API
- [x] Add test cases for customizer (In Progress)
- [x] Documentation

```
LLAMA_STACK_BASE_URL=http://localhost:5002 pytest -v tests/client-sdk/post_training/test_supervised_fine_tuning.py
============================= test session starts ==============================
platform linux -- Python 3.10.0, pytest-8.3.4, pluggy-1.5.0 -- /home/ubuntu/llama-stack/.venv/bin/python
cachedir: .pytest_cache
metadata: {'Python': '3.10.0', 'Platform': 'Linux-6.8.0-1021-gcp-x86_64-with-glibc2.35', 'Packages': {'pytest': '8.3.4', 'pluggy': '1.5.0'}, 'Plugins': {'nbval': '0.11.0', 'metadata': '3.1.1', 'anyio': '4.8.0', 'html': '4.1.1', 'asyncio': '0.25.3'}}
rootdir: /home/ubuntu/llama-stack
configfile: pyproject.toml
plugins: nbval-0.11.0, metadata-3.1.1, anyio-4.8.0, html-4.1.1, asyncio-0.25.3
asyncio: mode=strict, asyncio_default_fixture_loop_scope=None
collected 2 items

tests/client-sdk/post_training/test_supervised_fine_tuning.py::test_post_training_provider_registration[txt=8B] PASSED [ 50%]
tests/client-sdk/post_training/test_supervised_fine_tuning.py::test_list_training_jobs[txt=8B] PASSED [100%]

========================= 2 passed, 1 warning in 0.10s =========================
```

cc: @mattf @dglogo @sumitb

---------

Co-authored-by: Ubuntu <ubuntu@llama-stack-customizer-dev-inst-2tx95fyisatvlic4we8hidx5tfj.us-central1-a.c.brevdevprod.internal>
parent ba14552a32
commit 1a73f8305b
26 changed files with 1571 additions and 8 deletions

llama_stack/providers/remote/post_training/nvidia/README.md (new file, 138 lines)

@@ -0,0 +1,138 @@

# NVIDIA Post-Training Provider for LlamaStack

This provider enables fine-tuning of LLMs using NVIDIA's NeMo Customizer service.

## Features

- Supervised fine-tuning of Llama models
- LoRA fine-tuning support
- Job management and status tracking

## Getting Started

### Prerequisites

- LlamaStack with NVIDIA configuration
- Access to the hosted NVIDIA NeMo Customizer service
- A dataset registered in the hosted NVIDIA NeMo Customizer service
- A base model downloaded and available in the hosted NVIDIA NeMo Customizer service

### Setup

Build the NVIDIA environment:

```bash
llama stack build --template nvidia --image-type conda
```

### Basic Usage using the LlamaStack Python Client

### Create Customization Job

#### Initialize the client

```python
import os

# Placeholder values for the NVIDIA provider; replace them with your own.
os.environ["NVIDIA_API_KEY"] = "your-api-key"
os.environ["NVIDIA_CUSTOMIZER_URL"] = "http://nemo.test"
os.environ["NVIDIA_USER_ID"] = "llama-stack-user"
os.environ["NVIDIA_DATASET_NAMESPACE"] = "default"
os.environ["NVIDIA_PROJECT_ID"] = "test-project"
os.environ["NVIDIA_OUTPUT_MODEL_DIR"] = "test-example-model@v1"

from llama_stack.distribution.library_client import LlamaStackAsLibraryClient

# Run the "nvidia" distribution in-process as a library client.
client = LlamaStackAsLibraryClient("nvidia")
client.initialize()
```

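The initialization example relies on six environment variables. As a minimal sketch, a pre-flight check along these lines could catch a missing value before the client starts. The helper name `missing_env_vars` is hypothetical, and treating every variable as required is an assumption made for illustration, not something this README states.

```python
import os

# Variable names are taken from the initialization example; treating all of
# them as required is an assumption made for this sketch.
REQUIRED_VARS = (
    "NVIDIA_API_KEY",
    "NVIDIA_CUSTOMIZER_URL",
    "NVIDIA_USER_ID",
    "NVIDIA_DATASET_NAMESPACE",
    "NVIDIA_PROJECT_ID",
    "NVIDIA_OUTPUT_MODEL_DIR",
)


def missing_env_vars(env=None, required=REQUIRED_VARS):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


# Example with an explicit mapping: only the API key is set, so the other
# five names are reported as missing.
print(missing_env_vars({"NVIDIA_API_KEY": "your-api-key"}))
```

Calling `missing_env_vars()` with no arguments checks the real process environment, so it could run just before `client.initialize()`.
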
#### Configure fine-tuning parameters

```python
from llama_stack_client.types.post_training_supervised_fine_tune_params import (
    TrainingConfig,
    TrainingConfigDataConfig,
    TrainingConfigOptimizerConfig,
)
from llama_stack_client.types.algorithm_config_param import LoraFinetuningConfig
```

#### Set up LoRA configuration

```python
algorithm_config = LoraFinetuningConfig(type="LoRA", adapter_dim=16)
```

#### Configure training data

```python
data_config = TrainingConfigDataConfig(
    dataset_id="your-dataset-id",  # Use client.datasets.list() to see available datasets
    batch_size=16,
)
```

#### Configure optimizer

```python
optimizer_config = TrainingConfigOptimizerConfig(
    lr=0.0001,
)
```

#### Set up training configuration

```python
training_config = TrainingConfig(
    n_epochs=2,
    data_config=data_config,
    optimizer_config=optimizer_config,
)
```

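For a rough sense of job length, the `batch_size=16` and `n_epochs=2` values used in this walkthrough translate into a step count once the dataset size is known. The sketch below is generic mini-batch arithmetic, assuming one optimizer step per batch, a partial final batch that still counts, and no gradient accumulation. The helper name and the 1,000-example dataset are hypothetical, and NeMo Customizer may schedule steps differently.

```python
import math


def total_training_steps(num_examples, batch_size=16, n_epochs=2):
    """Estimate optimizer steps, assuming one step per (possibly partial) batch."""
    if num_examples < 0 or batch_size <= 0 or n_epochs < 0:
        raise ValueError("expected num_examples >= 0, batch_size > 0, n_epochs >= 0")
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch * n_epochs


# Hypothetical dataset of 1,000 examples: ceil(1000 / 16) = 63 steps per epoch,
# and two epochs give 126 steps.
print(total_training_steps(1000))  # prints 126
```
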
#### Start fine-tuning job

```python
training_job = client.post_training.supervised_fine_tune(
    job_uuid="unique-job-id",
    model="meta-llama/Llama-3.1-8B-Instruct",
    checkpoint_dir="",
    algorithm_config=algorithm_config,
    training_config=training_config,
    logger_config={},
    hyperparam_search_config={},
)
```

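The `job_uuid="unique-job-id"` placeholder suggests that each job should get a distinct identifier. One way to produce one is sketched here with the standard-library `uuid` module. The helper name `new_job_uuid` and the `sft` prefix are hypothetical, and the sketch assumes the identifier format is free-form, which this README does not confirm.

```python
import uuid


def new_job_uuid(prefix="sft"):
    """Build an identifier of the form '<prefix>-<32 hex characters>'."""
    return f"{prefix}-{uuid.uuid4().hex}"


job_uuid = new_job_uuid()
print(job_uuid)
```

The resulting string could be passed as the `job_uuid` argument and kept for later status checks.
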
### List all jobs

```python
jobs = client.post_training.job.list()
```

### Check job status

```python
job_status = client.post_training.job.status(job_uuid="your-job-id")
```

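Fine-tuning jobs run for a while, so a single status call is often wrapped in a polling loop. The following is a minimal sketch of that pattern, not part of the provider: `wait_for_job` is a hypothetical helper, the terminal status names are assumptions, and the status lookup is passed in as a callable so the loop does not depend on the exact shape of the response object.

```python
import time

# Assumed terminal status names; check the values your deployment reports.
TERMINAL_STATUSES = frozenset({"completed", "failed", "cancelled"})


def wait_for_job(get_status, poll_seconds=30.0, timeout_seconds=3600.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Call `get_status()` until it returns a terminal status or time runs out."""
    deadline = clock() + timeout_seconds
    while True:
        status = get_status()
        if str(status).lower() in TERMINAL_STATUSES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"job still {status!r} after {timeout_seconds} seconds")
        sleep(poll_seconds)


# Possible usage, assuming the status response exposes a `status` attribute:
# final = wait_for_job(
#     lambda: client.post_training.job.status(job_uuid="your-job-id").status
# )
```

The `sleep` and `clock` parameters exist only so the loop can be exercised without real waiting; the defaults use the standard `time` module.
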
### Cancel a job

```python
client.post_training.job.cancel(job_uuid="your-job-id")
```

### Inference with the fine-tuned model

The `model_id` below is the same value assigned to `NVIDIA_OUTPUT_MODEL_DIR` when initializing the client.

```python
response = client.inference.completion(
    content="Complete the sentence using one word: Roses are red, violets are ",
    stream=False,
    model_id="test-example-model@v1",
    sampling_params={
        "max_tokens": 50,
    },
)
print(response.content)
```