Free up memory after post training finishes (#770)

## context 
Currently, GPU memory remains occupied after training finishes. This PR
explicitly deletes the model reference and frees the memory once training
completes.

## test
Before the change, after training a Llama 3.2 3B model, more than 6 GB of GPU
memory is still occupied.

After the change, GPU memory usage for the same run drops to about 1 GB.

![Screenshot 2025-01-14 at 6 05 17 PM](https://github.com/user-attachments/assets/45d212b1-a651-49f3-aad9-1c0a27fcebcf)
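The PR doesn't show how these numbers were collected (the screenshot is likely `nvidia-smi` output). As a hedged sketch, a similar reading can be taken programmatically from PyTorch's CUDA allocator stats; `gpu_memory_gb` is a hypothetical helper, not part of the repo:

```python
import torch

def gpu_memory_gb() -> float:
    """GPU memory reserved by PyTorch on the default CUDA device, in GiB."""
    if not torch.cuda.is_available():
        return 0.0  # no GPU present; nothing to measure
    # memory_reserved() includes the allocator's cache and is closer to what
    # nvidia-smi reports; memory_allocated() counts only live tensors.
    return torch.cuda.memory_reserved() / 1024**3
```

Note that `nvidia-smi` also counts CUDA context overhead outside PyTorch's allocator, so the two readings will not match exactly.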
Botao Chen 2025-01-14 19:19:38 -08:00 committed by GitHub
parent b2b82d4a90
commit 52a21ce78f


```diff
@@ -4,6 +4,7 @@
 # This source code is licensed under the terms described in the LICENSE file in
 # the root directory of this source tree.
+import gc
 import logging
 import os
 import time
@@ -580,6 +581,12 @@ class LoraFinetuningSingleDevice:
             checkpoint.training_metrics = training_metrics
             checkpoints.append(checkpoint)

+        # clean up the memory after training finishes
+        self._model.to("cpu")
+        del self._model
+        gc.collect()
+        torch.cuda.empty_cache()
+
         return (memory_stats, checkpoints)

     async def validation(self) -> Tuple[float, float]:
```
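The teardown added by the diff can be sketched as a standalone example. The `Trainer` class and tiny `Linear` model below are illustrative stand-ins for the repo's `LoraFinetuningSingleDevice` and the fine-tuned LLM, not its actual code; the `empty_cache()` call is guarded so the sketch also runs on CPU-only machines:

```python
import gc
import torch

class Trainer:
    def __init__(self) -> None:
        # hypothetical tiny model standing in for the fine-tuned LLM
        device = "cuda" if torch.cuda.is_available() else "cpu"
        self._model = torch.nn.Linear(8, 8).to(device)

    def cleanup(self) -> None:
        # mirror of the PR's teardown: move weights off the GPU, drop the
        # attribute reference, collect garbage, then release cached blocks
        self._model.to("cpu")
        del self._model
        gc.collect()
        if torch.cuda.is_available():
            torch.cuda.empty_cache()
```

Deleting the attribute (rather than a local variable) matters: the tensors can only be freed once no reference to the model survives, and `empty_cache()` then returns the allocator's cached blocks to the driver so `nvidia-smi` reflects the drop.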