ChunkFT: Byte-Streamed Optimization for Memory-Efficient Full Fine-Tuning

Abstract

This work presents ChunkFT, a memory-efficient fine-tuning framework that reformulates full-parameter fine-tuning around a dynamically activated working set. ChunkFT enables gradient computation for arbitrary sub-tensors without modifying the network architecture, providing an algorithmic foundation for optimizing arbitrary sub-networks while avoiding standard dense gradient computation. We provide a theoretical convergence analysis of ChunkFT in the deterministic setting. Empirically, we apply ChunkFT to fine-tune Llama 3-8B on a single RTX 4090-24GB GPU and Llama 3-70B on 2× H800-80GB GPUs. Full-parameter fine-tuning of a 7B model with a 1K input length requires only 13.72GB of GPU memory. The results demonstrate the effectiveness of ChunkFT in terms of memory usage, running time, and optimization quality. Moreover, downstream evaluations on language understanding, mathematical reasoning, and MT-Bench show that ChunkFT consistently outperforms existing memory-efficient baselines. Notably, ChunkFT achieves performance comparable to, and in some cases exceeding, that of full-parameter fine-tuning. The code is available at https://github.com/misonsky/chunk.
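The abstract does not spell out the algorithm, so the following is only a rough illustration of the general idea of updating all parameters through a small, rotating working set. It uses plain cyclic block coordinate descent on a toy quadratic in the deterministic setting. It is not the ChunkFT method itself. The objective and the names `loss`, `chunk_grad`, `train`, `chunk_size`, and `coupling` are hypothetical and chosen only for this sketch.

```python
# Toy sketch (assumption: NOT the ChunkFT algorithm). Cyclic block coordinate
# descent: only one contiguous chunk of the parameter vector is "active" at a
# time, and only that chunk's gradient is ever computed and stored.

def loss(w, target, coupling=0.1):
    """Quadratic objective: data-fit term plus a neighbour-coupling term."""
    fit = sum((wi - ti) ** 2 for wi, ti in zip(w, target))
    smooth = sum((w[i] - w[i + 1]) ** 2 for i in range(len(w) - 1))
    return 0.5 * fit + 0.5 * coupling * smooth

def chunk_grad(w, target, start, stop, coupling=0.1):
    """Gradient of `loss` with respect to w[start:stop] only."""
    n = len(w)
    grads = []
    for i in range(start, stop):
        g = w[i] - target[i]
        if i > 0:
            g += coupling * (w[i] - w[i - 1])
        if i < n - 1:
            g += coupling * (w[i] - w[i + 1])
        grads.append(g)
    return grads

def train(w, target, chunk_size=2, sweeps=50, lr=0.5):
    """Sweep over chunks in a fixed order; every parameter is updated once
    per sweep, but the gradient buffer never exceeds `chunk_size` entries."""
    w = list(w)
    n = len(w)
    for _ in range(sweeps):
        for start in range(0, n, chunk_size):
            stop = min(start + chunk_size, n)
            g = chunk_grad(w, target, start, stop)  # active working set only
            for offset, gi in enumerate(g):
                w[start + offset] -= lr * gi
    return w

target = [1.0, -2.0, 0.5, 3.0, -1.0, 2.0]
w0 = [0.0] * len(target)
w_final = train(w0, target)
```

In this toy setting the gradient buffer holds `chunk_size` values rather than one per parameter, which is the kind of saving a working-set formulation aims for. How ChunkFT selects and activates sub-tensors in a real network is described in the paper, not here.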
