BEFT: Bias-Efficient Fine-Tuning of Language Models in Low-Data Regimes
Abstract
Fine-tuning the bias terms of large language models (LLMs) has the potential to achieve unprecedented parameter efficiency while maintaining competitive performance, particularly in low-data regimes. However, the link between fine-tuning different bias terms (i.e., bq, bk, and bv in the query, key, and value projections, respectively) and downstream performance remains largely unclear. In this paper, we investigate how fine-tuning each of bq, bk, and bv affects performance on the downstream task. Our key finding is that directly fine-tuning bv generally leads to higher downstream performance in low-data regimes than fine-tuning bq or bk. We evaluate this property extensively across a wide range of LLMs, spanning encoder-only and decoder-only architectures with up to 6.7B parameters (including bias-free LLMs). Our results provide strong evidence for the effectiveness of directly fine-tuning bv across various downstream tasks. The implementation code is available at https://github.com/whubaichuan/BEFT.
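To make the parameter-efficiency claim concrete, the following is a minimal sketch of selecting only the value-projection biases (bv) as trainable parameters and counting them. It is not the authors' implementation: the toy model sizes, the parameter naming scheme (`layers.N.attn.q_proj` and so on), and the helper functions `build_inventory` and `select_trainable` are all hypothetical, chosen only for illustration.

```python
# Illustrative sketch only: names, sizes, and helpers are assumptions,
# not taken from the paper or its released code.

def build_inventory(num_layers, d_model):
    """Build a toy map of parameter name -> element count.

    Assumes each layer has query, key, and value projections with a
    d_model x d_model weight and a d_model-sized bias (bq, bk, bv).
    """
    inventory = {}
    for i in range(num_layers):
        prefix = f"layers.{i}.attn"
        for proj in ("q_proj", "k_proj", "v_proj"):
            inventory[f"{prefix}.{proj}.weight"] = d_model * d_model
            inventory[f"{prefix}.{proj}.bias"] = d_model
    return inventory


def select_trainable(inventory, suffix="v_proj.bias"):
    """Keep only parameters whose name ends with `suffix` (bv by default)."""
    return {name: count for name, count in inventory.items()
            if name.endswith(suffix)}


inventory = build_inventory(num_layers=2, d_model=8)
trainable = select_trainable(inventory)

total = sum(inventory.values())
n_trainable = sum(trainable.values())

print(sorted(trainable))
print(f"trainable: {n_trainable} of {total} ({n_trainable / total:.2%})")
```

In a real training framework, the same name-based selection would typically be used to freeze every other parameter and pass only the selected biases to the optimizer. Passing `suffix="q_proj.bias"` or `suffix="k_proj.bias"` selects bq or bk instead, which is the kind of comparison the abstract describes.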