TSFLora: Token-Compressed Split Fine-Tuning for Wireless Edge Networks
Abstract
Adapting large AI models (LAMs) to personalized edge data is challenging because wireless devices have limited memory, computation, and uplink capacity. Federated fine-tuning preserves data privacy but still requires each device to host the full model, while split learning reduces device memory at the cost of heavy activation transmission. This paper proposes TSFLora, a token-compressed split fine-tuning framework for communication-efficient LAM adaptation at the edge. TSFLora combines attention-guided token selection, token merging, low-bit activation quantization, and LoRA-based adaptation within a split federated training pipeline. The key idea is to compress the intermediate token sequence before transmission so that the system reduces both uplink traffic and server-side processing without changing the frozen backbone. Experiments on ViT models over CIFAR-10, CIFAR-100, and TinyImageNet show that TSFLora achieves up to 6.8× communication reduction and 41\% memory saving while maintaining competitive accuracy.
Turn this paper into a full lesson
ArcXiv compiles a staged curriculum from this paper: 8-12 lessons across beginner → advanced, synthesised section guides, visuals, flashcards, a quiz, exercises, and on-demand deep dives per section. Grounded in the abstract, never invented.