Collaborative Threshold Watermarking
Abstract
In federated learning (FL), K clients jointly train a model without sharing raw data. Because each participant invests data and compute, clients need a mechanism to later prove the provenance of the jointly trained model. Model watermarking embeds a hidden signal in the weights, but naive approaches either fail to scale, because per-client watermarks dilute as K grows, or give any individual client the ability to verify and potentially remove the watermark. We introduce (t,K)-threshold watermarking: clients collaboratively embed a shared watermark during training, while only coalitions of at least t clients can reconstruct the watermark key and verify a suspect model. We secret-share the watermark key τ so that coalitions of fewer than t clients cannot reconstruct it, and verification can be performed without revealing τ in the clear. We instantiate our protocol in the white-box setting and evaluate it on image classification tasks with both IID and non-IID partitions, as well as in a language-model fine-tuning setting. Our watermark remains detectable at scale (K=128) with minimal accuracy loss and stays above the detection threshold (z ≥ 4) under attacks, including adaptive fine-tuning using up to 20% of the training data. Code is available at https://github.com/tameemalaa/collaborative-threshold-watermark.
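To make the (t,K)-threshold property concrete, the following is a minimal sketch of one standard way to secret-share a key: Shamir's secret sharing over a prime field. The abstract does not say which sharing scheme the paper uses, so the choice of Shamir's scheme, the field modulus PRIME, and the function names split_secret and reconstruct are all assumptions made for illustration. The sketch also reconstructs the key in the clear, which the paper's verification is designed to avoid; it demonstrates only that any t shares suffice to recover the key.

```python
import secrets

# Assumed field modulus for illustration: the Mersenne prime 2^127 - 1.
PRIME = 2**127 - 1


def split_secret(secret, t, k):
    """Split `secret` into k shares so that any t of them recover it.

    Hypothetical helper: a textbook Shamir split, not the paper's protocol.
    """
    if not 1 <= t <= k:
        raise ValueError("need 1 <= t <= k")
    if not 0 <= secret < PRIME:
        raise ValueError("secret must lie in [0, PRIME)")
    # Random polynomial of degree t-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(t - 1)]
    shares = []
    for x in range(1, k + 1):
        # Evaluate the polynomial at x with Horner's rule.
        y = 0
        for c in reversed(coeffs):
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def reconstruct(shares):
    """Recover the secret by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


# Example: a stand-in integer key shared among K=5 clients with t=3.
tau = 123456789
shares = split_secret(tau, t=3, k=5)
recovered = reconstruct(shares[:3])
```

Any three of the five shares passed to reconstruct return tau, while a coalition holding only two shares has too few points to determine the polynomial and so learns nothing about the key.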