Bin Latent Transformer (BiLT): A shift-invariant autoencoder for calibration-free spectral unmixing of turbid media
Abstract
The accurate recovery of constituent-level optical properties from integrating sphere measurements is a central analytical challenge in pharmaceutical analysis, food science, and biomedical diagnostics. Neural network autoencoders can extract spectrally resolved absorption and scattering coefficients for each constituent without prior knowledge, but their fully connected encoders bind learned features to absolute wavelength indices, causing accuracy loss under spectrometer calibration drift or hardware exchange. This work introduces the Bin Latent Transformer (BiLT)-Autoencoder, in which the dense encoder is replaced by a cross-attention scanner: 16 learnable probe vectors query a convolutional feature map, aggregating morphological spectral information independently of absolute wavelength position. A physics-constrained linear decoder with enforced absorption/scattering separation and a three-phase curriculum augmentation strategy complete the architecture. On a liquid phantom benchmark (intralipid and two ink absorbers; 496 samples), the model achieves R² = 0.979 and 0.975 for μa(λ) and μs'(λ), respectively, on held-out test spectra, and maintains R² > 0.90 for μa and R² ≈ 0.99 for μs' across the full tested shift range of 10 spectral bands. The model generalises to a simulated spectrometer with a broader instrument line shape (≈24 nm FWHM) without retraining, retaining R² ≈ 0.96 and 0.974 for the two channels. Attention map analysis reveals a physically interpretable two-component probe strategy: sparse anchor probes at absorption-edge wavelengths are combined with a diffuse, SNR-driven ensemble in the high-transmittance long-wavelength region, which recruits additional probes dynamically under noise to provide implicit spectral averaging.
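To make the shift-invariance argument concrete, the following is a minimal NumPy sketch of a cross-attention scanner, not the authors' implementation. Only the probe count (16) and the 10-band shift come from the abstract. The feature width, kernel size, circular padding, absence of positional encoding, the use of the feature map as both keys and values, and all function names (`conv1d_circular`, `cross_attention`, `encode`) are assumptions made for illustration. Under these assumptions a circular shift of the input spectrum only permutes the rows of the feature map, and attention pooling without positional information is insensitive to that permutation.

```python
import numpy as np

rng = np.random.default_rng(0)

N_PROBES = 16  # probe count stated in the abstract
D_MODEL = 8    # assumed feature width (not given in the abstract)
KERNEL = 5     # assumed convolution kernel size (not given in the abstract)


def conv1d_circular(spectrum, kernels):
    """Circularly padded 1-D convolution: (n_bands,) -> (n_bands, d_model).

    Circular padding is an assumption chosen so that a shift of the input
    produces an identically shifted feature map.
    """
    n = spectrum.shape[0]
    k = kernels.shape[1]
    half = k // 2
    padded = np.concatenate([spectrum[-half:], spectrum, spectrum[:half]])
    # windows[j] holds the k samples centred on band j
    windows = np.stack([padded[i:i + n] for i in range(k)], axis=1)
    return np.tanh(windows @ kernels.T)


def cross_attention(probes, features):
    """Probes act as queries; the feature map serves as keys and values."""
    scores = probes @ features.T / np.sqrt(features.shape[1])
    scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    # No positional encoding is added, so the pooled output depends only on
    # the set of feature vectors, not on where they sit along the band axis.
    return weights @ features, weights


def encode(spectrum, kernels, probes):
    features = conv1d_circular(spectrum, kernels)
    return cross_attention(probes, features)


# Randomly initialised stand-ins for parameters that would normally be learned.
kernels = rng.normal(size=(D_MODEL, KERNEL))
probes = rng.normal(size=(N_PROBES, D_MODEL))

spectrum = rng.random(64)          # synthetic spectrum with 64 bands
shifted = np.roll(spectrum, 10)    # circular shift by 10 bands

latent, weights = encode(spectrum, kernels, probes)
latent_shifted, weights_shifted = encode(shifted, kernels, probes)

print(np.allclose(latent, latent_shifted))
print(np.allclose(np.roll(weights, 10, axis=1), weights_shifted))
```

In this toy setting the latent code is unchanged while the attention map moves with the spectrum. A real calibration drift is not circular and spectra have edges, so in practice the invariance would be approximate rather than exact.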