We propose Seq-VCR (sequential variance-covariance regularization), a method that prevents representation collapse in intermediate Transformer representations, significantly improving performance on complex reasoning tasks without requiring chain-of-thought supervision.
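Conceptually, the regularizer resembles VICReg-style variance and covariance terms applied to an intermediate layer's hidden states, with tokens treated as samples. The sketch below is a minimal illustration of that idea rather than the paper's reference implementation; the function name, hinge target `gamma`, and loss coefficients are assumptions.

```python
import torch

def variance_covariance_penalty(h, gamma=1.0, eps=1e-4):
    """Variance-covariance penalty on intermediate hidden states.
    `h` has shape (batch, seq_len, dim); tokens are treated as samples."""
    z = h.reshape(-1, h.shape[-1])              # (batch * seq_len, dim)
    z = z - z.mean(dim=0)
    # Variance term: hinge that pushes each dimension's std above gamma.
    std = torch.sqrt(z.var(dim=0) + eps)
    var_loss = torch.relu(gamma - std).mean()
    # Covariance term: penalize off-diagonal entries of the covariance matrix.
    n, d = z.shape
    cov = (z.T @ z) / (n - 1)
    off_diag = cov - torch.diag(torch.diag(cov))
    cov_loss = (off_diag ** 2).sum() / d
    return var_loss, cov_loss

# Hypothetical usage inside a training step, assuming `hidden` is an
# intermediate layer's output and `lm_loss` the usual next-token loss;
# the coefficients are illustrative placeholders, not the paper's values.
# var_loss, cov_loss = variance_covariance_penalty(hidden)
# loss = lm_loss + 0.1 * var_loss + 0.01 * cov_loss
```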
We present Inheritune, a training method that creates smaller yet equally effective language models by inheriting and fine-tuning early transformer layers from larger models, addressing the issue of lazy layers in deep networks.
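A minimal sketch of the layer-inheritance step, assuming a Hugging Face GPT-2 checkpoint; the checkpoint name and the number of inherited blocks are illustrative choices, and the subsequent training setup is omitted.

```python
from transformers import GPT2Config, GPT2LMHeadModel

# Inherit the first n_inherit transformer blocks from a larger pretrained model.
teacher = GPT2LMHeadModel.from_pretrained("gpt2-large")   # 36 blocks
n_inherit = 6                                              # illustrative choice

config = GPT2Config.from_pretrained("gpt2-large", n_layer=n_inherit)
student = GPT2LMHeadModel(config)

# Copy embeddings, the first n_inherit transformer blocks, and the final norm.
student.transformer.wte.load_state_dict(teacher.transformer.wte.state_dict())
student.transformer.wpe.load_state_dict(teacher.transformer.wpe.state_dict())
for i in range(n_inherit):
    student.transformer.h[i].load_state_dict(teacher.transformer.h[i].state_dict())
student.transformer.ln_f.load_state_dict(teacher.transformer.ln_f.state_dict())

# The student is then trained or fine-tuned on the target data as usual.
```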
We provide an information-theoretic analysis of VICReg, deriving theoretical foundations for deterministic networks and introducing new SSL methods based on these insights.
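For reference, the objective under analysis is the VICReg loss, sketched below in PyTorch; the coefficient defaults follow those reported in the original VICReg paper, and the helper names are ours.

```python
import torch
import torch.nn.functional as F

def vicreg_loss(z_a, z_b, lam=25.0, mu=25.0, nu=1.0, gamma=1.0, eps=1e-4):
    """VICReg objective for two batches of embeddings z_a, z_b of shape (N, D)."""
    # Invariance: mean-squared error between the two views' embeddings.
    inv = F.mse_loss(z_a, z_b)

    def variance(z):
        # Hinge loss keeping each dimension's std above gamma.
        std = torch.sqrt(z.var(dim=0) + eps)
        return torch.relu(gamma - std).mean()

    def covariance(z):
        # Penalize off-diagonal covariance to decorrelate dimensions.
        n, d = z.shape
        z = z - z.mean(dim=0)
        cov = (z.T @ z) / (n - 1)
        off_diag = cov - torch.diag(torch.diag(cov))
        return (off_diag ** 2).sum() / d

    return lam * inv + mu * (variance(z_a) + variance(z_b)) \
               + nu * (covariance(z_a) + covariance(z_b))
```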
We show that carefully tuning standard deep learning components can achieve state-of-the-art performance on class-imbalanced datasets without specialized techniques.
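To make "standard components" concrete, the hypothetical configuration below shows the kind of ordinary knobs involved (augmentation strength, label smoothing, optimizer, schedule) with no re-weighting or re-sampling; the specific values are illustrative assumptions, not the paper's prescriptions.

```python
import torch.nn as nn
import torch.optim as optim
from torchvision import models, transforms

# Stronger-than-default augmentation, otherwise standard preprocessing.
train_tfms = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(0.4, 0.4, 0.4),
    transforms.ToTensor(),
])

model = models.resnet50(weights=None)
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)   # plain CE, no class re-weighting
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9,
                      weight_decay=5e-4, nesterov=True)
scheduler = optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=200)
```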
We present a comprehensive review of self-supervised learning through the lens of information theory, introducing a unified framework that encompasses existing approaches and highlighting the interplay between compression and information preservation in deep neural networks.
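One way to make the compression/preservation interplay concrete is a multi-view objective of the following form, where the notation is ours rather than the review's: maximize the information shared between the representations of two views while compressing away view-specific information.

```latex
% X_1, X_2: two views of the data; Z_1, Z_2: their representations;
% \beta trades information preservation against compression.
\max_{\theta} \; I(Z_1; Z_2) \;-\; \beta \left[ I(Z_1; X_1) + I(Z_2; X_2) \right]
```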