Neural Networks

Seq-VCR: Preventing Collapse in Intermediate Transformer Representations for Enhanced Reasoning

We propose Seq-VCR, a method to prevent representation collapse in Transformer models, significantly improving their performance on complex reasoning tasks without requiring chain-of-thought supervision.
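As a rough illustration of the kind of regularizer the name suggests, here is a minimal sketch of a variance-covariance penalty applied to intermediate token representations; the layer choice, pooling over batch and sequence, tensor shapes, and coefficients are illustrative assumptions, not the paper's exact recipe:

```python
import torch

def variance_covariance_penalty(hidden, eps=1e-4, var_target=1.0):
    """Variance-covariance regularizer on intermediate token representations.

    Assumption: `hidden` has shape (batch, seq_len, dim) and statistics are
    computed over all batch and sequence positions jointly.
    """
    b, t, d = hidden.shape
    z = hidden.reshape(b * t, d)
    z = z - z.mean(dim=0)

    # Variance term: hinge pushing each dimension's std above a target,
    # discouraging dimensions from collapsing to a constant.
    std = torch.sqrt(z.var(dim=0) + eps)
    var_loss = torch.relu(var_target - std).mean()

    # Covariance term: penalize off-diagonal covariance so dimensions
    # stay decorrelated instead of collapsing onto each other.
    cov = (z.T @ z) / (z.shape[0] - 1)
    off_diag = cov - torch.diag(torch.diag(cov))
    cov_loss = (off_diag ** 2).sum() / d

    return var_loss, cov_loss

# Illustrative usage: add the penalty to the task loss at a chosen layer.
hidden = torch.randn(8, 32, 256)              # dummy (batch, seq_len, dim) activations
var_loss, cov_loss = variance_covariance_penalty(hidden)
aux_loss = 1.0 * var_loss + 0.04 * cov_loss   # coefficients are placeholders
```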

Inheritune: Training Smaller Yet More Attentive Language Models

We present Inheritune, a training method that creates smaller yet equally effective language models by inheriting and fine-tuning early transformer layers from larger models, addressing the issue of lazy layers in deep networks.
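A hedged sketch of the layer-inheritance step, using generic PyTorch encoder layers as stand-ins for the actual pretrained language models and training details:

```python
import torch.nn as nn

# Stand-in "large" model: a stack of generic transformer encoder layers.
# The real method operates on pretrained LMs; this only shows the mechanics
# of inheriting the first k layers. Depths and sizes are placeholders.
d_model, n_heads, large_depth, small_depth = 512, 8, 24, 6

large = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True),
    num_layers=large_depth,
)

# Smaller model: same layer type, fewer layers.
small = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True),
    num_layers=small_depth,
)

# Inherit the weights of the first `small_depth` layers from the larger
# model, then continue training (fine-tuning) the small model.
for i in range(small_depth):
    small.layers[i].load_state_dict(large.layers[i].state_dict())
```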

An Information Theory Perspective on Variance-Invariance-Covariance Regularization

We provide an information-theoretic analysis of VICReg, deriving theoretical foundations for deterministic networks and introducing new self-supervised learning (SSL) methods based on these insights.
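For reference, a compact sketch of the VICReg objective that the analysis targets; the coefficients and the absence of an explicit projector here follow the original VICReg formulation, not anything specific to this paper:

```python
import torch
import torch.nn.functional as F

def vicreg_loss(z_a, z_b, sim_w=25.0, var_w=25.0, cov_w=1.0, eps=1e-4):
    """VICReg loss on two (batch, dim) embedding batches from two views."""
    # Invariance: embeddings of two views of the same input should match.
    sim_loss = F.mse_loss(z_a, z_b)

    # Variance: keep each embedding dimension's std above 1 (hinge loss).
    std_a = torch.sqrt(z_a.var(dim=0) + eps)
    std_b = torch.sqrt(z_b.var(dim=0) + eps)
    var_loss = torch.relu(1 - std_a).mean() + torch.relu(1 - std_b).mean()

    # Covariance: decorrelate dimensions by penalizing off-diagonal covariance.
    def cov_penalty(z):
        z = z - z.mean(dim=0)
        cov = (z.T @ z) / (z.shape[0] - 1)
        off_diag = cov - torch.diag(torch.diag(cov))
        return (off_diag ** 2).sum() / z.shape[1]

    cov_loss = cov_penalty(z_a) + cov_penalty(z_b)
    return sim_w * sim_loss + var_w * var_loss + cov_w * cov_loss
```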

Back to Basics: Revisiting Standard Deep Learning Components for Class Imbalance

We show that standard deep learning components, when carefully tuned, can achieve state-of-the-art performance on class-imbalanced datasets without specialized techniques.
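As one example of what "standard components" can mean in practice, here is a sketch of class-weighted cross-entropy and balanced sampling in PyTorch; the paper's actual component choices and tuning are not reproduced, and the dataset below is a dummy:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# Dummy imbalanced dataset: 900 samples of class 0, 100 of class 1.
x = torch.randn(1000, 16)
y = torch.cat([torch.zeros(900, dtype=torch.long), torch.ones(100, dtype=torch.long)])
dataset = TensorDataset(x, y)

# Standard component 1: inverse-frequency class weights in the loss.
class_counts = torch.bincount(y).float()
class_weights = class_counts.sum() / (len(class_counts) * class_counts)
criterion = torch.nn.CrossEntropyLoss(weight=class_weights)

# Standard component 2: a weighted sampler that rebalances mini-batches.
sample_weights = class_weights[y]
sampler = WeightedRandomSampler(sample_weights, num_samples=len(y), replacement=True)
loader = DataLoader(dataset, batch_size=64, sampler=sampler)
```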

To Compress or Not to Compress--Self-Supervised Learning and Information Theory: A Review

We present a comprehensive review of self-supervised learning through the lens of information theory, introducing a unified framework that encompasses existing approaches and highlighting the interplay between compression and information preservation in deep neural networks.
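The compression/preservation trade-off referred to here is classically expressed by the information bottleneck objective, shown below as general background rather than as the review's own unified framework:

```latex
% Information bottleneck Lagrangian: compress the input X into a
% representation Z while preserving the information Z carries about
% the target Y; beta trades off compression against preservation.
\min_{p(z \mid x)} \; I(X; Z) \;-\; \beta \, I(Z; Y)
```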