We propose Seq-VCR, a method to prevent representation collapse in Transformer models, significantly improving their performance on complex reasoning tasks without requiring chain-of-thought supervision.
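As an illustration of the idea, below is a minimal sketch of a variance-covariance regularizer applied to one layer's intermediate Transformer hidden states. The function name `seq_vcr_regularizer`, the default coefficients, and the pooling of batch and sequence positions into a single set of feature vectors are illustrative assumptions, not the paper's exact formulation; the variance and covariance terms themselves follow the standard VICReg-style penalties.

```python
import torch


def seq_vcr_regularizer(h: torch.Tensor,
                        var_target: float = 1.0,
                        eps: float = 1e-4,
                        var_weight: float = 1.0,
                        cov_weight: float = 0.04) -> torch.Tensor:
    """Variance-covariance penalty on one layer's hidden states.

    h: hidden states of shape (batch, seq_len, dim).
    Returns a scalar regularization loss to add to the task loss.
    (Illustrative sketch; hyperparameters and pooling are assumptions.)
    """
    b, t, d = h.shape
    # Pool batch and sequence positions into one set of feature vectors.
    z = h.reshape(b * t, d)
    z = z - z.mean(dim=0, keepdim=True)

    # Variance term: hinge loss pushing each feature's std toward var_target,
    # preventing individual dimensions from collapsing to a constant.
    std = torch.sqrt(z.var(dim=0) + eps)
    var_loss = torch.relu(var_target - std).mean()

    # Covariance term: penalize off-diagonal entries of the feature
    # covariance matrix, discouraging redundant (correlated) dimensions.
    cov = (z.T @ z) / (z.shape[0] - 1)
    off_diag = cov - torch.diag(torch.diag(cov))
    cov_loss = off_diag.pow(2).sum() / d

    return var_weight * var_loss + cov_weight * cov_loss
```

In training, such a penalty would typically be computed on selected intermediate layers and added to the language-modeling loss with a tunable coefficient.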
We provide an information-theoretic analysis of VICReg, deriving its theoretical foundations for deterministic networks and introducing new self-supervised learning (SSL) methods based on these insights.
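For context, the VICReg objective under analysis combines an invariance term with variance and covariance regularizers. The formulation below is the standard VICReg objective, shown here only as a reference point for the analysis:

```latex
\mathcal{L}(Z, Z') = \lambda\, s(Z, Z') + \mu\,\bigl[v(Z) + v(Z')\bigr] + \nu\,\bigl[c(Z) + c(Z')\bigr],
\qquad
s(Z, Z') = \frac{1}{n}\sum_{i=1}^{n} \lVert z_i - z'_i \rVert_2^2,
```
```latex
v(Z) = \frac{1}{d}\sum_{j=1}^{d} \max\!\bigl(0,\; \gamma - \sqrt{\operatorname{Var}(z^{j}) + \varepsilon}\bigr),
\qquad
c(Z) = \frac{1}{d}\sum_{i \neq j} \bigl[C(Z)\bigr]_{i,j}^{2},
```

where $Z, Z'$ are the embeddings of two views, and $C(Z)$ is the covariance matrix of the embeddings.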