We propose Seq-VCR, a method to prevent representation collapse in Transformer models, significantly improving their performance on complex reasoning tasks without requiring chain-of-thought supervision.
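As an illustration of the idea, below is a minimal sketch of a variance-covariance regularizer applied to one layer's intermediate Transformer hidden states. The function name `seq_vcr_regularizer`, the default coefficients, and the pooling of batch and sequence positions into a single set of feature vectors are illustrative assumptions, not the paper's exact formulation; the variance and covariance terms themselves follow the standard VICReg-style penalties.

```python
import torch


def seq_vcr_regularizer(h: torch.Tensor,
                        var_target: float = 1.0,
                        eps: float = 1e-4,
                        var_weight: float = 1.0,
                        cov_weight: float = 0.04) -> torch.Tensor:
    """Variance-covariance penalty on one layer's hidden states.

    h: hidden states of shape (batch, seq_len, dim).
    Returns a scalar regularization loss to add to the task loss.
    (Illustrative sketch; hyperparameters and pooling are assumptions.)
    """
    b, t, d = h.shape
    # Pool batch and sequence positions into one set of feature vectors.
    z = h.reshape(b * t, d)
    z = z - z.mean(dim=0, keepdim=True)

    # Variance term: hinge loss pushing each feature's std toward var_target,
    # preventing individual dimensions from collapsing to a constant.
    std = torch.sqrt(z.var(dim=0) + eps)
    var_loss = torch.relu(var_target - std).mean()

    # Covariance term: penalize off-diagonal entries of the feature
    # covariance matrix, discouraging redundant (correlated) dimensions.
    cov = (z.T @ z) / (z.shape[0] - 1)
    off_diag = cov - torch.diag(torch.diag(cov))
    cov_loss = off_diag.pow(2).sum() / d

    return var_weight * var_loss + cov_weight * cov_loss
```

In training, such a penalty would typically be computed on selected intermediate layers and added to the language-modeling loss with a tunable coefficient.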
We provide an information-theoretic analysis of VICReg, deriving its theoretical foundations for deterministic networks and introducing new self-supervised learning (SSL) methods based on these insights.
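For context, the VICReg objective under analysis combines an invariance term with variance and covariance regularizers. The formulation below is the standard VICReg objective, shown here only as a reference point for the analysis:

```latex
\mathcal{L}(Z, Z') = \lambda\, s(Z, Z') + \mu\,\bigl[v(Z) + v(Z')\bigr] + \nu\,\bigl[c(Z) + c(Z')\bigr],
\qquad
s(Z, Z') = \frac{1}{n}\sum_{i=1}^{n} \lVert z_i - z'_i \rVert_2^2,
```
```latex
v(Z) = \frac{1}{d}\sum_{j=1}^{d} \max\!\bigl(0,\; \gamma - \sqrt{\operatorname{Var}(z^{j}) + \varepsilon}\bigr),
\qquad
c(Z) = \frac{1}{d}\sum_{i \neq j} \bigl[C(Z)\bigr]_{i,j}^{2},
```

where $Z, Z'$ are the embeddings of two views, and $C(Z)$ is the covariance matrix of the embeddings.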