We introduce min-p sampling, a dynamic truncation method for language model decoding that improves the quality and diversity of generated text, especially at high temperatures, and achieves superior performance across multiple benchmarks.
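As a minimal sketch of the idea (not the paper's reference implementation): the truncation threshold is a base value scaled by the probability of the most likely token, so the kept candidate set shrinks when the model is confident and widens when it is not. Function and parameter names below are illustrative.

```python
import numpy as np

def min_p_sample(logits, p_base=0.1, temperature=1.0, rng=None):
    """Sample one token id with min-p truncation (illustrative sketch).

    Tokens whose probability falls below p_base * max_prob are discarded,
    so the cutoff adapts to the model's confidence: strict when the
    distribution is peaked, permissive when it is flat.
    """
    rng = rng or np.random.default_rng()
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()                      # softmax at the given temperature
    threshold = p_base * probs.max()          # dynamic, confidence-scaled cutoff
    probs = np.where(probs >= threshold, probs, 0.0)
    probs /= probs.sum()                      # renormalize over surviving tokens
    return rng.choice(len(probs), p=probs)

# Example: a peaked distribution keeps few candidates even at high temperature.
logits = np.array([5.0, 2.0, 1.0, 0.5, -1.0])
token_id = min_p_sample(logits, p_base=0.1, temperature=1.5)
```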
We propose Seq-VCR, a method to prevent representation collapse in Transformer models, significantly improving their performance on complex reasoning tasks without requiring chain-of-thought supervision.
We provide an information-theoretic analysis of VICReg, deriving theoretical foundations for deterministic networks and introducing new self-supervised learning (SSL) methods based on these insights.
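For reference, here is a minimal sketch of the VICReg objective that this analysis studies, and whose variance and covariance terms the Seq-VCR regularization above is related to: an invariance (mean-squared error) term between two views, a hinge term keeping each feature's standard deviation above a target, and a penalty on off-diagonal covariance entries. The coefficient values and helper names are illustrative defaults, not a definitive implementation.

```python
import torch
import torch.nn.functional as F

def vicreg_loss(z_a, z_b, lam=25.0, mu=25.0, nu=1.0, gamma=1.0, eps=1e-4):
    """VICReg-style loss on two batches of embeddings z_a, z_b of shape (N, D).

    invariance: mean-squared error between the two views;
    variance:   hinge pushing each feature's std above gamma (anti-collapse);
    covariance: penalty on off-diagonal covariance entries (decorrelation).
    """
    inv = F.mse_loss(z_a, z_b)

    def var_cov(z):
        z = z - z.mean(dim=0)
        std = torch.sqrt(z.var(dim=0) + eps)
        var = torch.relu(gamma - std).mean()          # per-dimension std hinge
        cov = (z.T @ z) / (z.shape[0] - 1)            # (D, D) covariance matrix
        off_diag = cov - torch.diag(torch.diag(cov))
        return var, off_diag.pow(2).sum() / z.shape[1]

    var_a, cov_a = var_cov(z_a)
    var_b, cov_b = var_cov(z_b)
    return lam * inv + mu * (var_a + var_b) + nu * (cov_a + cov_b)

# Example with random embeddings standing in for two augmented views.
z_a, z_b = torch.randn(256, 128), torch.randn(256, 128)
loss = vicreg_loss(z_a, z_b)
```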
We show that carefully tuning standard deep learning components can achieve state-of-the-art performance on class-imbalanced datasets without specialized techniques.
We present a comprehensive review of self-supervised learning through the lens of information theory, introducing a unified framework that encompasses existing approaches and highlighting the interplay between compression and information preservation in deep neural networks.