Deep Learning

Turning Up the Heat: Min-p Sampling for Creative and Coherent LLM Outputs

We introduce min-p sampling, a dynamic truncation method for language-model decoding that improves both the quality and diversity of generated text, especially at high temperatures, and shows superior performance across multiple benchmarks.
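A minimal sketch of the core rule, in NumPy (the function and argument names are ours, not the paper's API): keep only tokens whose probability is at least a fraction p_base of the top token's probability, then renormalize and sample. Placing temperature scaling before the truncation mirrors common sampling pipelines but is an implementation choice.

```python
import numpy as np

def min_p_sample(logits, p_base=0.1, temperature=1.5, rng=None):
    """Sample one token id with min-p truncation.

    Keeps every token whose probability is at least p_base times the
    probability of the most likely token, then renormalizes and samples.
    """
    rng = rng if rng is not None else np.random.default_rng()
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())  # numerically stable softmax
    probs /= probs.sum()
    # Dynamic threshold: scales with the model's confidence in its top token.
    threshold = p_base * probs.max()
    kept = np.where(probs >= threshold, probs, 0.0)
    kept /= kept.sum()
    return rng.choice(len(kept), p=kept)
```

Because the threshold scales with the top token's probability, the candidate pool widens when the model is uncertain and narrows when it is confident, which is what lets generation stay coherent even at high temperatures.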

Seq-VCR: Preventing Collapse in Intermediate Transformer Representations for Enhanced Reasoning

We propose Seq-VCR, a method to prevent representation collapse in Transformer models, significantly improving their performance on complex reasoning tasks without requiring chain-of-thought supervision.
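As a rough illustration of the regularization family Seq-VCR draws on, here is a minimal PyTorch sketch of a variance-covariance regularizer in the spirit of VICReg, applied to a (batch, dim) matrix of intermediate features. Names, defaults, and weighting are our own assumptions, not the paper's exact formulation.

```python
import torch

def variance_covariance_reg(h, gamma=1.0, eps=1e-4):
    """Variance-covariance regularizer for a batch of representations.

    h: (batch, dim) intermediate-layer features. The variance term pushes
    each feature's std above gamma (preventing collapse to a constant);
    the covariance term decorrelates feature dimensions.
    """
    h = h - h.mean(dim=0)
    std = torch.sqrt(h.var(dim=0) + eps)
    var_loss = torch.relu(gamma - std).mean()        # hinge on per-dim std
    cov = (h.T @ h) / (h.shape[0] - 1)               # (dim, dim) covariance
    off_diag = cov - torch.diag(torch.diag(cov))
    cov_loss = off_diag.pow(2).sum() / h.shape[1]    # penalize off-diagonals
    return var_loss, cov_loss
```

In a Seq-VCR-style setup, terms like these would be added to the task loss at selected intermediate layers; the choice of layers and loss weights follows the paper.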

An Information Theory Perspective on Variance-Invariance-Covariance Regularization

We provide an information-theoretic analysis of VICReg, deriving theoretical foundations for deterministic networks and introducing new self-supervised learning (SSL) methods based on these insights.
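For reference, the VICReg objective under analysis combines an invariance term with variance and covariance regularizers (notation follows the original VICReg paper; Z and Z' are two embedding batches of n samples by d dimensions):

```latex
\ell(Z, Z') = \lambda\, s(Z, Z') + \mu\,\bigl[v(Z) + v(Z')\bigr] + \nu\,\bigl[c(Z) + c(Z')\bigr],
\quad\text{where}\quad
s(Z, Z') = \frac{1}{n}\sum_{i=1}^{n} \lVert z_i - z'_i \rVert_2^2,
\quad
v(Z) = \frac{1}{d}\sum_{j=1}^{d} \max\!\bigl(0,\ \gamma - \sqrt{\operatorname{Var}(z^{(j)}) + \epsilon}\,\bigr),
\quad
c(Z) = \frac{1}{d}\sum_{i \neq j} \bigl[C(Z)\bigr]_{i,j}^{2}
```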

Back to Basics: Revisiting Standard Deep Learning Components for Class Imbalance

We show that carefully tuning standard deep learning components can achieve state-of-the-art performance on class-imbalanced datasets without specialized techniques.
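As one concrete instance of a "standard component", class-weighted cross-entropy is often among the first things to tune; a minimal PyTorch sketch with made-up inverse-frequency weights (illustrative only, not the paper's recipe):

```python
import torch
import torch.nn as nn

# Illustrative only: inverse-frequency class weighting, one standard
# component that can be tuned for imbalanced data.
class_counts = torch.tensor([9000.0, 900.0, 100.0])  # hypothetical long-tailed counts
weights = class_counts.sum() / (len(class_counts) * class_counts)
criterion = nn.CrossEntropyLoss(weight=weights)

logits = torch.randn(8, 3)              # dummy batch of model outputs
targets = torch.randint(0, 3, (8,))
loss = criterion(logits, targets)
```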

To Compress or Not to Compress--Self-Supervised Learning and Information Theory: A Review

We present a comprehensive review of self-supervised learning through the lens of information theory, introducing a unified framework that encompasses existing approaches and highlighting the interplay between compression and information preservation in deep neural networks.
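The compression-versus-preservation tradeoff at the heart of that framework is classically formalized by the information bottleneck objective; for an input X, target Y, and representation Z:

```latex
% Information bottleneck: compress the input (minimize I(X;Z)) while
% preserving task-relevant information (maximize I(Z;Y)); beta sets the tradeoff.
\min_{p(z \mid x)} \; I(X; Z) - \beta\, I(Z; Y)
```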