Natural Language Processing

Turning Up the Heat: Min-p Sampling for Creative and Coherent LLM Outputs

We introduce min-p sampling, a dynamic truncation method for language models that improves the quality and diversity of generated text, especially at high temperatures, and shows superior performance across multiple benchmarks.
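
As a rough illustration of what "dynamic truncation" means here, the sketch below keeps only tokens whose probability is at least a fraction `p_base` of the top token's probability, then renormalizes and samples. The function names, the default `p_base=0.1`, and the choice to apply temperature before truncation are assumptions made for this sketch, not details taken from the abstract above.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities after temperature scaling."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def min_p_filter(probs, p_base=0.1):
    """Zero out tokens below p_base * max(probs), then renormalize.

    The cutoff scales with the model's confidence: a peaked distribution
    yields a high cutoff, a flat one yields a low cutoff.
    """
    threshold = p_base * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    total = sum(kept)  # always > 0, since the top token is kept
    return [p / total for p in kept]


def sample_min_p(logits, temperature=1.0, p_base=0.1, rng=random):
    """Sample one token index (hypothetical helper for this sketch)."""
    probs = min_p_filter(softmax(logits, temperature), p_base)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

Because the threshold is relative to the most likely token, raising the temperature flattens the distribution without letting very unlikely tokens back in unless the model is itself uncertain.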

LiveBench: A Challenging, Contamination-Free LLM Benchmark

We introduce LiveBench, a novel contamination-free LLM benchmark featuring continuously updated questions from recent sources, objective scoring, and challenging tasks across multiple domains.

OpenDebateEvidence: A Massive-Scale Argument Mining and Summarization Dataset

We present OpenDebateEvidence, a dataset of 3.5 million documents from competitive debate that supports advances in argument mining and summarization through fine-tuning of large language models.