We introduce min-p sampling, a dynamic truncation method for language model decoding that improves the quality and diversity of generated text, especially at high temperatures, and shows superior performance across multiple benchmarks.
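
To make "dynamic truncation" concrete, here is a minimal sketch of min-p filtering in plain Python. It is not the authors' reference implementation. It assumes the commonly described rule that a token is kept only if its probability is at least `p_base` times the probability of the most likely token. The names `min_p_filter`, `sample_min_p`, and `p_base`, and the choice to apply temperature before truncation, are illustrative assumptions.

```python
import math
import random


def min_p_filter(logits, p_base=0.1, temperature=1.0):
    """Return renormalized probabilities after min-p truncation.

    Assumption: a token survives if its probability is at least
    p_base times the probability of the most likely token.
    """
    # Temperature-scaled softmax, shifted by the max for numerical stability.
    # Assumption: temperature is applied before truncation.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Dynamic threshold: it rises when the model is confident in its top
    # token and falls when the distribution is flat.
    threshold = p_base * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]

    # Renormalize the surviving tokens so they sum to 1.
    kept_total = sum(kept)
    return [p / kept_total for p in kept]


def sample_min_p(logits, p_base=0.1, temperature=1.0, rng=random):
    """Draw one token index from the min-p truncated distribution."""
    probs = min_p_filter(logits, p_base, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

Because the cutoff scales with the top token's probability, a peaked distribution prunes most of the tail, while a flat one (for example at high temperature) keeps more candidates. A call such as `sample_min_p([2.0, 1.0, -3.0], p_base=0.1)` would only ever return index 0 or 1 under these assumptions.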
We introduce LiveBench, a novel contamination-free LLM benchmark featuring continuously updated questions from recent sources, objective scoring, and challenging tasks across multiple domains.
We present OpenDebateEvidence, a dataset of 3.5 million documents from competitive debate that supports fine-tuning large language models for argument mining and summarization.