We introduce min-p sampling, a dynamic truncation method for language models that improves text generation quality and diversity, especially at high temperatures, and achieves superior performance across multiple benchmarks.
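A minimal sketch of the dynamic-truncation idea, assuming the cutoff is a base value scaled by the probability of the most likely token; the function name, the `p_base` parameter, and the ordering of temperature scaling relative to truncation are illustrative assumptions, not the paper's exact implementation:

```python
import numpy as np

def min_p_sample(logits, p_base=0.1, temperature=1.0, rng=None):
    # Sketch of min-p-style truncation (names are assumptions): keep tokens
    # whose probability is at least p_base times the top token's probability,
    # renormalize, and sample from the surviving set.
    rng = rng if rng is not None else np.random.default_rng()
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    scaled -= scaled.max()                    # numerical stability
    probs = np.exp(scaled)
    probs /= probs.sum()                      # softmax over the vocabulary
    threshold = p_base * probs.max()          # dynamic, confidence-scaled cutoff
    filtered = np.where(probs >= threshold, probs, 0.0)
    filtered /= filtered.sum()                # renormalize the surviving mass
    return int(rng.choice(len(filtered), p=filtered))

# Illustrative usage: a peaked distribution keeps few tokens, a flat one keeps many.
token_id = min_p_sample(np.array([2.0, 1.0, 0.5, -1.0]), p_base=0.1, temperature=1.5)
```

Because the threshold tracks the model's confidence, the truncated set shrinks when one token dominates and widens when the distribution is flat, which is what lets higher temperatures add diversity without admitting very low-probability tokens.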
We present Inheritune, a training method that creates smaller yet equally effective language models by inheriting and fine-tuning early transformer layers from larger models, addressing the issue of lazy layers in deep networks.
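A minimal sketch of the layer-inheritance idea, assuming a generic PyTorch encoder stack; the helper name, layer counts, and fine-tuning setup are illustrative assumptions rather than the authors' implementation:

```python
import copy
import torch.nn as nn

def inherit_early_layers(large_model_layers, n_inherit):
    # Build a smaller stack from copies of the first n_inherit transformer
    # blocks of a larger model; the copied stack is then fine-tuned on the
    # target data (training loop not shown).
    return nn.ModuleList(
        copy.deepcopy(layer) for layer in large_model_layers[:n_inherit]
    )

# Illustrative usage with generic PyTorch encoder blocks (sizes are assumptions).
large_blocks = nn.ModuleList(
    nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True)
    for _ in range(24)
)
small_blocks = inherit_early_layers(large_blocks, n_inherit=8)
for p in small_blocks.parameters():
    p.requires_grad = True  # the inherited layers are trained further, not frozen
```

The design choice is to keep only the early blocks, on the premise that later ("lazy") layers contribute little, and to recover the remaining capability through fine-tuning of the inherited stack.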