Efficient Training

Inheritune: Training Smaller Yet More Attentive Language Models

We present Inheritune, a training method that creates smaller yet equally effective language models by inheriting and fine-tuning early transformer layers from larger models, addressing the issue of lazy layers in deep networks.