ML@GT Seminar Series | Lessons from Pre-training Llama 3

Featuring Mike Lewis, Facebook AI Research

Abstract: Large language models have revolutionized artificial intelligence, but many details of their creation remain shrouded in mystery because of their cost and commercial value. I will describe the pre-training of Llama 3, a highly competitive open model. Pre-training research is challenging because many decisions must be made accurately on the basis of ablations run at scales orders of magnitude below the final model size. However, this project demonstrates that a state-of-the-art model can be created with a surprisingly simple recipe: carefully optimizing data curation, building efficient infrastructure, and minimizing complexity elsewhere. I will also contrast life on a large pre-training research team with more academic projects, and discuss outstanding research questions in the field.

Bio: Mike Lewis is a research scientist at Meta, currently leading pre-training research for the Llama models. His research interests include pre-training language models (e.g. Llama 3, BART, and RoBERTa), retrieval augmentation (e.g. kNN-LM and RAG), and negotiation dialogue agents (such as the Cicero Diplomacy model). Previously he was a postdoc at the University of Washington (working with Luke Zettlemoyer), and he holds a PhD from the University of Edinburgh (advised by Mark Steedman). He received a Best Paper Award at EMNLP 2016, a Best Resource Paper Award at ACL 2017, and a Best Paper Honourable Mention at ACL 2018. His work has been extensively covered in the media, with varying levels of accuracy.
