Written by Alon Albalak
Thu 29 February 2024

The Foundation Model Development Cheatsheet

Foundation models are evolving rapidly, driven by a growing community of developers and contributors whose work has significantly advanced what these models can do. Amid this progress, it is important to ensure that models are developed responsibly.

At UCSB NLP, we are committed to fostering the development of foundation models responsibly, openly, collaboratively, and ethically. We recognize that the responsible development of foundation models is essential to ensuring that these models are used to benefit society. In line with our goals, we have actively encouraged and participated in open progress by developing datasets (mmc4), surveys (Data selection survey, Self-correcting LLMs), benchmarks (WikiWhy, FinQA, FETA, SafeText, TabFact, VaTeX), tools to evaluate models (InstructScore, CoCo-CroLa), and models (mgie, NeuPSL).

As a next step in our mission, we aim to lower the barrier to entry for new developers. In this spirit, we are thrilled to introduce the "Foundation Model Development Cheatsheet", a concise guide containing resources, tools, and findings from researchers at UCSB and ten other leading organizations. Written by foundation model developers for foundation model developers, the cheatsheet is designed as a quick-start guide that helps new developers familiarize themselves with the essential tools and resources for the responsible development of foundation models across text, vision, and speech modalities. The cheatsheet covers the full lifecycle of model development: data collection, preprocessing, and documentation; model pretraining and finetuning; environmental impact estimation; risk and harm assessment; and model documentation, release, and licensing.

We hope this resource will be useful for new developers and experienced researchers alike, as a way of raising awareness of responsible model development. Explore the paper for more details, and find cheatsheet resources that are relevant for you on the project website. The cheatsheet is a living document and will continue to be updated. Anyone is welcome to contribute resources directly, and contributors will be recognized for their contributions.

~Alon