
For exploring the data and documenting its limitations

Primary language: Python. License: MIT.

Exploring the Pile

This repository contains code for exploring the Pile and documenting its limitations.

Language Modeling Data Format

The data in the Pile is stored in the lm_dataformat format, and the code in this repository expects its input in that format. For documentation of the format, see the lm_dataformat repository.
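
As a rough illustration of what working with this kind of data can look like, the sketch below tallies documents by a metadata field. It is not code from this repository. It assumes a shard has already been decompressed to a plain JSON Lines file in which every line is an object with a `text` string and an optional `meta` dictionary, and that `meta` carries a `pile_set_name` key naming the source subset. The function names `iter_documents` and `count_by_meta_key` are hypothetical. For the compressed archives themselves, the reader provided by the lm_dataformat package is likely the better tool.

```python
import json
from collections import Counter
from typing import Iterator, Tuple


def iter_documents(path: str) -> Iterator[Tuple[str, dict]]:
    """Yield (text, meta) pairs from a decompressed JSON Lines file.

    Assumption: each non-blank line is a JSON object with a "text" field
    and an optional "meta" dictionary.
    """
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            record = json.loads(line)
            yield record["text"], record.get("meta") or {}


def count_by_meta_key(path: str, key: str = "pile_set_name") -> Counter:
    """Count documents per value of one metadata key.

    Assumption: "pile_set_name" is the key of interest; documents that
    lack the key are grouped under "<missing>".
    """
    counts = Counter()
    for _text, meta in iter_documents(path):
        counts[meta.get(key, "<missing>")] += 1
    return counts
```

Calling `count_by_meta_key("shard.jsonl")` on a decompressed file (the file name is a placeholder) returns a `Counter` mapping each subset name to its document count, which gives a quick first check on how the data is distributed across sources.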