Exploring the Pile
This repository contains code for exploring the Pile and documenting its limitations.
Language Modeling Data Format
The data in the Pile is stored in the lm_dataformat format, and the code in this repository is designed to run on data stored in that format. For documentation of the format, see the linked repository.
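As a rough illustration of what working with this kind of data can look like, here is a minimal sketch that uses only the Python standard library. It assumes the data has already been decompressed into JSON Lines, with one JSON object per line holding a `text` field and an optional `meta` dictionary. Those field names, the helper functions `iter_documents` and `count_by_meta_key`, and the sample records are all assumptions made for illustration. They are not part of this repository or of the lm_dataformat library, whose own reader should be preferred for real data.

```python
import io
import json
from collections import Counter


def iter_documents(lines):
    """Yield (text, meta) pairs from an iterable of JSON Lines strings.

    Assumes each non-blank line is a JSON object with a "text" field and an
    optional "meta" dictionary (an assumption; check the format documentation).
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        yield record["text"], record.get("meta", {})


def count_by_meta_key(lines, key):
    """Count documents grouped by the value of one metadata key."""
    counts = Counter()
    for _text, meta in iter_documents(lines):
        counts[meta.get(key, "<missing>")] += 1
    return counts


# Hypothetical sample records standing in for a decompressed data file.
sample = io.StringIO(
    '{"text": "first document", "meta": {"source": "a"}}\n'
    '{"text": "second document", "meta": {"source": "b"}}\n'
    '{"text": "third document", "meta": {"source": "a"}}\n'
)

counts = count_by_meta_key(sample, "source")
for value, n in sorted(counts.items()):
    print(value, n)
```

Because `iter_documents` accepts any iterable of lines, the same function could be pointed at an open text file or at a decompression stream instead of the in-memory sample.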