Personally curated corpora (currently news headlines).
News headlines starting approx. 05/2017, gathered from various outlets' RSS feeds. The corpus may not contain every article published over a given period, and many outlets stopped being gathered as time went on.
Headlines are provided in two directory/file structures, Categorized and Dated. Both share a common root directory named with the date the corpus was created.
- Categorized contains a single file for each news outlet.
- Dated contains a directory for each news outlet, holding one file for each day on which there were headlines from that outlet.
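
The two layouts above could be read with something like the following Python sketch. It is only an illustration and rests on assumptions not stated here: that the two structures live in subdirectories literally named `Categorized` and `Dated` under the root, that files are UTF-8 text with one headline per line, and that the outlet or day can be taken from the file or directory name. The function names `read_lines`, `load_categorized`, and `load_dated` are hypothetical helpers, not part of the corpus.

```python
from pathlib import Path


def read_lines(path):
    """Return the non-empty, stripped lines of a text file.

    Assumes one headline per line (an assumption, not confirmed by the corpus).
    """
    with open(path, encoding="utf-8", errors="replace") as fh:
        return [line.strip() for line in fh if line.strip()]


def load_categorized(root):
    """Map each outlet (taken from the file name) to its list of headlines."""
    categorized = Path(root) / "Categorized"  # assumed directory name
    return {
        path.stem: read_lines(path)
        for path in sorted(categorized.iterdir())
        if path.is_file()
    }


def load_dated(root):
    """Map each outlet (directory name) to {day (file name): headlines}."""
    dated = Path(root) / "Dated"  # assumed directory name
    return {
        outlet.name: {
            day.stem: read_lines(day)
            for day in sorted(outlet.iterdir())
            if day.is_file()
        }
        for outlet in sorted(dated.iterdir())
        if outlet.is_dir()
    }
```

Calling `load_categorized(root)` or `load_dated(root)` with the path of the dated root directory returns plain dictionaries, so either layout can be inspected without extra dependencies. If the actual file format differs (for example, extra fields per line), only `read_lines` should need adjusting.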