ConvoKit datasets
upintheairsheep opened this issue · 2 comments
upintheairsheep commented
Can you integrate the ConvoKit datasets, especially the giant Reddit dataset, into the Pile or a future version of it? I would really like to advance AI for all of humanity, not for the purpose of feeding the pigs (corporations).
https://zissou.infosci.cornell.edu/convokit/datasets/
See https://convokit.cornell.edu/documentation/datasets.html
upintheairsheep commented
http://cairo.lti.cs.cmu.edu/~hector/ - A similar dataset host with ~0.5 GB of tweets, ~0.3 GB of DBpedia data, and an unknown number of wikiHow XML files
upintheairsheep commented
pile v2