A collection of datasets used in machine learning for density estimation.
If you use any of the datasets, you should cite their original papers [1][2][3][4].
Dataset | type | #vars | #train | #valid | #test | density | abbrv |
---|---|---|---|---|---|---|---|
NLTCS [1] | binary | 16 | 16181 | 2157 | 3236 | 0.332 | NLTCS |
MSNBC [1] | binary | 17 | 291326 | 38843 | 58265 | 0.166 | msnbc |
KDDCup2k [1] | binary | 65 | 180092 | 19907 | 34955 | 0.008 | kdd |
Plants [1] | binary | 69 | 17412 | 2321 | 3482 | 0.180 | plants |
Audio [1] | binary | 100 | 15000 | 2000 | 3000 | 0.199 | baudio |
Jester [1] | binary | 100 | 9000 | 1000 | 4116 | 0.608 | jester |
Netflix [1] | binary | 100 | 15000 | 2000 | 3000 | 0.541 | bnetflix |
Accidents [2] | binary | 111 | 12758 | 1700 | 2551 | 0.291 | accidents |
Mushrooms [4] | binary | 112 | 2000 | 500 | 5624 | 0.187 | mushrooms |
Adult [4] | binary | 123 | 5000 | 1414 | 26147 | 0.112 | adult |
Connect 4 [4] | binary | 126 | 16000 | 4000 | 47557 | 0.333 | connect4 |
OCR Letters [4] | binary | 128 | 32152 | 10000 | 10000 | 0.220 | ocr_letters |
RCV-1 [4] | binary | 150 | 40000 | 10000 | 150000 | 0.138 | rcv1 |
Retail [2] | binary | 135 | 22041 | 2938 | 4408 | 0.024 | tretail |
Pumsb-star [2] | binary | 163 | 12262 | 1635 | 2452 | 0.270 | pumsb_star |
DNA [2] | binary | 180 | 1600 | 400 | 1186 | 0.253 | dna |
Kosarek [2] | binary | 190 | 33375 | 4450 | 6675 | 0.020 | kosarek |
MSWeb [1] | binary | 294 | 29441 | 3270 | 5000 | 0.010 | MSWeb |
NIPS [4] | binary | 500 | 400 | 100 | 1240 | 0.367 | nips |
Book [1] | binary | 500 | 8700 | 1159 | 1739 | 0.016 | book |
EachMovie [1] | binary | 500 | 4525 | 1002 | 591 | 0.059 | tmovie |
WebKB [1] | binary | 839 | 2803 | 558 | 838 | 0.064 | cwebkb |
Reuters-52 [1] | binary | 889 | 6532 | 1028 | 1540 | 0.036 | cr52 |
20 NewsGroup [1] | binary | 910 | 11293 | 3764 | 3764 | 0.049 | c20ng |
Movie reviews [3] | binary | 1001 | 1600 | 150 | 250 | 0.140 | moviereview |
BBC [2] | binary | 1058 | 1670 | 225 | 330 | 0.078 | bbc |
Voting [3] | binary | 1359 | 1214 | 200 | 350 | 0.333 | voting |
Ad [2] | binary | 1556 | 2461 | 327 | 491 | 0.008 | ad |
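
To sanity-check a split against the table, the sketch below recomputes the number of samples, the number of variables, and the density from raw rows. It rests on two assumptions that are not stated above: that each split is stored as plain text with one sample per line and comma-separated 0/1 values, and that "density" means the fraction of entries equal to 1. The helper names `load_binary_rows` and `summarize` are made up for this illustration, and the toy data stands in for a real file.

```python
import csv
import io


def load_binary_rows(lines):
    """Parse comma-separated 0/1 rows (assumed layout) into lists of ints.

    `lines` can be any iterable of strings, e.g. an open text file.
    Blank lines are skipped.
    """
    return [[int(value) for value in row] for row in csv.reader(lines) if row]


def summarize(rows):
    """Return (#samples, #vars, density).

    Density is taken here to be the fraction of entries equal to 1,
    which is an assumption about how the table's column was computed.
    """
    n_samples = len(rows)
    n_vars = len(rows[0])
    ones = sum(sum(row) for row in rows)
    return n_samples, n_vars, ones / (n_samples * n_vars)


# Tiny made-up example: 3 samples over 4 binary variables.
toy = io.StringIO("0,1,0,0\n1,1,0,0\n0,0,0,1\n")
n_samples, n_vars, density = summarize(load_binary_rows(toy))
print(n_samples, n_vars, round(density, 3))
```

For a real split, an open file handle can be passed in place of the `io.StringIO` object. If the files use a different delimiter or layout, only `load_binary_rows` needs to change.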
[1] Daniel Lowd, Jesse Davis: Learning Markov Network Structure with Decision Trees. ICDM 2010
[2] Jan Van Haaren, Jesse Davis: Markov Network Structure Learning: A Randomized Feature Generation Approach. AAAI 2012
[3] Jessa Bekker, Jesse Davis, Arthur Choi, Adnan Darwiche, Guy Van den Broeck: Tractable Learning for Complex Probability Queries. NIPS 2015
[4] Hugo Larochelle, Iain Murray: The Neural Autoregressive Distribution Estimator. AISTATS 2011