Density Estimation Benchmark Datasets

A collection of datasets used in machine learning for density estimation.

If you use any of the datasets, you should cite their original papers [1, 2, 3, 4].

Datasets

| Dataset | type | #vars | #train | #valid | #test | density | abbrv |
|---|---|---|---|---|---|---|---|
| NLTCS [1] | binary | 16 | 16181 | 2157 | 3236 | 0.332 | NLTCS |
| MSNBC [1] | binary | 17 | 291326 | 38843 | 58265 | 0.166 | msnbc |
| KDDCup2k [1] | binary | 65 | 180092 | 19907 | 34955 | 0.008 | kdd |
| Plants [1] | binary | 69 | 17412 | 2321 | 3482 | 0.180 | plants |
| Audio [1] | binary | 100 | 15000 | 2000 | 3000 | 0.199 | baudio |
| Jester [1] | binary | 100 | 9000 | 1000 | 4116 | 0.608 | jester |
| Netflix [1] | binary | 100 | 15000 | 2000 | 3000 | 0.541 | bnetflix |
| Accidents [2] | binary | 111 | 12758 | 1700 | 2551 | 0.291 | accidents |
| Mushrooms [4] | binary | 112 | 2000 | 500 | 5624 | 0.187 | mushrooms |
| Adult [4] | binary | 123 | 5000 | 1414 | 26147 | 0.112 | adult |
| Connect 4 [4] | binary | 126 | 16000 | 4000 | 47557 | 0.333 | connect4 |
| OCR Letters [4] | binary | 128 | 32152 | 10000 | 10000 | 0.220 | ocr_letters |
| RCV-1 [4] | binary | 150 | 40000 | 10000 | 150000 | 0.138 | rcv1 |
| Retail [2] | binary | 135 | 22041 | 2938 | 4408 | 0.024 | tretail |
| Pumsb-star [2] | binary | 163 | 12262 | 1635 | 2452 | 0.270 | pumsb_star |
| DNA [2] | binary | 180 | 1600 | 400 | 1186 | 0.253 | dna |
| Kosarek [2] | binary | 190 | 33375 | 4450 | 6675 | 0.020 | kosarek |
| MSWeb [1] | binary | 294 | 29441 | 3270 | 5000 | 0.010 | MSWeb |
| NIPS [4] | binary | 500 | 400 | 100 | 1240 | 0.367 | nips |
| Book [1] | binary | 500 | 8700 | 1159 | 1739 | 0.016 | book |
| EachMovie [1] | binary | 500 | 4525 | 1002 | 591 | 0.059 | tmovie |
| WebKB [1] | binary | 839 | 2803 | 558 | 838 | 0.064 | cwebkb |
| Reuters-52 [1] | binary | 889 | 6532 | 1028 | 1540 | 0.036 | cr52 |
| 20 NewsGroup [1] | binary | 910 | 11293 | 3764 | 3764 | 0.049 | c20ng |
| Movie reviews [3] | binary | 1001 | 1600 | 150 | 250 | 0.140 | moviereview |
| BBC [2] | binary | 1058 | 1670 | 225 | 330 | 0.078 | bbc |
| Voting [3] | binary | 1359 | 1214 | 200 | 350 | 0.333 | voting |
| Ad [2] | binary | 1556 | 2461 | 327 | 491 | 0.008 | ad |
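
The figures in the table can be checked against a downloaded split with a few lines of Python. The following is a minimal sketch, not part of the repository: it assumes each split is a text file with one sample per line and comma-separated 0/1 values, and it assumes that "density" means the fraction of entries equal to 1. The helper names `load_split`, `density`, and `describe` are hypothetical.

```python
import csv


def load_split(path):
    """Read one split into a list of rows of 0/1 integers.

    Assumption: one sample per line, comma-separated binary values.
    The table does not specify the file layout.
    """
    with open(path, newline="") as f:
        # Skip empty lines so a trailing newline does not add a blank row.
        return [[int(v) for v in row] for row in csv.reader(f) if row]


def density(rows):
    """Fraction of entries equal to 1 (assumed meaning of 'density')."""
    total = sum(len(r) for r in rows)
    ones = sum(sum(r) for r in rows)
    return ones / total if total else 0.0


def describe(rows):
    """Return (#samples, #vars, density rounded to 3 decimals)."""
    n_vars = len(rows[0]) if rows else 0
    return len(rows), n_vars, round(density(rows), 3)
```

Under these assumptions, calling `describe(load_split(path))` on the NLTCS training split should report 16181 samples and 16 variables if the file matches the table. Whether the density column was computed on the training split alone or on all splits together is not stated, so a small mismatch in that value would not necessarily indicate a problem.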

Introduced in:

[1] Daniel Lowd, Jesse Davis: Learning Markov Network Structure with Decision Trees. ICDM 2010

[2] Jan Van Haaren, Jesse Davis: Markov Network Structure Learning: A Randomized Feature Generation Approach. AAAI 2012

[3] Jessa Bekker, Jesse Davis, Arthur Choi, Adnan Darwiche, Guy Van den Broeck: Tractable Learning for Complex Probability Queries. NIPS 2015

[4] Hugo Larochelle, Iain Murray: The Neural Autoregressive Distribution Estimator. AISTATS 2011