A personal collection of datasets converted to uniform formats. Most DMLC projects can use them directly. The copyright of each dataset belongs to its original authors.
All are converted into the LIBSVM format.

| name | class | +1/-1 | training | testing | feature | feature group |
| --- | ----: | ----: | ---: | ---: | ---: | ---: |
| CriteoKaggle | 2 | 3.9:1 | 4.584 × 10<sup>7</sup> | 6.042 × 10<sup>6</sup> | 3.429 × 10<sup>7</sup>K | 39 |
| CriteoTera | 2 | ? | 2 × 10<sup>9</sup> | - | 8 × 10<sup>8</sup> | 39 |
| CTRa | 2 | 1:1 | 2.238 × 10<sup>5</sup> | 6.355 × 10<sup>4</sup> | 1.314 × 10<sup>7</sup> | ~200 |
| CTRb | 2 | 8.6:1 | 1.645 × 10<sup>5</sup> | 4.772 × 10<sup>4</sup> | 1.742 × 10<sup>7</sup> | ~100 |
| Avito | | | | | | |
| Avazu | | | | | | |

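
For readers unfamiliar with the LIBSVM text format, each line holds a label followed by sparse `index:value` pairs. The sketch below parses such lines and counts labels, which is one way to reproduce a +1/-1 ratio like the one in the table. The helper names (`parse_libsvm_line`, `label_counts`) and the sample lines are made up for illustration; they are not taken from the datasets or from any DMLC tool, and the sketch assumes the plain `label index:value ...` layout without extensions.

```python
from collections import Counter


def parse_libsvm_line(line):
    """Parse one 'label index:value ...' line into (label, {index: value})."""
    parts = line.split()
    label = float(parts[0])
    features = {}
    for token in parts[1:]:
        # Each token is a sparse entry: integer feature index, then its value.
        index, value = token.split(":", 1)
        features[int(index)] = float(value)
    return label, features


def label_counts(lines):
    """Count how many examples carry each label, skipping blank lines."""
    counts = Counter()
    for line in lines:
        if not line.strip():
            continue
        label, _ = parse_libsvm_line(line)
        counts[label] += 1
    return counts


# Hypothetical sample lines, not real rows from any dataset listed here.
sample = [
    "+1 3:1 17:0.5 204:1",
    "-1 3:1 88:2",
    "+1 17:1",
]

counts = label_counts(sample)
print(counts[1.0], counts[-1.0])
```

To use it on a real file, pass an open file handle instead of `sample`, since iterating over a file yields one line at a time and keeps memory use flat even for the larger datasets.
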
All are converted into the RecordIO format.
name | class | image size | training | testing |
---|---|---|---|---|
CIFAR10 | 10 | 28 × 28 × 3 | 60,000 | 10,000 |
ILSVRC12 | 1,000 | 227 × 227 × 3 | 1,281,167 | 50,000 |