Data sets for multi-view learning.

Primary language: Shell. License: Apache License 2.0.

mvdata

Data sets for multi-view learning in MATLAB ".mat" file format.

##Introduction

This repository contains the data used in the following paper:

Li, Yeqing, Feiping Nie, Heng Huang, and Junzhou Huang. "Large-Scale Multi-View Spectral Clustering via Bipartite Graph." In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence. 2015.

If you use the data, please cite our paper:

@inproceedings{li2015large,
  title={Large-Scale Multi-View Spectral Clustering via Bipartite Graph},
  author={Li, Yeqing and Nie, Feiping and Huang, Heng and Huang, Junzhou},
  booktitle={Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence},
  year={2015}
}

###How to use

Download the ".mat" files using the following script.

sh get_data.sh

Note: The whole dataset is about 300MB to 400MB.
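
As a quick sanity check after the download, the total size of the ".mat" files can be compared with the estimate above. The following is a minimal sketch, not part of this repository; the helper name `mat_files_size_mb` is hypothetical, and it assumes the files end up somewhere under the directory you pass in.

```python
from pathlib import Path


def mat_files_size_mb(directory="."):
    """Return (file count, total size in MB) of the .mat files under directory."""
    # rglob also searches subdirectories, since the download layout is not
    # specified here.
    files = sorted(Path(directory).rglob("*.mat"))
    total_bytes = sum(f.stat().st_size for f in files)
    return len(files), total_bytes / 1e6


# Example usage:
# count, size_mb = mat_files_size_mb(".")
# print(f"{count} .mat files, {size_mb:.0f} MB in total")
```

A total far below 300 MB may indicate an interrupted download.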

###About the data format

All the data is stored in MATLAB ".mat" file format. Each ".mat" file contains at least two variables: X and Y. "X" is a cell array of feature matrices, one per view (each of dimension N by d), and "Y" is the label vector (dimension N by 1), where N is the number of data points and d is the feature dimension of the view. The specification of the data is listed below:

| No. | Handwritten | Caltech-7/20 | Reuters | NUS-WIDE | AWA |
|---|---|---|---|---|---|
| 1 | Pix(240) | Gabor(48) | English(21531) | CH(65) | CQ(2688) |
| 2 | Fou(76) | WM(40) | French(24892) | CM(226) | LSS(2000) |
| 3 | Fac(216) | CENTRIST(254) | German(34251) | CORR(145) | PHOG(252) |
| 4 | ZER(47) | HOG(1984) | Italian(15506) | EDH(74) | SIFT(2000) |
| 5 | KAR(64) | GIST(512) | Spanish(11547) | WT(129) | RGSIFT(2000) |
| 6 | MOR(6) | LBP(928) | - | - | SURF(2000) |
| num of data | 2000 | 1474/2386 | 18758 | 26315 | 4000 |
| num of classes | 10 | 7/20 | 6 | 31 | 50 |

Note: Caltech-7 and Caltech-20 are two subsets of Caltech-101 that contain only 7 and 20 classes, respectively. These subsets were created because the number of data points per class in Caltech-101 is unbalanced.
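
For readers working outside MATLAB, the X/Y layout described above can be read roughly as follows. This is a sketch under assumptions, not code shipped with this repository: it assumes SciPy is installed and that the files are saved in a MAT-file version that `scipy.io.loadmat` supports (it does not read v7.3 files). The function names `load_views` and `check_views` and the file name `handwritten.mat` are hypothetical.

```python
def load_views(path):
    """Read X (one feature matrix per view) and Y (labels) from a .mat file."""
    # Imported lazily so check_views below can be used without SciPy installed.
    from scipy.io import loadmat

    mat = loadmat(path)
    # loadmat returns a MATLAB cell array as a NumPy object array; ravel()
    # handles both 1-by-V and V-by-1 cells.
    views = list(mat["X"].ravel())
    labels = mat["Y"].ravel()
    return views, labels


def _shape(view):
    # NumPy arrays and SciPy sparse matrices expose .shape; plain nested
    # lists (used only for illustration) do not.
    shape = getattr(view, "shape", None)
    if shape is not None:
        return tuple(shape)
    return (len(view), len(view[0]))


def check_views(views, labels):
    """Return the feature dimension d of each view; raise if N disagrees."""
    n = len(labels)
    dims = []
    for i, view in enumerate(views):
        rows, cols = _shape(view)
        if rows != n:
            raise ValueError(f"view {i} has {rows} rows, expected {n}")
        dims.append(cols)
    return dims


# Example usage ("handwritten.mat" is a hypothetical file name):
# views, labels = load_views("handwritten.mat")
# print(check_views(views, labels))
```

If the numbers in parentheses in the table are the feature dimensions and the views are stored in table order (both assumptions), the Handwritten set should yield 240, 76, 216, 47, 64 and 6.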

List of features:

  • Fourier coefficients of the character shapes (FOU)
  • Profile correlations (FAC)
  • Pixel averages in 2 × 3 windows (Pix)
  • Zernike moments (ZER)
  • Morphological features (MOR)
  • Gabor feature
  • Wavelet moments (WM)
  • CENTRIST feature
  • Histogram of oriented gradients (HOG) feature
  • GIST feature
  • Local binary patterns (LBP) feature
  • Color Histogram (CH)
  • Color moments (CM)
  • Color correlation (CORR)
  • Edge distribution (EDH)
  • Wavelet texture (WT)

##Source

The data were downloaded from the following links: