ljinstat/Structured_Data_Random_Features_for_Large-Scale_Kernel_Machines
Kernel machines such as the Support Vector Machine are widely used to solve machine learning problems because, given enough training data, they can approximate any function or decision boundary arbitrarily well. However, methods that operate on the kernel matrix (Gram matrix) of the data scale poorly with the size of the training set: the kernel matrix has one entry per pair of samples, so the computation and storage it requires grow at least quadratically with the number of samples, and training becomes slow or intractable on large datasets. Specialized algorithms for linear Support Vector Machines, by contrast, run much faster when the dimensionality of the data is small, because they operate on the covariance matrix rather than the kernel matrix of the training data. The paper we have chosen proposes a way to combine the advantages of the linear and nonlinear approaches. The method maps the input data to a randomized low-dimensional feature space, which turns the training and evaluation of any kernel machine into the corresponding operations of a linear machine. The random features are designed so that the inner products of the transformed data are approximately equal to those in the feature space of a user-specified shift-invariant kernel. The method gives results competitive with state-of-the-art kernel-based classification and regression algorithms. Moreover, random features avoid computing the full kernel matrix, which removes the main bottleneck on large training sets, while giving similar or even better test error.
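The random feature map described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the code from the notebooks in this repository: it assumes the Gaussian (RBF) kernel exp(-gamma * ||x - y||^2) as the shift-invariant kernel, and the function name `random_fourier_features`, the parameter `gamma`, and the toy data are all hypothetical choices made for the example.

```python
import numpy as np


def random_fourier_features(X, n_features, gamma, rng):
    """Map X of shape (n_samples, d) to Z of shape (n_samples, n_features)
    such that Z @ Z.T approximates the RBF kernel exp(-gamma * ||x - y||^2).
    """
    d = X.shape[1]
    # The Fourier transform of the RBF kernel is a Gaussian density with
    # standard deviation sqrt(2 * gamma) in every coordinate.
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, n_features))
    # Random phase shifts, uniform on [0, 2*pi).
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)


# Toy data (assumed for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
gamma = 0.5

# A large number of features is used here only to make the approximation
# visibly tight; in practice it is a tunable accuracy/cost trade-off.
Z = random_fourier_features(X, n_features=5000, gamma=gamma, rng=rng)
K_approx = Z @ Z.T

# Exact RBF kernel matrix for comparison.
sq_norms = np.sum(X ** 2, axis=1)
sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
K_exact = np.exp(-gamma * np.maximum(sq_dists, 0.0))

print("max abs error:", np.max(np.abs(K_approx - K_exact)))
```

Once the data are mapped to `Z`, any linear model can be trained on `Z` directly, so the full kernel matrix never has to be formed during training; it is computed here only to check the approximation.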
Jupyter Notebook