CSMC is a Python library for matrix completion with column subset selection. It implements the CSMC method, which fills in the missing entries of a matrix using a subset of its columns.
Columns Selected Matrix Completion (CSMC) is a two-stage approach for low-rank matrix recovery. In the first stage, CSMC samples columns of the input matrix and recovers a smaller column submatrix. In the second stage, it solves a least squares problem to reconstruct the whole matrix.
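The two stages described above can be sketched in plain NumPy. This is an illustrative sketch only, not the library's implementation: the function names `complete_low_rank` and `two_stage_completion`, the uniform column sampling, and the iterative truncated-SVD solver used for stage one are all assumptions made for the example.

```python
import numpy as np


def complete_low_rank(X, mask, rank, n_iter=200):
    """Stand-in stage-one solver (assumption): iterative truncated-SVD imputation."""
    filled = np.where(mask, X, 0.0)  # start with zeros in the missing positions
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        # Keep the observed entries, refresh the missing ones from the low-rank fit.
        filled = np.where(mask, X, low_rank)
    return low_rank


def two_stage_completion(M_incomplete, n_selected, rank, seed=0):
    """Hypothetical two-stage completion: column submatrix first, then least squares."""
    rng = np.random.default_rng(seed)
    mask = ~np.isnan(M_incomplete)  # True where an entry is observed
    n_rows, n_cols = M_incomplete.shape

    # Stage 1: sample columns uniformly at random and complete the submatrix.
    cols = rng.choice(n_cols, size=n_selected, replace=False)
    C = complete_low_rank(M_incomplete[:, cols], mask[:, cols], rank)

    # Orthonormal basis for the column space of the recovered submatrix.
    U = np.linalg.svd(C, full_matrices=False)[0][:, :rank]

    # Stage 2: fit every column to that basis using only its observed rows.
    M_hat = np.empty((n_rows, n_cols))
    for j in range(n_cols):
        obs = mask[:, j]
        coef, *_ = np.linalg.lstsq(U[obs], M_incomplete[obs, j], rcond=None)
        M_hat[:, j] = U @ coef
    return M_hat
```

Calling `two_stage_completion(M_incomplete, n_selected=30, rank=2)` on a matrix with `np.nan` in the missing positions returns a dense estimate of the same shape. The expensive completion step runs only on the sampled columns, while the remaining columns cost one small least squares solve each.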
CSMC supports NumPy arrays and PyTorch tensors.
You can install CSMC using pip:
pip install csmc
- Generate random data
import numpy as np
import random
n_rows = 50
n_cols = 250
rank = 3
x = np.random.default_rng().normal(size=(n_rows, rank))
y = np.random.default_rng().normal(size=(rank, n_cols))
M = np.dot(x, y)
M_incomplete = np.copy(M)
num_missing_elements = int(0.7 * M.size)
indices_to_zero = random.sample(range(M.size), k=num_missing_elements)
rows, cols = np.unravel_index(indices_to_zero, M.shape)
M_incomplete[rows, cols] = np.nan
- Fill with CSNN algorithm
from csmc import CSMC
solver = CSMC(M_incomplete, col_number=100)
M_filled = solver.fit_transform(M_incomplete)
- Fill with Nuclear Norm Minimization with SDP (NN algorithm)
from csmc import NuclearNormMin
solver = NuclearNormMin(M_incomplete)
M_filled = solver.fit_transform(M_incomplete, np.isnan(M_incomplete))
- Fill with Frank-Wolfe (Conditional Gradient Method)
from csmc import CGM
solver = CGM(M_incomplete)
M_filled = solver.fit_transform(M_incomplete, np.isnan(M_incomplete))
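Because the example data are generated from a known matrix `M`, the quality of any of the fills above can be checked against the ground truth. The helper below is a small sketch for that purpose; `relative_error` is a hypothetical name and is not part of the CSMC API.

```python
import numpy as np


def relative_error(M_true, M_filled, missing_mask=None):
    """Relative Frobenius error, optionally restricted to the masked entries."""
    M_true = np.asarray(M_true, dtype=float)
    M_filled = np.asarray(M_filled, dtype=float)
    if missing_mask is None:
        # Error over the whole matrix.
        return np.linalg.norm(M_filled - M_true) / np.linalg.norm(M_true)
    # Error over the entries that were hidden from the solver.
    diff = (M_filled - M_true)[missing_mask]
    return np.linalg.norm(diff) / np.linalg.norm(M_true[missing_mask])
```

With the variables from the example, `relative_error(M, M_filled, np.isnan(M_incomplete))` measures the error on the entries that were removed, which is usually the more informative number since observed entries may be reproduced almost exactly.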
- NuclearNormMin: Matrix completion by SDP (NN algorithm), Exact Matrix Completion via Convex Optimization
- CSNN: Matrix completion by CSNN
- PGD: Nuclear norm minimization using Proximal Gradient Descent (PGD), Spectral Regularization Algorithms for Learning Large Incomplete Matrices by Mazumder et al.
- CSPGD: Matrix completion by CSPGD
- CGM: Matrix completion with Frank-Wolfe method
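For readers unfamiliar with the PGD entry, the core step of proximal gradient descent for nuclear norm minimization is singular value soft-thresholding. The sketch below follows the Soft-Impute iteration from the Mazumder et al. paper cited above, written in plain NumPy. It is not the library's implementation, and the names `svd_soft_threshold`, `soft_impute`, and the parameter `lam` are chosen here for illustration.

```python
import numpy as np


def svd_soft_threshold(X, tau):
    """Proximal operator of tau * nuclear norm: shrink singular values by tau."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt


def soft_impute(M_incomplete, lam, n_iter=100):
    """Sketch of Soft-Impute: proximal gradient descent with unit step size."""
    mask = ~np.isnan(M_incomplete)  # True where an entry is observed
    observed = np.where(mask, M_incomplete, 0.0)
    Z = np.zeros_like(observed)
    for _ in range(n_iter):
        # Gradient step: keep observed entries, keep the current estimate elsewhere.
        # Proximal step: soft-threshold the singular values of the result.
        Z = svd_soft_threshold(np.where(mask, observed, Z), lam)
    return Z
```

Larger values of `lam` shrink more singular values to zero and therefore give a lower-rank estimate. With `lam=0` and no missing entries, the iteration returns the input unchanged.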
To adjust the number of threads used for intra-op parallelism on the CPU, modify the NUM_THREADS variable in settings.py:
NUM_THREADS = 8
Krajewska, A., Niewiadomska-Szynkiewicz, E. (2023). Matrix Completion with Column Subset Selection.
Krajewska, A. (2023). Efficient matrix completion for data recovery in data-driven IT applications. Systems Research Institute Polish Academy of Sciences.