hgascon/pulsar

Add check for matrix dimensions > 0

hgascon opened this issue · 3 comments

To avoid the error below, add a check on the size of the matrix that is passed to PRISMA as input:

> data = loadPrismaData(capture_dir)
Reading data...
Splitting ngrams...
Calc indices...
Setup matrix...
to check: 2 
Error in intI(i, n = x@Dim[1], dn[[1]], give.dn = FALSE) : 
index larger than maximal 0
Calls: loadPrismaData ... callGeneric -> eval -> eval -> [ -> [ -> subCsp_rows -> intI
Execution halted

The error occurs when the data matrix passed to sparse.cor has no features, i.e. it is a "0 x number of documents" matrix. This results in a meaningless row-index vector toCheck = [1 0], which triggers the error message when mat[toCheck, ] is executed. To skip the feature correlation step, call data = loadPrismaData(capture_dir, skipFeatureCorrelation=TRUE), which should at least prevent the error.
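For illustration, here is a minimal sketch of what such a check could look like. It assumes the index vector is built with something like 1:nrow(mat), which would explain both the [1 0] value and the "to check: 2" line in the log. I have not verified this against the PRISMA source. The helper name check_prisma_matrix is hypothetical and not part of PRISMA, and a base R matrix stands in for the sparse matrix here.

```r
# Hypothetical guard (not part of PRISMA): fail early with a readable
# message when the data matrix has no features or no documents.
check_prisma_matrix <- function(mat) {
  if (nrow(mat) == 0 || ncol(mat) == 0) {
    stop(sprintf("empty data matrix (%d x %d): no features or no documents",
                 nrow(mat), ncol(mat)))
  }
  invisible(mat)
}

# Why a 0-row matrix can yield the index vector [1 0]:
n_features <- 0
bad_index  <- 1:n_features         # counts down: c(1, 0), length 2
safe_index <- seq_len(n_features)  # empty: integer(0)

# Usage: a "0 x number of documents" matrix is rejected before indexing.
empty_mat <- matrix(numeric(0), nrow = 0, ncol = 3)
result <- tryCatch(check_prisma_matrix(empty_mat),
                   error = function(e) conditionMessage(e))
print(result)
```

If the index really is built that way, using seq_len(nrow(mat)) instead of 1:nrow(mat) would also avoid the bad index, but an explicit check gives a clearer message to the user.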

Related tammok/PRISMA#2 (comment)
Note that this error is often shown together with "Error during clustering (not enough data?)" as a second message.

Changing the code as suggested does not eliminate the "Error during clustering (not enough data?)" message (if relevant), and it introduces another error:

> dimension = estimateDimension(data)
Error in rep(NA, k * k) : invalid 'times' argument                 <<<< here
Calls: estimateDimension -> prismaDuplicatePCA -> sparsePCA -> sparseCov
Execution halted
Error during clustering (not enough data?)

What are you using as input data?