fozziethebeat/S-Space

SVDLIBC generated the incorrect number of dimensions: 3 versus 300

Closed this issue · 2 comments

Hi, I'm getting the above error when running LSAMain with the following arguments:
-d data/input2.txt data/output/my_lsa_output.sspace

input2.txt is a very simple text file (for testing); it contains:
The man walked the dog.
The man took the dog to the park.
The dog went to the park.

System output:
Saving matrix using edu.ucla.sspace.matrix.SvdlibcSparseBinaryMatrixBuilder@5e2de80c
Saw 8 terms, 7 unique
Saw 5 terms, 5 unique
Saw 6 terms, 6 unique
edu.ucla.sspace.lsa.LatentSemanticAnalysis@406a31db processing doc edu.ucla.sspace.util.SparseIntHashArray@2fae8f9
edu.ucla.sspace.lsa.LatentSemanticAnalysis@406a31db processing doc edu.ucla.sspace.util.SparseIntHashArray@3553305b
edu.ucla.sspace.lsa.LatentSemanticAnalysis@406a31db processing doc edu.ucla.sspace.util.SparseIntHashArray@390b4f54
Jan 25, 2015 1:33:24 AM edu.ucla.sspace.common.GenericTermDocumentVectorSpace processSpace
INFO: performing log-entropy transform
Jan 25, 2015 1:33:24 AM edu.ucla.sspace.matrix.LogEntropyTransform$LogEntropyGlobalTransform
INFO: Computing the total row counts
Jan 25, 2015 1:33:24 AM edu.ucla.sspace.matrix.LogEntropyTransform$LogEntropyGlobalTransform
INFO: Computing the entropy of each row
Jan 25, 2015 1:33:24 AM edu.ucla.sspace.matrix.LogEntropyTransform$LogEntropyGlobalTransform
INFO: Scaling the entropy of the rows
Jan 25, 2015 1:33:24 AM edu.ucla.sspace.lsa.LatentSemanticAnalysis processSpace
INFO: reducing to 300 dimensions
Exception in thread "main" java.lang.RuntimeException: SVDLIBC generated the incorrect number of dimensions: 3 versus 300
at edu.ucla.sspace.matrix.factorization.SingularValueDecompositionLibC.readSVDLIBCsingularVector(SingularValueDecompositionLibC.java:198)
at edu.ucla.sspace.matrix.factorization.SingularValueDecompositionLibC.factorize(SingularValueDecompositionLibC.java:161)
at edu.ucla.sspace.lsa.LatentSemanticAnalysis.processSpace(LatentSemanticAnalysis.java:463)
at edu.ucla.sspace.mains.GenericMain.processDocumentsAndSpace(GenericMain.java:514)
at edu.ucla.sspace.mains.GenericMain.run(GenericMain.java:443)
at edu.ucla.sspace.mains.LSAMain.main(LSAMain.java:167)

FYI, the environment is 64-bit Windows 7 with svdlibc compiled under Cygwin. Is this issue caused by the input file? I've also tried a wiki dump corpus, but the issue persists. Any help is greatly appreciated.

Thank you

I think the issue is that the input is only three documents, but the command
is trying to reduce the dimensionality to 300, which isn't possible (there
isn't enough data). If you either reduce to two dimensions or increase the
number of terms/documents in the input corpus, the command should work.

Thanks,
David
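
For context on why 300 dimensions fails here: the SVD of a t x d term-document matrix A has at most min(t, d) nonzero singular values, so with only d = 3 documents SVDLIBC can return at most

rank(A) <= min(t, d) = min(t, 3) = 3

dimensions, which is exactly the "3 versus 300" in the error message.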

Reducing the number of dimensions to 2 solved the issue for the small input corpus. Thank you.
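
For reference, the working invocation only changes the target dimension count. Assuming LSAMain exposes a -n/--dimensions option (the flag name is an assumption here; confirm it against LSAMain's --help output for your build), the arguments would look like:

-n 2 -d data/input2.txt data/output/my_lsa_output.sspace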