Duplicate row.names when using loadSQM
Hi,
I have a project which I recently exported using sqm2zip, and I'm trying to load it into R, but whether I use the zip file or the original project folder I get this:
Proj1 <- loadSQM("Proj1.zip", tax_mode = "prokfilter", engine = "data.table")
Loading total reads
Loading orfs
  table...
  |==================================================|
Error in `.rowNamesDF<-`(x, value = value) :
  duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique values when setting 'row.names': ‘megahit_1_1-411’, ‘megahit_1_424-639’, ‘megahit_10_37-309’, ‘megahit_100_25-276’, ‘megahit_100_280-432’, ‘megahit_1000_2-532’, ‘megahit_10000_2-646’, ‘megahit_10000_713-838’, ‘megahit_100000_3-1235’, ‘megahit_1000000_1-378’, ‘megahit_1000001_2-457’, ‘megahit_1000002_2-370’, ‘megahit_1000003_287-439’, ‘megahit_1000004_2-448’, ‘megahit_1000005_1-318’, ‘megahit_1000006_3-536’, ‘megahit_1000007_1-423’, ‘megahit_1000008_2-409’, ‘megahit_1000009_136-345’, ‘megahit_100001_3-626’, ‘megahit_1000010_1-207’, ‘megahit_1000011_1140-1634’, ‘megahit_1000011_254-1132’, ‘megahit_1000011_3-257’, ‘megahit_1000012_1-630’, ‘megahit_1000013_2-673’, ‘megahit_1000014_12-560’, ‘megahit_1000015_114-527’, ‘megahit_1000015_3-113’, ‘megahit_1000016_3-398’, ‘megahit_1000017_1-309’, ‘megahit_1000018_3-308’, ‘megahit_1000019_3-410’, ‘megahit [... truncated]
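Digging a bit, the error itself seems to come from base R, which refuses duplicated row names on a data.frame. A minimal reproduction outside SQMtools:

df <- data.frame(a = 1:3)
rownames(df) <- c("x", "x", "y")
#> Error in `.rowNamesDF<-`(x, value = value) :
#>   duplicate 'row.names' are not allowed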
Any ideas?
Thanks!
Can you share the zip file with me? I can check
Thanks! Here's a link to the zip file in google drive, hopefully that works! I couldn't think of an easier way.
https://drive.google.com/file/d/1vqtEVuPLnbpq1MeksIrSayr2PiZEJX5y/view?usp=sharing
Ok, somehow all ORFs are present four times in your table, instead of once...
Each duplicated line seems to contain read counts for only one sample. An example for one ORF looks like this:
                  Raw.read.count.JI0015.1 Raw.read.count.JI0015.2 Raw.read.count.JI0015.3 Raw.read.count.JI0015.4 Raw.read.count.JI0015.5 Raw.read.count.JI0015.6
megahit_1_1-411                         3                       0                       0                       0                       0                       0
megahit_1_1-411.1                       0                       2                       0                       0                       0                       0
megahit_1_1-411.2                       0                       0                       1                       0                       0                       0
megahit_1_1-411.3                       0                       0                       0                       1                       0                       0
Other elements of the table (e.g. taxonomic and functional annotation) are identical for the repeated ORFs, as they should be.
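If you want to confirm this on your end, here is a minimal sketch for counting repeated ORF IDs; the file name is a placeholder, so point fread at the ORF table file inside your unzipped project:

library(data.table)

orfs <- fread("Proj1.orftable.tsv", sep = "\t")   # placeholder file name
ids <- orfs[[1]]                                  # assumes the first column holds the ORF IDs
sum(duplicated(ids))                              # number of extra, repeated rows
head(sort(table(ids), decreasing = TRUE))         # copies per ORF; here each would show 4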
This is my first time seeing this, and it seems that the project was run with the latest version...
@SamBrutySci did you stop and restart this run somehow, or change the parameters midway?
@jtamames any insight on why this may be happening?
Yes, the run was interrupted a couple of times by HPC upgrades taking nodes down! The parameters should all have been consistent each time I restarted, however. I just restarted using the --restart flag.
Is this fixable with the current run or shall I just re-run from a certain step?
Samples JI0015.5 and JI0015.6 have no counts assigned to any ORF, so I suspect the run got interrupted during the mapping step.
To be safe I would maybe restart from step 10, forcing overwrite.
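Once the re-run finishes, a quick sanity check from R; the slot names below are the usual SQM object fields, so check str(Proj1) if yours differ:

library(SQMtools)
Proj1 <- loadSQM("Proj1.zip", tax_mode = "prokfilter", engine = "data.table")

any(duplicated(rownames(Proj1$orfs$table)))   # should now be FALSE
colSums(Proj1$orfs$abund)                     # every sample should have counts; no all-zero columns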
Thanks so much for your help! Restarting at step 10 forcing overwrite has fixed the issue!