Duplicate row.names when using loadSQM
Hi,
I have a project which I recently exported using sqm2zip, and I'm trying to load it into R, but whether I use the zip file or the original project folder I get this:
Proj1 <- loadSQM("Proj1.zip", tax_mode = "prokfilter", engine = "data.table")
Loading total reads
Loading orfs
  table...
  |==================================================|
Error in `.rowNamesDF<-`(x, value = value) :
  duplicate 'row.names' are not allowed
In addition: Warning message:
non-unique values when setting 'row.names': ‘megahit_1_1-411’, ‘megahit_1_424-639’, ‘megahit_10_37-309’, ‘megahit_100_25-276’, ‘megahit_100_280-432’, ‘megahit_1000_2-532’, ‘megahit_10000_2-646’, ‘megahit_10000_713-838’, ‘megahit_100000_3-1235’, ‘megahit_1000000_1-378’, ‘megahit_1000001_2-457’, ‘megahit_1000002_2-370’, ‘megahit_1000003_287-439’, ‘megahit_1000004_2-448’, ‘megahit_1000005_1-318’, ‘megahit_1000006_3-536’, ‘megahit_1000007_1-423’, ‘megahit_1000008_2-409’, ‘megahit_1000009_136-345’, ‘megahit_100001_3-626’, ‘megahit_1000010_1-207’, ‘megahit_1000011_1140-1634’, ‘megahit_1000011_254-1132’, ‘megahit_1000011_3-257’, ‘megahit_1000012_1-630’, ‘megahit_1000013_2-673’, ‘megahit_1000014_12-560’, ‘megahit_1000015_114-527’, ‘megahit_1000015_3-113’, ‘megahit_1000016_3-398’, ‘megahit_1000017_1-309’, ‘megahit_1000018_3-308’, ‘megahit_1000019_3-410’, ‘megahit [... truncated]
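Digging a bit, the error itself seems to come from base R, which refuses duplicated row names on a data.frame. A minimal reproduction outside SQMtools:

df <- data.frame(a = 1:3)
rownames(df) <- c("x", "x", "y")
#> Error in `.rowNamesDF<-`(x, value = value) :
#>   duplicate 'row.names' are not allowed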
Any ideas?
Thanks!
Can you share the zip file with me? I can check
Thanks! Here's a link to the zip file in google drive, hopefully that works! I couldn't think of an easier way.
https://drive.google.com/file/d/1vqtEVuPLnbpq1MeksIrSayr2PiZEJX5y/view?usp=sharing
Ok, somehow all ORFs are present four times in your table, instead of once...
Each duplicated line seems to contain read counts for only one sample. An example for one ORF looks like this:
                  Raw.read.count.JI0015.1 Raw.read.count.JI0015.2 Raw.read.count.JI0015.3 Raw.read.count.JI0015.4 Raw.read.count.JI0015.5 Raw.read.count.JI0015.6
megahit_1_1-411                         3                       0                       0                       0                       0                       0
megahit_1_1-411.1                       0                       2                       0                       0                       0                       0
megahit_1_1-411.2                       0                       0                       1                       0                       0                       0
megahit_1_1-411.3                       0                       0                       0                       1                       0                       0
Other elements of the table (e.g. taxonomic and functional annotation) are identical for the repeated ORFs, as they should be.
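If you want to confirm this on your end, here is a minimal sketch for counting repeated ORF IDs; the file name is a placeholder, so point fread at the ORF table file inside your unzipped project:

library(data.table)

orfs <- fread("Proj1.orftable.tsv", sep = "\t")   # placeholder file name
ids <- orfs[[1]]                                  # assumes the first column holds the ORF IDs
sum(duplicated(ids))                              # number of extra, repeated rows
head(sort(table(ids), decreasing = TRUE))         # copies per ORF; here each would show 4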
This is my first time seeing this, and it seems that the project was run with the latest version...
@SamBrutySci did you stop and restart this run somehow, or change the parameters midway?
@jtamames any insight on why this may be happening?
Yes, the run was interrupted a couple of times by HPC upgrades taking nodes down! The parameters should all have been consistent each time I restarted, however. I just restarted using the --restart flag.
Is this fixable with the current run or shall I just re-run from a certain step?
Samples JI0015.5 and JI0015.6 have no counts assigned to any ORF, so I suspect the run got interrupted during the mapping step.
To be safe I would maybe restart from step 10, forcing overwrite.
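Once the re-run finishes, a quick sanity check from R; the slot names below are the usual SQM object fields, so check str(Proj1) if yours differ:

library(SQMtools)
Proj1 <- loadSQM("Proj1.zip", tax_mode = "prokfilter", engine = "data.table")

any(duplicated(rownames(Proj1$orfs$table)))   # should now be FALSE
colSums(Proj1$orfs$abund)                     # every sample should have counts; no all-zero columns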
Thanks so much for your help! Restarting at step 10 forcing overwrite has fixed the issue!