SetupFileCreator problem
Closed this issue · 5 comments
I was taking a look at the setup.xml file created automatically by SetupFileCreator
and found several strange things.
This is the folder structure:
tree
.
|-- 130611_AH0CCVADXX
| |-- report.tsv
| `-- Sample_P567_101
| |-- P567_101_NoIndex_L001_R1_001.fastq.gz
| |-- P567_101_NoIndex_L001_R2_001.fastq.gz
| |-- P567_101_NoIndex_L002_R1_001.fastq.gz
| `-- P567_101_NoIndex_L002_R2_001.fastq.gz
|-- 130612_AH056WADXX
| |-- report.tsv
| `-- Sample_P567_101
| |-- P567_101_GCCAAT_L001_R1_001.fastq.gz
| |-- P567_101_GCCAAT_L001_R2_001.fastq.gz
| |-- P567_101_GCCAAT_L002_R1_001.fastq.gz
| `-- P567_101_GCCAAT_L002_R2_001.fastq.gz
|-- 130627_AH0JYUADXX
| |-- report.tsv
| `-- Sample_P567_102
| |-- P567_102_TGACCA_L001_R1_001.fastq.gz
| |-- P567_102_TGACCA_L001_R2_001.fastq.gz
| |-- P567_102_TGACCA_L002_R1_001.fastq.gz
| `-- P567_102_TGACCA_L002_R2_001.fastq.gz
|-- 130701_AH0J92ADXX
| |-- report.tsv
| `-- Sample_P567_102
| |-- P567_102_TGACCA_L001_R1_001.fastq.gz
| |-- P567_102_TGACCA_L001_R2_001.fastq.gz
| |-- P567_102_TGACCA_L002_R1_001.fastq.gz
| `-- P567_102_TGACCA_L002_R2_001.fastq.gz
|-- 130701_BH0JMGADXX
| |-- report.tsv
| `-- Sample_P567_102
| |-- P567_102_TGACCA_L001_R1_001.fastq.gz
| |-- P567_102_TGACCA_L001_R2_001.fastq.gz
| |-- P567_102_TGACCA_L002_R1_001.fastq.gz
| `-- P567_102_TGACCA_L002_R2_001.fastq.gz
`-- A.Wedell_13_03_UUSNP_setup.xml
This is the command line used to produce A.Wedell_13_03_UUSNP_setup.xml
setupFileCreator --output /proj/a2010002/nobackup/NGI/analysis_ready/ANALYSIS/A.Wedell_13_03_UUSNP/A.Wedell_13_03_UUSNP_setup.xml --project_name A.Wedell_13_03_UUSNP --sequencing_platform Illumina --sequencing_center NGI --uppnex_project_id a2010002 --reference /proj/a2009002/piper_references/gatk_bundle/2.8/b37/human_g1k_v37.fasta --input_sample /proj/a2010002/nobackup/NGI/analysis_ready/ANALYSIS/A.Wedell_13_03_UUSNP/130611_AH0CCVADXX/Sample_P567_101 --input_sample /proj/a2010002/nobackup/NGI/analysis_ready/ANALYSIS/A.Wedell_13_03_UUSNP/130612_AH056WADXX/Sample_P567_101 --input_sample /proj/a2010002/nobackup/NGI/analysis_ready/ANALYSIS/A.Wedell_13_03_UUSNP/130701_BH0JMGADXX/Sample_P567_102 --input_sample /proj/a2010002/nobackup/NGI/analysis_ready/ANALYSIS/A.Wedell_13_03_UUSNP/130701_AH0J92ADXX/Sample_P567_102 --input_sample /proj/a2010002/nobackup/NGI/analysis_ready/ANALYSIS/A.Wedell_13_03_UUSNP/130627_AH0JYUADXX/Sample_P567_102
I include the file at the bottom of this message.
As expected, there are 5 run folders,
but there are 15 sample folders. Strangely enough, the first run folder has 5 sample folders, the second 4, the third 3, and so on. It looks like there is some bug in setupFileCreator or in the way we specify the folders to the script.
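The 5, 4, 3, 2, 1 pattern is what you get if every run folder is given its own sample plus all the samples that come after it on the command line. The sketch below only reproduces that arithmetic. The function name `suffix_pattern` is made up for illustration and nothing here is taken from the setupFileCreator source.

```python
def suffix_pattern(inputs):
    """For each input, return that input plus every input listed after it."""
    return [inputs[i:] for i in range(len(inputs))]


# Run folders in the order they were passed with --input_sample.
runs = [
    "130611_AH0CCVADXX",
    "130612_AH056WADXX",
    "130701_BH0JMGADXX",
    "130701_AH0J92ADXX",
    "130627_AH0JYUADXX",
]

observed = suffix_pattern(runs)
counts = [len(group) for group in observed]
print(counts)       # [5, 4, 3, 2, 1]
print(sum(counts))  # 15
```

The last run folder on the command line ends up in every group, which matches the repeated `130627_AH0JYUADXX` path in the generated file.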
As a further example of what I mean consider that the line
<path>/proj/a2010002/nobackup/NGI/analysis_ready/ANALYSIS/A.Wedell_13_03_UUSNP/130627_AH0JYUADXX/Sample_P567_102</path>
is repeated in all 5 run folder sections, while, if I have understood it correctly, it should be present only in the last ... section.
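A minimal sketch of the grouping that would be expected here, assuming the run folder of a sample is simply the parent directory of its `--input_sample` path. The helper `group_by_runfolder` is hypothetical and is not part of setupFileCreator.

```python
from collections import OrderedDict
import posixpath


def group_by_runfolder(sample_paths):
    """Map each run folder (parent directory) to the sample folders inside it."""
    groups = OrderedDict()
    for path in sample_paths:
        clean = path.rstrip("/")  # tolerate a trailing slash on the input
        runfolder = posixpath.dirname(clean)
        groups.setdefault(runfolder, []).append(clean)
    return groups


base = "/proj/a2010002/nobackup/NGI/analysis_ready/ANALYSIS/A.Wedell_13_03_UUSNP"
inputs = [
    base + "/130611_AH0CCVADXX/Sample_P567_101",
    base + "/130612_AH056WADXX/Sample_P567_101",
    base + "/130701_BH0JMGADXX/Sample_P567_102",
    base + "/130701_AH0J92ADXX/Sample_P567_102",
    base + "/130627_AH0JYUADXX/Sample_P567_102",
]

groups = group_by_runfolder(inputs)
for runfolder, samples in groups.items():
    # Each of the 5 run folders should list exactly one sample folder.
    print(posixpath.basename(runfolder), len(samples))
```

Under that assumption the five inputs give five run folders with one sample folder each, so 5 sample folder entries in total rather than 15.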
This is the XML file that is created; it looks like I cannot attach it to a GitHub issue:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<project xmlns="setup.xml.molmed">
<metadata>
<name>A.Wedell_13_03_UUSNP</name>
<sequenceingcenter>NGI</sequenceingcenter>
<platfrom>Illumina</platfrom>
<uppmaxprojectid>a2010002</uppmaxprojectid>
<uppmaxqos></uppmaxqos>
</metadata>
<inputs>
<runfolder>
<report>/proj/a2010002/nobackup/NGI/analysis_ready/ANALYSIS/A.Wedell_13_03_UUSNP/130611_AH0CCVADXX/report.tsv</report>
<samplefolder>
<name>P567_101</name>
<path>/proj/a2010002/nobackup/NGI/analysis_ready/ANALYSIS/A.Wedell_13_03_UUSNP/130611_AH0CCVADXX/Sample_P567_101</path>
<reference>/proj/a2009002/piper_references/gatk_bundle/2.8/b37/human_g1k_v37.fasta</reference>
</samplefolder>
<samplefolder>
<name>P567_101</name>
<path>/proj/a2010002/nobackup/NGI/analysis_ready/ANALYSIS/A.Wedell_13_03_UUSNP/130612_AH056WADXX/Sample_P567_101</path>
<reference>/proj/a2009002/piper_references/gatk_bundle/2.8/b37/human_g1k_v37.fasta</reference>
</samplefolder>
<samplefolder>
<name>P567_102</name>
<path>/proj/a2010002/nobackup/NGI/analysis_ready/ANALYSIS/A.Wedell_13_03_UUSNP/130701_BH0JMGADXX/Sample_P567_102</path>
<reference>/proj/a2009002/piper_references/gatk_bundle/2.8/b37/human_g1k_v37.fasta</reference>
</samplefolder>
<samplefolder>
<name>P567_102</name>
<path>/proj/a2010002/nobackup/NGI/analysis_ready/ANALYSIS/A.Wedell_13_03_UUSNP/130701_AH0J92ADXX/Sample_P567_102</path>
<reference>/proj/a2009002/piper_references/gatk_bundle/2.8/b37/human_g1k_v37.fasta</reference>
</samplefolder>
<samplefolder>
<name>P567_102</name>
<path>/proj/a2010002/nobackup/NGI/analysis_ready/ANALYSIS/A.Wedell_13_03_UUSNP/130627_AH0JYUADXX/Sample_P567_102</path>
<reference>/proj/a2009002/piper_references/gatk_bundle/2.8/b37/human_g1k_v37.fasta</reference>
</samplefolder>
</runfolder>
<runfolder>
<report>/proj/a2010002/nobackup/NGI/analysis_ready/ANALYSIS/A.Wedell_13_03_UUSNP/130612_AH056WADXX/report.tsv</report>
<samplefolder>
<name>P567_101</name>
<path>/proj/a2010002/nobackup/NGI/analysis_ready/ANALYSIS/A.Wedell_13_03_UUSNP/130612_AH056WADXX/Sample_P567_101</path>
<reference>/proj/a2009002/piper_references/gatk_bundle/2.8/b37/human_g1k_v37.fasta</reference>
</samplefolder>
<samplefolder>
<name>P567_102</name>
<path>/proj/a2010002/nobackup/NGI/analysis_ready/ANALYSIS/A.Wedell_13_03_UUSNP/130701_BH0JMGADXX/Sample_P567_102</path>
<reference>/proj/a2009002/piper_references/gatk_bundle/2.8/b37/human_g1k_v37.fasta</reference>
</samplefolder>
<samplefolder>
<name>P567_102</name>
<path>/proj/a2010002/nobackup/NGI/analysis_ready/ANALYSIS/A.Wedell_13_03_UUSNP/130701_AH0J92ADXX/Sample_P567_102</path>
<reference>/proj/a2009002/piper_references/gatk_bundle/2.8/b37/human_g1k_v37.fasta</reference>
</samplefolder>
<samplefolder>
<name>P567_102</name>
<path>/proj/a2010002/nobackup/NGI/analysis_ready/ANALYSIS/A.Wedell_13_03_UUSNP/130627_AH0JYUADXX/Sample_P567_102</path>
<reference>/proj/a2009002/piper_references/gatk_bundle/2.8/b37/human_g1k_v37.fasta</reference>
</samplefolder>
</runfolder>
<runfolder>
<report>/proj/a2010002/nobackup/NGI/analysis_ready/ANALYSIS/A.Wedell_13_03_UUSNP/130701_BH0JMGADXX/report.tsv</report>
<samplefolder>
<name>P567_102</name>
<path>/proj/a2010002/nobackup/NGI/analysis_ready/ANALYSIS/A.Wedell_13_03_UUSNP/130701_BH0JMGADXX/Sample_P567_102</path>
<reference>/proj/a2009002/piper_references/gatk_bundle/2.8/b37/human_g1k_v37.fasta</reference>
</samplefolder>
<samplefolder>
<name>P567_102</name>
<path>/proj/a2010002/nobackup/NGI/analysis_ready/ANALYSIS/A.Wedell_13_03_UUSNP/130701_AH0J92ADXX/Sample_P567_102</path>
<reference>/proj/a2009002/piper_references/gatk_bundle/2.8/b37/human_g1k_v37.fasta</reference>
</samplefolder>
<samplefolder>
<name>P567_102</name>
<path>/proj/a2010002/nobackup/NGI/analysis_ready/ANALYSIS/A.Wedell_13_03_UUSNP/130627_AH0JYUADXX/Sample_P567_102</path>
<reference>/proj/a2009002/piper_references/gatk_bundle/2.8/b37/human_g1k_v37.fasta</reference>
</samplefolder>
</runfolder>
<runfolder>
<report>/proj/a2010002/nobackup/NGI/analysis_ready/ANALYSIS/A.Wedell_13_03_UUSNP/130701_AH0J92ADXX/report.tsv</report>
<samplefolder>
<name>P567_102</name>
<path>/proj/a2010002/nobackup/NGI/analysis_ready/ANALYSIS/A.Wedell_13_03_UUSNP/130701_AH0J92ADXX/Sample_P567_102</path>
<reference>/proj/a2009002/piper_references/gatk_bundle/2.8/b37/human_g1k_v37.fasta</reference>
</samplefolder>
<samplefolder>
<name>P567_102</name>
<path>/proj/a2010002/nobackup/NGI/analysis_ready/ANALYSIS/A.Wedell_13_03_UUSNP/130627_AH0JYUADXX/Sample_P567_102</path>
<reference>/proj/a2009002/piper_references/gatk_bundle/2.8/b37/human_g1k_v37.fasta</reference>
</samplefolder>
</runfolder>
<runfolder>
<report>/proj/a2010002/nobackup/NGI/analysis_ready/ANALYSIS/A.Wedell_13_03_UUSNP/130627_AH0JYUADXX/report.tsv</report>
<samplefolder>
<name>P567_102</name>
<path>/proj/a2010002/nobackup/NGI/analysis_ready/ANALYSIS/A.Wedell_13_03_UUSNP/130627_AH0JYUADXX/Sample_P567_102</path>
<reference>/proj/a2009002/piper_references/gatk_bundle/2.8/b37/human_g1k_v37.fasta</reference>
</samplefolder>
</runfolder>
</inputs>
</project>
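A quick way to spot this kind of problem in a generated setup file is to compare each sample folder path with the directory of the report in the same run folder element. The sketch below assumes the layout shown above (namespace `setup.xml.molmed`, `report`, `samplefolder` and `path` elements). The function name and the small test document are made up for illustration.

```python
import posixpath
import xml.etree.ElementTree as ET

NS = {"s": "setup.xml.molmed"}  # namespace taken from the <project> element


def misplaced_samples(xml_text):
    """Return (run folder, sample path) pairs where the sample lies outside the run folder."""
    root = ET.fromstring(xml_text)
    problems = []
    for run in root.findall("./s:inputs/s:runfolder", NS):
        run_dir = posixpath.dirname(run.findtext("s:report", namespaces=NS))
        for sample in run.findall("s:samplefolder", NS):
            path = sample.findtext("s:path", namespaces=NS)
            if posixpath.dirname(path.rstrip("/")) != run_dir:
                problems.append((run_dir, path))
    return problems


# Tiny made-up document with the same shape as the file above.
sample_xml = """
<project xmlns="setup.xml.molmed">
  <inputs>
    <runfolder>
      <report>/data/run_A/report.tsv</report>
      <samplefolder><name>S1</name><path>/data/run_A/Sample_S1</path></samplefolder>
      <samplefolder><name>S2</name><path>/data/run_B/Sample_S2</path></samplefolder>
    </runfolder>
    <runfolder>
      <report>/data/run_B/report.tsv</report>
      <samplefolder><name>S2</name><path>/data/run_B/Sample_S2</path></samplefolder>
    </runfolder>
  </inputs>
</project>
"""

for run_dir, path in misplaced_samples(sample_xml):
    print(run_dir, "contains", path)
```

Run against the file above, a check like this should flag every sample folder entry that belongs to a different run folder than the one it is listed under.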
Probably the same problem, but I point it out anyway:
Related to issue #20: I manually created a UUSNPSEQ dir for project M.Kaller_14_06 (I need to analyse it independently)
tree -L 2
.
|-- 140702_AC41A2ANXX
| |-- report.tsv
| |-- Sample_P1171_102
| |-- P1171_102_ATTCAGAA-CCTATCCT_L001_R1_001.fastq.gz
| |-- P1171_102_ATTCAGAA-CCTATCCT_L001_R2_001.fastq.gz
......
| |-- Sample_P1171_104
| |-- Sample_P1171_106
| `-- Sample_P1171_108
`-- pipelineSetup.xml
and I executed the command:
setupFileCreator -o pipelineSetup.xml -p M.Kaller_14_06 -s Illumina -c NGI -a a2010002 -i /proj/a2010002/nobackup/vezzi/ANALYSIS/M.Kaller_14_06/140702_AC41A2ANXX/Sample_P1171_102/ -i /proj/a2010002/nobackup/vezzi/ANALYSIS/M.Kaller_14_06/140702_AC41A2ANXX/Sample_P1171_104 -i /proj/a2010002/nobackup/vezzi/ANALYSIS/M.Kaller_14_06/140702_AC41A2ANXX/Sample_P1171_106 -i /proj/a2010002/nobackup/vezzi/ANALYSIS/M.Kaller_14_06/140702_AC41A2ANXX/Sample_P1171_108 -r /proj/a2009002/piper_references/gatk_bundle/2.8/b37/human_g1k_v37.fasta
the tsv file is the following:
cat 140702_AC41A2ANXX/report.tsv
#SampleName Lane ReadLibrary FlowcellId
P1171_102 1 A AC41A2ANXX
P1171_102 2 A AC41A2ANXX
P1171_102 3 A AC41A2ANXX
P1171_102 4 A AC41A2ANXX
P1171_102 5 A AC41A2ANXX
P1171_102 6 A AC41A2ANXX
P1171_102 7 A AC41A2ANXX
P1171_102 8 A AC41A2ANXX
P1171_104 1 A AC41A2ANXX
P1171_104 2 A AC41A2ANXX
P1171_104 3 A AC41A2ANXX
P1171_104 4 A AC41A2ANXX
P1171_104 5 A AC41A2ANXX
P1171_104 6 A AC41A2ANXX
P1171_104 7 A AC41A2ANXX
P1171_104 8 A AC41A2ANXX
P1171_106 1 A AC41A2ANXX
P1171_106 2 A AC41A2ANXX
P1171_106 3 A AC41A2ANXX
P1171_106 4 A AC41A2ANXX
P1171_106 5 A AC41A2ANXX
P1171_106 6 A AC41A2ANXX
P1171_106 7 A AC41A2ANXX
P1171_106 8 A AC41A2ANXX
P1171_108 1 A AC41A2ANXX
P1171_108 2 A AC41A2ANXX
P1171_108 3 A AC41A2ANXX
P1171_108 4 A AC41A2ANXX
P1171_108 5 A AC41A2ANXX
P1171_108 6 A AC41A2ANXX
P1171_108 7 A AC41A2ANXX
P1171_108 8 A AC41A2ANXX
and the resulting XML file is the one copied at the end of this comment.
What makes me suspicious is that there are 4 run folders: the first with 4 sample folders, the second with 3 sample folders, etc.
@johandahlberg I suppose that only the first run folder entity is the correct one, and the other three are not supposed to be there, right?
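As a cross-check, the report.tsv above can be summarised per flowcell. This sketch assumes the four whitespace-separated columns shown in the header line, and it rebuilds the same 32 rows in code instead of reading the file. The function name `samples_per_flowcell` is made up for illustration.

```python
from collections import defaultdict


def samples_per_flowcell(report_text):
    """Return {flowcell: {sample: set of lanes}} from report.tsv content."""
    lanes = defaultdict(lambda: defaultdict(set))
    for line in report_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the header
        sample, lane, _library, flowcell = line.split()
        lanes[flowcell][sample].add(int(lane))
    return lanes


# Same content as the report.tsv above: 4 samples x 8 lanes on one flowcell.
header = "#SampleName\tLane\tReadLibrary\tFlowcellId"
rows = [
    f"P1171_{sample}\t{lane}\tA\tAC41A2ANXX"
    for sample in (102, 104, 106, 108)
    for lane in range(1, 9)
]
report_text = "\n".join([header] + rows)

summary = samples_per_flowcell(report_text)
for flowcell, samples in summary.items():
    print(flowcell, sorted(samples), [len(v) for v in samples.values()])
```

The report describes a single flowcell with four samples on eight lanes each, which is consistent with expecting one run folder entry that holds four sample folders.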
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<project xmlns="setup.xml.molmed">
<metadata>
<name>M.Kaller_14_06</name>
<sequenceingcenter>NGI</sequenceingcenter>
<platfrom>Illumina</platfrom>
<uppmaxprojectid>a2010002</uppmaxprojectid>
<uppmaxqos></uppmaxqos>
</metadata>
<inputs>
<runfolder>
<report>/proj/a2010002/nobackup/vezzi/ANALYSIS/M.Kaller_14_06/140702_AC41A2ANXX/report.tsv</report>
<samplefolder>
<name>P1171_102</name>
<path>/proj/a2010002/nobackup/vezzi/ANALYSIS/M.Kaller_14_06/140702_AC41A2ANXX/Sample_P1171_102</path>
<reference>/proj/a2009002/piper_references/gatk_bundle/2.8/b37/human_g1k_v37.fasta</reference>
</samplefolder>
<samplefolder>
<name>P1171_104</name>
<path>/proj/a2010002/nobackup/vezzi/ANALYSIS/M.Kaller_14_06/140702_AC41A2ANXX/Sample_P1171_104</path>
<reference>/proj/a2009002/piper_references/gatk_bundle/2.8/b37/human_g1k_v37.fasta</reference>
</samplefolder>
<samplefolder>
<name>P1171_106</name>
<path>/proj/a2010002/nobackup/vezzi/ANALYSIS/M.Kaller_14_06/140702_AC41A2ANXX/Sample_P1171_106</path>
<reference>/proj/a2009002/piper_references/gatk_bundle/2.8/b37/human_g1k_v37.fasta</reference>
</samplefolder>
<samplefolder>
<name>P1171_108</name>
<path>/proj/a2010002/nobackup/vezzi/ANALYSIS/M.Kaller_14_06/140702_AC41A2ANXX/Sample_P1171_108</path>
<reference>/proj/a2009002/piper_references/gatk_bundle/2.8/b37/human_g1k_v37.fasta</reference>
</samplefolder>
</runfolder>
<runfolder>
<report>/proj/a2010002/nobackup/vezzi/ANALYSIS/M.Kaller_14_06/140702_AC41A2ANXX/report.tsv</report>
<samplefolder>
<name>P1171_104</name>
<path>/proj/a2010002/nobackup/vezzi/ANALYSIS/M.Kaller_14_06/140702_AC41A2ANXX/Sample_P1171_104</path>
<reference>/proj/a2009002/piper_references/gatk_bundle/2.8/b37/human_g1k_v37.fasta</reference>
</samplefolder>
<samplefolder>
<name>P1171_106</name>
<path>/proj/a2010002/nobackup/vezzi/ANALYSIS/M.Kaller_14_06/140702_AC41A2ANXX/Sample_P1171_106</path>
<reference>/proj/a2009002/piper_references/gatk_bundle/2.8/b37/human_g1k_v37.fasta</reference>
</samplefolder>
<samplefolder>
<name>P1171_108</name>
<path>/proj/a2010002/nobackup/vezzi/ANALYSIS/M.Kaller_14_06/140702_AC41A2ANXX/Sample_P1171_108</path>
<reference>/proj/a2009002/piper_references/gatk_bundle/2.8/b37/human_g1k_v37.fasta</reference>
</samplefolder>
</runfolder>
<runfolder>
<report>/proj/a2010002/nobackup/vezzi/ANALYSIS/M.Kaller_14_06/140702_AC41A2ANXX/report.tsv</report>
<samplefolder>
<name>P1171_106</name>
<path>/proj/a2010002/nobackup/vezzi/ANALYSIS/M.Kaller_14_06/140702_AC41A2ANXX/Sample_P1171_106</path>
<reference>/proj/a2009002/piper_references/gatk_bundle/2.8/b37/human_g1k_v37.fasta</reference>
</samplefolder>
<samplefolder>
<name>P1171_108</name>
<path>/proj/a2010002/nobackup/vezzi/ANALYSIS/M.Kaller_14_06/140702_AC41A2ANXX/Sample_P1171_108</path>
<reference>/proj/a2009002/piper_references/gatk_bundle/2.8/b37/human_g1k_v37.fasta</reference>
</samplefolder>
</runfolder>
<runfolder>
<report>/proj/a2010002/nobackup/vezzi/ANALYSIS/M.Kaller_14_06/140702_AC41A2ANXX/report.tsv</report>
<samplefolder>
<name>P1171_108</name>
<path>/proj/a2010002/nobackup/vezzi/ANALYSIS/M.Kaller_14_06/140702_AC41A2ANXX/Sample_P1171_108</path>
<reference>/proj/a2009002/piper_references/gatk_bundle/2.8/b37/human_g1k_v37.fasta</reference>
</samplefolder>
</runfolder>
</inputs>
</project>
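In this second case every sample folder does sit under the run folder it is listed in, so the symptom is different: the same report path shows up in several run folder elements. A minimal sketch of a check for that, again assuming the element names and namespace shown above, with a made-up function name and test document:

```python
from collections import Counter
import xml.etree.ElementTree as ET

NS = {"s": "setup.xml.molmed"}  # namespace taken from the <project> element


def duplicated_reports(xml_text):
    """Return {report path: count} for reports listed in more than one run folder."""
    root = ET.fromstring(xml_text)
    reports = [
        run.findtext("s:report", namespaces=NS)
        for run in root.findall("./s:inputs/s:runfolder", NS)
    ]
    return {report: n for report, n in Counter(reports).items() if n > 1}


# Tiny made-up document: the same run folder appears twice.
sample_xml = """
<project xmlns="setup.xml.molmed">
  <inputs>
    <runfolder>
      <report>/data/run_A/report.tsv</report>
      <samplefolder><name>S1</name><path>/data/run_A/Sample_S1</path></samplefolder>
      <samplefolder><name>S2</name><path>/data/run_A/Sample_S2</path></samplefolder>
    </runfolder>
    <runfolder>
      <report>/data/run_A/report.tsv</report>
      <samplefolder><name>S2</name><path>/data/run_A/Sample_S2</path></samplefolder>
    </runfolder>
  </inputs>
</project>
"""

print(duplicated_reports(sample_xml))  # {'/data/run_A/report.tsv': 2}
```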
This should be fixed in 52a3b70. I'll fix some more problems and then push a new release for testing.
@vezzi Check out the latest release, test this, and see whether it solves your problem.
@vezzi I think that this is fixed now; could you confirm that?
Yep, I'll close it.