Error: file /var/lib/minknow/data/seegene02-20210722-1/seegene02-20210722-1/20210722_1145_MN25814_FAQ61681_404c859e/fastq_pass/barcode01 was not found! Exiting...
Hello:
I don't quite understand how to write this configuration file.
I now have FAST5 files that are already split by barcode.
❯ ls
barcode01 barcode15 barcode29 barcode43 barcode57 barcode71 barcode85
barcode02 barcode16 barcode30 barcode44 barcode58 barcode72 barcode86
barcode03 barcode17 barcode31 barcode45 barcode59 barcode73 barcode87
barcode04 barcode18 barcode32 barcode46 barcode60 barcode74 barcode88
barcode05 barcode19 barcode33 barcode47 barcode61 barcode75 barcode89
barcode06 barcode20 barcode34 barcode48 barcode62 barcode76 barcode90
barcode07 barcode21 barcode35 barcode49 barcode63 barcode77 barcode91
barcode08 barcode22 barcode36 barcode50 barcode64 barcode78 barcode92
barcode09 barcode23 barcode37 barcode51 barcode65 barcode79 barcode93
barcode10 barcode24 barcode38 barcode52 barcode66 barcode80 barcode94
barcode11 barcode25 barcode39 barcode53 barcode67 barcode81 barcode95
barcode12 barcode26 barcode40 barcode54 barcode68 barcode82 barcode96
barcode13 barcode27 barcode41 barcode55 barcode69 barcode83 unclassified
barcode14 barcode28 barcode42 barcode56 barcode70 barcode84
How can I configure the software so that it runs correctly?
command line
singularity run /home/guangzhoulab001/nanopanel2_1.01.sif call -c /home/guangzhoulab001/nanopanel2-1.01/config.json -o .
json file
{
"dataset_name": "seegene02-20210722-1", # name of this dataset (will be used in the output file names/tables)
"ref": "/home/guangzhoulab001/GCF_000195955.2_ASM19595v2_genomic.fna", # the amplicon reference sequence
"fast5_dir": "/var/lib/minknow/data/seegene02-20210722-1/seegene02-20210722-1/20210722_1145_MN25814_FAQ61681_404c859e/fast5_pass/barcode01", # workspace output dir of guppy that contains basecalled FAST5 files
"fastq_dir": "/var/lib/minknow/data/seegene02-20210722-1/seegene02-20210722-1/20210722_1145_MN25814_FAQ61681_404c859e/fastq_pass/barcode01", # output dir of guppy that contains fastq.gz files; needed by porechop and can be omitted if no demultiplexing is configured
"basecall_grp": "Basecall_1D_001", # the used basecall group identifier in the FAST5 files
"demultiplex": { # This section is required only for multiplexed datasets.
"BC01": "S01", # Maps the 1st barcode ('BC01') to a sample identifier that will be used in the output files
"BC02": "S02",
"BC03": "S03",
"BC04": "S04",
"BC05": "S05",
"BC06": "S06",
"BC07": "S07",
"BC08": "S08"
},
"logfile": "nanopanel2.log", # name of the log file
"consensus": "mean", # used for consensus calculation (only if multiple mappers are configured)
"mappers": { # configured long-read mappers. Supported types are 'minimap2', 'ngmlr' and 'last'.
"mm2" : {
"type": "minimap2"
},
"ngms": {
"type": "ngmlr",
"additional_param": [ "--no-smallinv", "--no-lowqualitysplit", "-k", "10", "--match", "3", "--mismatch", "-3", "--bin-size", "2", "--kmer-skip", "1" ] # additional runtime parameters for ngmlr
},
"last": {
"type": "last"
}
},
"roi_intervals": ["chr:100-1000"], # list of genomic intervals in which variant calling will be done
"truth_vcf": { # only required if truth-set data is available. Links sample identifiers to truth set VCF files.
"S01": "truth_vcf/S01.exp.vcf",
"S02": "truth_vcf/S02.exp.vcf",
"S03": "truth_vcf/S03.exp.vcf",
"S04": "truth_vcf/S04.exp.vcf",
"S05": "truth_vcf/S05.exp.vcf",
"S06": "truth_vcf/S06.exp.vcf",
"S07": "truth_vcf/S07.exp.vcf",
"S08": "truth_vcf/S08.exp.vcf"
},
"threads": 8, # number of CPUs/threads used by np2 and 3rd party tools
"suppress_snv": [], # list of filters; SNV calls filtered by those will not be included in the output VCF (but will still be in the output TSV file)
"suppress_del": ["AF", "DP"],
"suppress_ins": ["AF", "DP"],
"max_h5_cache": 500, # maximum number of cached H5 files. Setting this to a number >= the number of input FAST5 will greatly speed up the pipeline (at the cost of memory)
"exe": { # this section enables users to link to executables for 3rd party tools. Not needed when running via singularity. Supported sections: 'bgzip', 'samtools', 'porechop', 'minimap2', 'ngmlr', 'lastal', 'last-split', 'maf-convert')
"ngmlr": "singularity run $SOFTWARE/SIF/ngmlr_0.2.7.sif" # in this example, ngmlr is called via an (external) singularity image
}
}
Dear Trandamere
If your data is demultiplexed already then you would have to run np2 for each barcode individually. So you'd need a config file per barcode and configure the respective fast5 dir.
One way to automate this would be to write a "template" config file with name "config_bcXX.json.TEMPLATE":
{
"dataset_name": "mydataset_bc@BC@",
"fast5_dir": "mydir/barcode@BC@/workspace/",
[ add remaining config here ]
}
and then have a script that replaces @BC@ with the respective barcode (NOTE: this example works for 9 barcodes only):
#!/usr/bin/env bash
for bc in 01 02 03 04 05 06 07 08 09
do
sed "s/@BC@/${bc}/g" config_bcXX.json.TEMPLATE > config_bc${bc}.json
done
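To cover all 96 barcode directories rather than nine, the same substitution can be sketched in Python. This is only an illustration: the function names `render_configs` and `write_configs` are made up for this example, and it assumes the template uses the `@BC@` placeholder exactly as shown above.

```python
from pathlib import Path

PLACEHOLDER = "@BC@"  # placeholder token used in the template above


def render_configs(template_text: str, n_barcodes: int = 96) -> dict:
    """Return {output filename: config text} for barcodes 01..n_barcodes."""
    configs = {}
    for i in range(1, n_barcodes + 1):
        bc = f"{i:02d}"  # zero-padded to match directory names such as barcode01
        configs[f"config_bc{bc}.json"] = template_text.replace(PLACEHOLDER, bc)
    return configs


def write_configs(template_path: str, out_dir: str = ".", n_barcodes: int = 96) -> None:
    """Read the template file and write one config file per barcode (hypothetical helper)."""
    template_text = Path(template_path).read_text()
    for name, text in render_configs(template_text, n_barcodes).items():
        Path(out_dir, name).write_text(text)
```

Calling `write_configs("config_bcXX.json.TEMPLATE")` would then write `config_bc01.json` through `config_bc96.json` into the current directory.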
If you are on a SLURM cluster you can then submit, e.g., as array job:
#!/usr/bin/env bash
#SBATCH --job-name=np2
#SBATCH --output=n2_%j.out
#SBATCH --mem=64gb
#SBATCH --cpus-per-task=8
#SBATCH --array=0-8
set -e
BC=('01' '02' '03' '04' '05' '06' '07' '08' '09')
# Bash arrays are zero-indexed, so task IDs 0-8 select barcodes 01-09
bc=${BC[$SLURM_ARRAY_TASK_ID]}
singularity run mypath/nanopanel2_1.01.sif call -c config_bc${bc}.json -o .
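An alternative to a lookup array is to derive the barcode label directly from the task ID. The sketch below assumes the job is submitted with `--array=1-96`, so that the value of `SLURM_ARRAY_TASK_ID` equals the barcode number; the helper names are hypothetical.

```python
import os


def barcode_for_task(task_id: int, n_barcodes: int = 96) -> str:
    """Map a 1-based array task ID to a zero-padded barcode label, e.g. 7 -> '07'."""
    if not 1 <= task_id <= n_barcodes:
        raise ValueError(f"task ID {task_id} is outside 1..{n_barcodes}")
    return f"{task_id:02d}"


def config_for_current_task() -> str:
    """Return the config file name for the running array task (assumes the 1-96 array above)."""
    task_id = int(os.environ["SLURM_ARRAY_TASK_ID"])
    return f"config_bc{barcode_for_task(task_id)}.json"
```

Deriving the label this way avoids off-by-one mistakes between task IDs and array indices.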
HTH, BW niko
I tried to test with part of the data, but it failed.
command line
❯ singularity run nanopanel2_1.01.sif call -c /home/guangzhoulab-001/nanopanel2-1.01/config.json -o .
INFO: Converting SIF file to temporary sandbox...
WARNING: underlay of /usr/share/zoneinfo/Etc/UTC required more than 50 (64) bind mounts
Traceback (most recent call last):
File "/nanopanel2/nanopanel2.py", line 2077, in <module>
nanopanel2_pipeline(config, outdir)
File "/nanopanel2/nanopanel2.py", line 2001, in nanopanel2_pipeline
samples = extract_fastq(config, demux_index, outdir)
File "/nanopanel2/nanopanel2.py", line 487, in extract_fastq
for file in os.listdir(config['fast5_dir']):
FileNotFoundError: [Errno 2] No such file or directory: '/var/lib/minknow/data/seegene02-20210722-1/seegene02-20210722-1/20210722_1145_MN25814_FAQ61681_404c859e/fast5_pass/barcode01/'
INFO: Cleaning up image...
~ took 6s
❯ ls /var/lib/minknow/data/seegene02-20210722-1/seegene02-20210722-1/20210722_1145_MN25814_FAQ61681_404c859e/fast5_pass/barcode01/
FAQ61681_pass_barcode01_3142826f_0.fast5
json
{
"dataset_name": "barcode01", # name of this dataset (will be used in the output file names/tables)
"ref": "/home/guangzhoulab-001/GCF_000195955.2_ASM19595v2_genomic.fna", # the amplicon reference sequence
"fast5_dir": "/var/lib/minknow/data/seegene02-20210722-1/seegene02-20210722-1/20210722_1145_MN25814_FAQ61681_404c859e/fast5_pass/barcode01/", # workspace output dir of guppy that contains basecalled FAST5 files
"fastq_dir": "/var/lib/minknow/data/seegene02-20210722-1/seegene02-20210722-1/20210722_1145_MN25814_FAQ61681_404c859e/fastq_pass/barcode01/", # output dir of guppy that contains fastq.gz files; needed by porechop and can be omitted if no demultiplexing is configured
"basecall_grp": "Basecall_1D_001", # the used basecall group identifier in the FAST5 files
"logfile": "nanopanel2.log", # name of the log file
"consensus": "mean", # used for consensus calculation (only if multiple mappers are configured)
"mappers": { # configured long-read mappers. Supported types are 'minimap2', 'ngmlr' and 'last'.
"mm2" : {
"type": "minimap2"
},
"ngms": {
"type": "ngmlr",
"additional_param": [ "--no-smallinv", "--no-lowqualitysplit", "-k", "10", "--match", "3", "--mismatch", "-3", "--bin-size", "2", "--kmer-skip", "1" ] # additional runtime parameters for ngmlr
},
"last": {
"type": "last"
}
},
"roi_intervals": ["chr:100-1000"], # list of genomic intervals in which variant calling will be done
"threads": 8, # number of CPUs/threads used by np2 and 3rd party tools
"suppress_snv": [], # list of filters; SNV calls filtered by those will not be included in the output VCF (but will still be in the output TSV file)
"suppress_del": ["AF", "DP"],
"suppress_ins": ["AF", "DP"],
"max_h5_cache": 500, # maximum number of cached H5 files. Setting this to a number >= the number of input FAST5 will greatly speed up the pipeline (at the cost of memory)
"exe": { # this section enables users to link to executables for 3rd party tools. Not needed when running via singularity. Supported sections: 'bgzip', 'samtools', 'porechop', 'minimap2', 'ngmlr', 'lastal', 'last-split', 'maf-convert')
"ngmlr": "singularity run $SOFTWARE/SIF/ngmlr_0.2.7.sif" # in this example, ngmlr is called via an (external) singularity image
}
}
Hello:
The folder already exists:
❯ ls /var/lib/minknow/data/seegene02202107221/seegene02202107221/20210722_1145_MN25814_FAQ61681_404c859e/fast5_pass/barcode01/
FAQ61681_pass_barcode01_3142826f_0.fast5
Hi
In the config above (and in the error message from np2) you link to
/var/lib/minknow/data/seegene02-20210722-1/seegene02-20210722-1/20210722_1145_MN25814_FAQ61681_404c859e/fast5_pass/barcode01/
i.e., with dashes in the 'seegene' directory names.
Maybe this is the problem?
I replaced the dashes, but the program still reported an error:
❯ singularity run nanopanel2_1.01.sif call -c /home/guangzhoulab-001/nanopanel2-1.01/config.json -o .
INFO: Converting SIF file to temporary sandbox...
WARNING: underlay of /usr/share/zoneinfo/Etc/UTC required more than 50 (64) bind mounts
Traceback (most recent call last):
File "/nanopanel2/nanopanel2.py", line 2077, in <module>
nanopanel2_pipeline(config, outdir)
File "/nanopanel2/nanopanel2.py", line 2001, in nanopanel2_pipeline
samples = extract_fastq(config, demux_index, outdir)
File "/nanopanel2/nanopanel2.py", line 487, in extract_fastq
for file in os.listdir(config['fast5_dir']):
FileNotFoundError: [Errno 2] No such file or directory: '/var/lib/minknow/data/seegene02_20210722_1/seegene02_20210722_1/20210722_1145_MN25814_FAQ61681_404c859e/fast5_pass/barcode01/'
INFO: Cleaning up image...
Sorry, that's not what I meant.
Above, you show that the path '/var/lib/minknow/data/seegene02202107221/seegene02202107221/20210722_1145_MN25814_FAQ61681_404c859e/fast5_pass/barcode01/'
exists (containing one fast5 file).
But in your config file you reference the path
'/var/lib/minknow/data/seegene02-20210722-1/seegene02-20210722-1/20210722_1145_MN25814_FAQ61681_404c859e/fast5_pass/barcode01/'
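A mismatch like this can be caught on the host before launching np2. The following is a minimal sketch, not part of np2: it assumes that `#` starts a comment only outside double-quoted strings (as in the configs pasted above), and the function names are made up for illustration. Note that it only checks paths as seen from the host, not from inside a container.

```python
import json
import os


def strip_hash_comments(text: str) -> str:
    """Remove '# ...' comments that appear outside double-quoted strings."""
    out_lines = []
    for line in text.splitlines():
        in_string = False
        escaped = False
        kept = []
        for ch in line:
            if in_string:
                kept.append(ch)
                if escaped:
                    escaped = False
                elif ch == "\\":
                    escaped = True
                elif ch == '"':
                    in_string = False
            elif ch == '"':
                in_string = True
                kept.append(ch)
            elif ch == "#":
                break  # the rest of the line is a comment
            else:
                kept.append(ch)
        out_lines.append("".join(kept))
    return "\n".join(out_lines)


def missing_dirs(config_text: str, keys=("fast5_dir", "fastq_dir")) -> list:
    """Return (key, path) pairs for configured directories that do not exist."""
    config = json.loads(strip_hash_comments(config_text))
    return [(k, config[k]) for k in keys if k in config and not os.path.isdir(config[k])]
```

An empty list from `missing_dirs(open("config.json").read())` means both directories exist on the host.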
After renaming the directories, the path exists:
❯ ls /var/lib/minknow/data/seegene02202107221/seegene02202107221/20210722_1145_MN25814_FAQ61681_404c859e/fast5_pass/barcode01/
FAQ61681_pass_barcode01_3142826f_0.fast5
I updated the configuration file at the same time:
{
"dataset_name": "barcode01", # name of this dataset (will be used in the output file names/tables)
"ref": "/home/guangzhoulab-001/GCF_000195955.2_ASM19595v2_genomic.fna", # the amplicon reference sequence
"fast5_dir": "/var/lib/minknow/data/seegene02202107221/seegene02202107221/20210722_1145_MN25814_FAQ61681_404c859e/fast5_pass/barcode01/", # workspace output dir of guppy that contains basecalled FAST5 files
"fastq_dir": "/var/lib/minknow/data/seegene02202107221/seegene02202107221/20210722_1145_MN25814_FAQ61681_404c859e/fastq_pass/barcode01/", # output dir of guppy that contains fastq.gz files; needed by porechop and can be omitted if no demultiplexing is configured
"basecall_grp": "Basecall_1D_001", # the used basecall group identifier in the FAST5 files
"logfile": "nanopanel2.log", # name of the log file
"consensus": "mean", # used for consensus calculation (only if multiple mappers are configured)
"mappers": { # configured long-read mappers. Supported types are 'minimap2', 'ngmlr' and 'last'.
"mm2" : {
"type": "minimap2"
},
"ngms": {
"type": "ngmlr",
"additional_param": [ "--no-smallinv", "--no-lowqualitysplit", "-k", "10", "--match", "3", "--mismatch", "-3", "--bin-size", "2", "--kmer-skip", "1" ] # additional runtime parameters for ngmlr
},
"last": {
"type": "last"
}
},
"roi_intervals": ["chr:100-1000"], # list of genomic intervals in which variant calling will be done
"threads": 8, # number of CPUs/threads used by np2 and 3rd party tools
"suppress_snv": [], # list of filters; SNV calls filtered by those will not be included in the output VCF (but will still be in the output TSV file)
"suppress_del": ["AF", "DP"],
"suppress_ins": ["AF", "DP"],
"max_h5_cache": 500, # maximum number of cached H5 files. Setting this to a number >= the number of input FAST5 will greatly speed up the pipeline (at the cost of memory)
"exe": { # this section enables users to link to executables for 3rd party tools. Not needed when running via singularity. Supported sections: 'bgzip', 'samtools', 'porechop', 'minimap2', 'ngmlr', 'lastal', 'last-split', 'maf-convert')
"ngmlr": "singularity run $SOFTWARE/SIF/ngmlr_0.2.7.sif" # in this example, ngmlr is called via an (external) singularity image
}
}
But it's still not working
```
❯ singularity run nanopanel2_1.01.sif call -c /home/guangzhoulab-001/nanopanel2-1.01/config.json -o .
INFO: Converting SIF file to temporary sandbox...
WARNING: underlay of /usr/share/zoneinfo/Etc/UTC required more than 50 (64) bind mounts
Traceback (most recent call last):
File "/nanopanel2/nanopanel2.py", line 2077, in <module>
nanopanel2_pipeline(config, outdir)
File "/nanopanel2/nanopanel2.py", line 2001, in nanopanel2_pipeline
samples = extract_fastq(config, demux_index, outdir)
File "/nanopanel2/nanopanel2.py", line 487, in extract_fastq
for file in os.listdir(config['fast5_dir']):
FileNotFoundError: [Errno 2] No such file or directory: '/var/lib/minknow/data/seegene02202107221/seegene02202107221/20210722_1145_MN25814_FAQ61681_404c859e/fast5_pass/barcode01/'
INFO: Cleaning up image...
```
The error message means that np2 cannot access the configured directory from inside the container. Please refer to the Singularity docs on how to add user-defined bind paths:
https://sylabs.io/guides/3.0/user-guide/bind_paths_and_mounts.html
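As a rough illustration of what that could look like here, the sketch below assembles a `singularity run` command that uses the `--bind` option to mount the MinKNOW data directory at the same path inside the container. The helper name `singularity_call` is hypothetical, and which directories need binding depends on the local Singularity configuration, so treat the paths as assumptions taken from the thread.

```python
import shlex


def singularity_call(image: str, config: str, host_dirs: list, outdir: str = ".") -> list:
    """Build a 'singularity run' command that bind-mounts each host directory."""
    cmd = ["singularity", "run"]
    for d in host_dirs:
        # With no destination given, the directory is mounted at the same path in the container
        cmd += ["--bind", d]
    cmd += [image, "call", "-c", config, "-o", outdir]
    return cmd


cmd = singularity_call(
    image="nanopanel2_1.01.sif",
    config="/home/guangzhoulab-001/nanopanel2-1.01/config.json",
    host_dirs=["/var/lib/minknow/data"],
)
print(shlex.join(cmd))  # copy-pasteable shell command
```

The printed command can be run directly in the shell, or the list can be passed to `subprocess.run(cmd, check=True)`.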