divonlan/genozip

Running `genozip` in parallel

jimmybgammyknee opened this issue · 2 comments

I've been testing genozip on a group of paired RNA-seq gzipped FASTQ files and it's working really well. However, I've noticed that when I run it through GNU parallel on more than 2 samples, it errors out.

I'm using a list of sample names and running those with parallel. I've tested 2 parallel jobs (-j2), which works fine, but 4 jobs (-j4) errors. I'm running genozip from its own conda environment on a fairly old version of Ubuntu (12.04, I think).

(genozip) ...:~/genozip$ cat test_list1.tsv
/.../storage/raw_fastq/.../RNAseq/.../.../sample1
/.../storage/raw_fastq/.../RNAseq/.../.../sample2
/.../storage/raw_fastq/.../RNAseq/.../.../sample3

(genozip) ...:~/genozip$ parallel -j4 -a test_list1.tsv genozip --md5 {}_1.fastq.gz {}_2.fastq.gz --pair -E ./GRCh37.ref.genozip -o {/}.grch37.genozip
genozip ./GRCh37.ref.genozip : Reading and caching reference hash table...
Error in file_put_data:1375: failed to rename ./GRCh37.ref.genozip.gcache.tmp to ./GRCh37.ref.genozip.gcache: No such file or directory
If this is unexpected, please contact support@genozip.com.
genozip ./GRCh37.ref.genozip : Done

Error in file_put_data:1375: failed to rename ./GRCh37.ref.genozip.gcache.tmp to ./GRCh37.ref.genozip.gcache: No such file or directory
If this is unexpected, please contact support@genozip.com.
genozip ./GRCh37.ref.genozip : Done
genozip ADII-0679-201388_1.fastq.gz : 0%

I'm assuming this is due to the cache not being accessible by more than two files at once?
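
A minimal sketch of the kind of race that could produce the rename error above, assuming each job writes the same `.gcache.tmp` file and then renames it to `.gcache`. This is only a guess at the mechanism based on the error message; the file names and the ordering below are hypothetical stand-ins, not genozip's actual code:

```shell
#!/bin/sh
# Simulates two jobs sharing one temporary cache file name.
# "ref.gcache.tmp" and "ref.gcache" are hypothetical stand-ins.
workdir=$(mktemp -d)
tmp="$workdir/ref.gcache.tmp"
final="$workdir/ref.gcache"

# Both jobs write the shared temp file; the second write overwrites the first.
echo "cache from job A" > "$tmp"
echo "cache from job B" > "$tmp"

# Job A renames first: the temp file exists, so this succeeds.
if mv "$tmp" "$final"; then first=ok; else first=failed; fi

# Job B renames next: the temp file is already gone, so this fails
# with "No such file or directory" (silenced here).
if mv "$tmp" "$final" 2>/dev/null; then second=ok; else second=failed; fi

echo "job A rename: $first"
echo "job B rename: $second"

rm -rf "$workdir"
```

If that is what is happening, the job that loses the rename would report the error even though a usable cache file has been written by the other job.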

Great, thanks @divonlan!