Running `genozip` in parallel
jimmybgammyknee opened this issue · 2 comments
jimmybgammyknee commented
I've been testing genozip on a group of paired RNA-seq gzipped FASTQ files and it's working really well. However, I've noticed that when I use GNU parallel on more than 2 samples, it errors out.
I'm using a list of sample names and running those using parallel. I've tested 2 parallel jobs (-j2), which works fine, but 4 jobs (-j4) errors. I'm running genozip via its own conda environment on a fairly old version of Ubuntu (12.04, I think).
(genozip) ...:~/genozip$ cat test_list1.tsv
/.../storage/raw_fastq/.../RNAseq/.../.../sample1
/.../storage/raw_fastq/.../RNAseq/.../.../sample2
/.../storage/raw_fastq/.../RNAseq/.../.../sample3
(genozip) ...:~/genozip$ parallel -j4 -a test_list1.tsv genozip --md5 {}_1.fastq.gz {}_2.fastq.gz --pair -E ./GRCh37.ref.genozip -o {/}.grch37.genozip
genozip ./GRCh37.ref.genozip : Reading and caching reference hash table...
Error in file_put_data:1375: failed to rename ./GRCh37.ref.genozip.gcache.tmp to ./GRCh37.ref.genozip.gcache: No such file or directory
If this is unexpected, please contact support@genozip.com.
genozip ./GRCh37.ref.genozip : Done
Error in file_put_data:1375: failed to rename ./GRCh37.ref.genozip.gcache.tmp to ./GRCh37.ref.genozip.gcache: No such file or directory
If this is unexpected, please contact support@genozip.com.
genozip ./GRCh37.ref.genozip : Done
genozip ADII-0679-201388_1.fastq.gz : 0%
I'm assuming this is due to the cache being unable to be accessed by more than two files at once?
divonlan commented
Hi Jimmy,
The issue is that on the first run with a new reference file, Genozip generates a cache to accelerate subsequent runs with the same reference. While generating the cache, it uses a temporary file. What happened here is that multiple instances of genozip were all trying to create the cache with the same temporary file name. I will fix the code to avoid this issue, but as a workaround: after creating a new reference file, run genozip once manually to generate the cache, and then parallel should work fine.
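For anyone hitting the same error, the workaround might look roughly like the sketch below: compress the first sample on its own so that a single process builds the cache, then pass the remaining samples to GNU parallel. The function name warm_then_parallel is hypothetical, and the genozip options are simply copied from the command in the original report; whether one serial run is enough on a given setup is an assumption based on the reply above.

```shell
#!/usr/bin/env bash
# Workaround sketch: build the reference cache with one serial run,
# then hand the remaining samples to GNU parallel.
# warm_then_parallel is a hypothetical helper name; the genozip options
# are copied from the command in the original report.

warm_then_parallel() {
    local list="$1" ref="$2" jobs="${3:-4}"
    local first
    first="$(head -n 1 "$list")"

    # Serial run: only this one process creates the reference cache.
    genozip --md5 "${first}_1.fastq.gz" "${first}_2.fastq.gz" --pair \
        -E "$ref" -o "$(basename "$first").grch37.genozip" || return 1

    # The cache should now exist, so the parallel runs no longer race
    # to create it. Sample 1 is skipped because it is already done.
    tail -n +2 "$list" | parallel -j"$jobs" \
        genozip --md5 {}_1.fastq.gz {}_2.fastq.gz --pair \
        -E "$ref" -o {/}.grch37.genozip
}
```

With the files from the report this would be called as `warm_then_parallel test_list1.tsv ./GRCh37.ref.genozip 4`.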
jimmybgammyknee commented
Great thanks @divonlan !