pachterlab/sircel

10x Genomics large fastq

I have a sample with 519,855,531 reads, and running sircel on it sometimes fails with the Redis error shown in the traceback below. Subsampling the sample to 1,000,000 reads allows sircel + kallisto to complete successfully (with some bugfixes in the 10x processing, PR forthcoming). The executed command, followed by the tail of its output, was:

python3 sircel/Sircel_master.py \
  --10xgenomics \
  --barcodes 2T+1_R1_L001-L004.fastq.gz \
  --reads 2T+1_R2_L001-L004.fastq.gz \
  --umis 2T+1_I1_L001-L004.fastq.gz \
  --output_dir sircel_out_2T+1 \
  --threads 5 \
  --kallisto_idx gencode.v26.idx
        56600000 reads indexed
        56700000 reads indexed
        56800000 reads indexed
        56900000 reads indexed
        57000000 reads indexed
        57100000 reads indexed
Traceback (most recent call last):
  File "sircel/Sircel_master.py", line 318, in <module>
    output_files = run_all(args)
  File "sircel/Sircel_master.py", line 83, in run_all
    output_files, elapsed_time = Split_reads.run_all(split_args)
  File "/Users/coetzeesg/devel/sircel/sircel/Split_reads.py", line 57, in run_all
    (kmer_idx_pipe, barcodes_unzipped, reads_unzipped))
  File "/Users/coetzeesg/devel/sircel/sircel/Split_reads.py", line 133, in get_kmer_index_db
    pipe_out = kmer_idx_pipe.execute()#slow. chunks are large...
  File "/Users/coetzeesg/miniconda3/envs/sircel/lib/python3.6/site-packages/redis/client.py", line 2626, in execute
    return execute(conn, stack, raise_on_error)
  File "/Users/coetzeesg/miniconda3/envs/sircel/lib/python3.6/site-packages/redis/client.py", line 2540, in _execute_transaction
    self.raise_first_error(commands, response)
  File "/Users/coetzeesg/miniconda3/envs/sircel/lib/python3.6/site-packages/redis/client.py", line 2574, in raise_first_error
    raise r
redis.exceptions.ResponseError: Command # 1231066 (APPEND b'$GGAGAGAG' b'70374508616,70374508616,70374508616,') of pipeline caused error: string exceeds maximum allowed size (512MB)
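
The error comes from Redis itself: a single string value cannot exceed 512 MB, and the traceback shows the index for one k-mer (`$GGAGAGAG`) being grown with repeated `APPEND` calls until it hits that cap. One possible workaround, as a rough sketch rather than a tested patch, is to spread each logical key over several smaller Redis strings. Everything below is hypothetical and not part of sircel: the `ShardedAppender` name, the `key:N` naming scheme, and the 256 MB threshold are assumptions, and `FakeRedis` is only an in-memory stand-in so the sketch runs without a server. The one thing assumed of the real client is an `append(key, value)` method, which redis-py provides.

```python
# Sketch only: shard one logical key across several Redis strings so that no
# single value approaches the 512 MB limit. All names here are hypothetical.

MAX_SHARD_BYTES = 256 * 1024 * 1024  # assumed threshold, well below the cap


class ShardedAppender:
    """Append to key:0, key:1, ... and roll over when a shard would overflow."""

    def __init__(self, client, max_shard_bytes=MAX_SHARD_BYTES):
        self.client = client  # any object with append(key, value)
        self.max_shard_bytes = max_shard_bytes
        self._shard = {}  # logical key -> current shard number
        self._size = {}   # logical key -> bytes written to the current shard

    def append(self, key, value):
        shard = self._shard.get(key, 0)
        size = self._size.get(key, 0)
        # Start a new shard if this value would push the current one over
        # the threshold (an empty shard always accepts the value).
        if size and size + len(value) > self.max_shard_bytes:
            shard += 1
            size = 0
        self.client.append(b"%s:%d" % (key, shard), value)
        self._shard[key] = shard
        self._size[key] = size + len(value)


class FakeRedis:
    """In-memory stand-in for a Redis client, for demonstration only."""

    def __init__(self):
        self.store = {}

    def append(self, key, value):
        self.store[key] = self.store.get(key, b"") + value
        return len(self.store[key])


if __name__ == "__main__":
    fake = FakeRedis()
    appender = ShardedAppender(fake, max_shard_bytes=30)  # tiny limit for the demo
    for _ in range(5):
        appender.append(b"$GGAGAGAG", b"70374508616,")
    for shard_key in sorted(fake.store):
        print(shard_key, len(fake.store[shard_key]))
```

Sizes are tracked locally, so this assumes a single writer per key and works the same whether `client` is a connection or a pipeline. Whatever code reads the index back would need a matching change to fetch `key:0`, `key:1`, ... and concatenate them; I have not checked how `Split_reads.py` consumes these values.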