broadinstitute/CellBender

different priors on GPU vs. CPU?

bobermayer opened this issue · 0 comments

Hi,

I'm running cellbender v0.3.0 and it fails early on a GPU but not on a CPU:

On a GPU I get:

$ cellbender remove-background --input CR_SP225_012_CS1635/outs/raw_feature_bc_matrix.h5 --output tmp_gpu/SP225_012_CS1635.h5 --expected-cells 5798 --total-droplets-included 20000 --fpr 0.01 --cuda
cellbender:remove-background: Command:
cellbender remove-background --input CR_SP225_012_CS1635/outs/raw_feature_bc_matrix.h5 --output tmp_gpu/SP225_012_CS1635.h5 --expected-cells 5798 --total-droplets-included 20000 --fpr 0.01 --cuda
cellbender:remove-background: CellBender 0.3.0
cellbender:remove-background: (Workflow hash 5c5b475e6f)
cellbender:remove-background: 2024-02-08 13:29:36
cellbender:remove-background: Running remove-background
cellbender:remove-background: Loading data from CR_SP225_012_CS1635/outs/raw_feature_bc_matrix.h5
cellbender:remove-background: CellRanger v3 format
cellbender:remove-background: Trimming dataset for inference.
cellbender:remove-background: Features in dataset: 32285 Gene Expression
cellbender:remove-background: 23284 features have nonzero counts.
cellbender:remove-background: TEMP: Saved UMI count plot as umi_hist.pdf
cellbender:remove-background: Prior on counts for cells is 3856
cellbender:remove-background: Prior on counts for empty droplets is 3963
cellbender:remove-background: Trimming barcodes for inference.
cellbender:remove-background: Excluding barcodes with counts below 1981
Traceback (most recent call last):
  File "/fast/users/obermayb_c/work/miniconda3_conda4.12/envs/CellBender/bin/cellbender", line 33, in <module>
    sys.exit(load_entry_point('cellbender', 'console_scripts', 'cellbender')())
  File "/fast/work/users/obermayb_c/CellBender/cellbender/base_cli.py", line 112, in main
    cli_dict[args.tool].run(args)
  File "/fast/work/users/obermayb_c/CellBender/cellbender/remove_background/cli.py", line 156, in run
    main(args)
  File "/fast/work/users/obermayb_c/CellBender/cellbender/remove_background/cli.py", line 199, in main
    run_remove_background(args)
  File "/fast/work/users/obermayb_c/CellBender/cellbender/remove_background/run.py", line 63, in run_remove_background
    fpr=args.fpr)
  File "/fast/work/users/obermayb_c/CellBender/cellbender/remove_background/data/dataset.py", line 118, in __init__
    expected_cells=expected_cell_count)
  File "/fast/work/users/obermayb_c/CellBender/cellbender/remove_background/data/dataset.py", line 321, in _trim_droplets
    f"There are no empty droplets with UMI counts over the lower " \
AssertionError: There are no empty droplets with UMI counts over the lower cutoff of 1981.  Some empty droplets are necessary for the analysis.  Reduce the --low-count-threshold parameter.

while on a CPU it gets past that step and starts inference:

$ cellbender remove-background --input CR_SP225_012_CS1635/outs/raw_feature_bc_matrix.h5 --output tmp_cpu/SP225_012_CS1635.h5 --expected-cells 5798 --total-droplets-included 20000 --fpr 0.01
cellbender:remove-background: Command:
cellbender remove-background --input CR_SP225_012_CS1635/outs/raw_feature_bc_matrix.h5 --output tmp_cpu/SP225_012_CS1635.h5 --expected-cells 5798 --total-droplets-included 20000 --fpr 0.01
cellbender:remove-background: CellBender 0.3.0
cellbender:remove-background: (Workflow hash 44e1168775)  
cellbender:remove-background: 2024-02-08 13:28:55
cellbender:remove-background: Running remove-background   
cellbender:remove-background: Loading data from CR_SP225_012_CS1635/outs/raw_feature_bc_matrix.h5
cellbender:remove-background: CellRanger v3 format
cellbender:remove-background: Trimming dataset for inference.
cellbender:remove-background: Features in dataset: 32285 Gene Expression
cellbender:remove-background: 23284 features have nonzero counts.
cellbender:remove-background: TEMP: Saved UMI count plot as umi_hist.pdf
cellbender:remove-background: Prior on counts for cells is 3094
cellbender:remove-background: Prior on counts for empty droplets is 157
cellbender:remove-background: Trimming barcodes for inference.
cellbender:remove-background: Excluding barcodes with counts below 78
cellbender:remove-background: Using 5798 probable cell barcodes, plus an additional 14202 barcodes, and 61518 empty droplets.
cellbender:remove-background: Largest surely-empty droplet has 239 UMI counts.
cellbender:remove-background: Further trimming features for inference.
cellbender:remove-background: Including 13735 features that are estimated to have > 0.1 background counts in cells.
cellbender:remove-background: TEMP: Saved UMI count plot as umi_hist.pdf
cellbender:remove-background: Prior on counts for cells is 3075
cellbender:remove-background: Prior on counts for empty droplets is 180
cellbender:remove-background: Attempting to unpack tarball "ckpt.tar.gz" to /data/cephfs-1/home/users/obermayb_c/scratch/tmp/tmp3ssep7fd
cellbender:remove-background: No saved checkpoint.
cellbender:remove-background: No checkpoint loaded.
cellbender:remove-background: Running inference...

Interestingly, setting --low-count-threshold 5 reverses the situation: the GPU run now succeeds, while the CPU run fails with the same error the GPU run gave before.

The priors ("Prior on counts for cells" / "Prior on counts for empty droplets") are the following:

GPU:

low-count-threshold  cells  empty_droplets
15                   3856   3963
5                    3083   155

CPU:

low-count-threshold  cells  empty_droplets
15                   3094   157
5                    3750   3864

Seems like something is off with the heuristics.
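
A quick arithmetic check on the numbers from the two logs, in case it helps narrow this down. In both runs the reported "Excluding barcodes with counts below ..." cutoff equals the empty-droplet prior halved and rounded down. That relationship is only inferred from these two logs, not taken from the CellBender source, and the helper name `half_empty_prior` is made up for illustration.

```python
# Values copied from the two logs above (runs without an explicit
# --low-count-threshold).
runs = {
    "gpu": {"cell_prior": 3856, "empty_prior": 3963, "logged_cutoff": 1981},
    "cpu": {"cell_prior": 3094, "empty_prior": 157, "logged_cutoff": 78},
}


def half_empty_prior(empty_prior: int) -> int:
    """Hypothetical helper: empty-droplet prior halved, rounded down.

    This is an assumption that happens to match both logs, not a
    statement about how CellBender computes the cutoff.
    """
    return empty_prior // 2


for name, r in runs.items():
    guess = half_empty_prior(r["empty_prior"])
    ratio = r["cell_prior"] / r["empty_prior"]
    print(
        f"{name}: logged cutoff {r['logged_cutoff']}, "
        f"empty_prior // 2 = {guess}, "
        f"cell/empty prior ratio = {ratio:.2f}"
    )
```

In the failing run the empty-droplet prior (3963) is slightly higher than the cell prior (3856), whereas in the working run the cell prior is roughly 20 times the empty-droplet prior. So it looks as if the empty-droplet estimate sometimes lands on the cell plateau, which would push the cutoff above every real empty droplet and trigger the assertion.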