pinellolab/dictys

How to use Bulk ATAC to Run

Closed this issue · 18 comments

Checks before submitting the issue

  • I installed the latest Dictys version successfully
  • I downloaded the whole folder for the tutorial
  • I ran the tutorial cells and notebooks (if more than one) in the correct order and only once
  • The preceding cells and notebooks finished successfully (i.e. the reported error is the first encountered)

Describe the error
I have scRNA data and bulk ATAC data from four samples. According to the paper, bulk ATAC data can be used to construct a network. However, in the tutorial provided at https://nbviewer.org/github/pinellolab/dictys/blob/master/doc/tutorials/short-multiome/notebooks/main.ipynb, the BAM files in step "1. Preparation of individual input files in the data folder" seem to be intended for scATAC data. How should I process my bulk ATAC data to get a network?
Thank you for your guidance and advice.

Optional steps (may accelerate troubleshooting)

Hi hzauleibowen,

Thank you for the question.

If you used macs2 to call peaks, can you follow the steps for bulk ATAC-seq at #23 (comment)? You can use either cell-type specific or tissue specific bulk ATAC-seq.

Let us know here if you have any issue.

Lingfei

Hi Lingfei,

Thanks so much for your timely reply.

I used macs2 to call peaks and mapped all the raw data. I will follow your advice to process the data in the coming days.

I hope that this approach will yield the results we desire.

Bowen

Dear Lingfei,

Thanks for your helpful software. I have tried Dictys with bulk ATAC-seq data following #23 (comment). However, the homer.tsv.gz and wellington.tsv.gz files were empty, and I do not know why. The peaks generated by macs2 were formatted as follows:
chr1 24612366 24613560 peak1:chr1:24612366:24613560:172.29200
chr1 20819892 20820782 peak2:chr1:20819892:20820782:136.96100
chr14 54517060 54518015 peak3:chr14:54517060:54518015:135.39000
chr15 75085437 75087073 peak4:chr15:75085437:75087073:135.39000
chr14 119007166 119007934 peak5:chr14:119007166:119007934:134.67700
chr2 150362345 150363260 peak6:chr2:150362345:150363260:134.64800
chr2 148731571 148732731 peak7:chr2:148731571:148732731:133.37600
chr12 91383694 91384900 peak8:chr12:91383694:91384900:132.47700
chr13 119689853 119690996 peak9:chr13:119689853:119690996:131.85300

Could you kindly help me?

Thanks again!
Best,
Bob

Hi Bob,

Happy to help. Did you run makefile_check.py as in the tutorial? Could you paste the output here?

Lingfei

Hi Bob,

Could you update Dictys to the dev branch to perform more checks with this script? If you already have version 1.0.0, you can update with pip3 install --no-deps --force-reinstall git+https://github.com/pinellolab/dictys@dev in the right conda environment. After that, can you rerun makefile_check.py to see the output?

Lingfei

Hi Bob,

Forgot to mention: could you run makefile_check.py -c instead for more tests, please? I suspect you have a gene name mismatch in some of your input files.

Lingfei

Thank you, Bob.

The results suggest that gene name mismatch is not the problem. Quite a few TFs are recognized in the motif file. Can you paste here the output of the line !cd ..; dictys_helper network_inference.sh -j 32 -J 1 static from the notebook?

Lingfei

Hi Bob. Unfortunately, email attachments are not transferred to GitHub issues. Can you upload it on GitHub?

Hi bioinformaticspcj,

I'm afraid your pasted reply was truncated. Is there any chance you can upload the full version as an attachment on the GitHub website or somewhere else? Or can you spot the error on your side?

Lingfei

Dear Lingfei,

Thanks for your suggestion. I have uploaded the output to https://figshare.com/articles/online_resource/Single_Cell_Log/24278206 on Figshare.

I hope you can find it!

Best,
Bob

Hi Bob,

This appears to be the error:

Traceback (most recent call last):
  File "/data/nfs/OriginTools/pcj/python3/miniconda3/envs/dictys/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/data/nfs/OriginTools/pcj/python3/miniconda3/envs/dictys/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/data/nfs/OriginTools/pcj/python3/miniconda3/envs/dictys/lib/python3.9/site-packages/dictys/__main__.py", line 13, in <module>
    docstringrunner(__package__)
  File "/data/nfs/OriginTools/pcj/python3/miniconda3/envs/dictys/lib/python3.9/site-packages/docstring2argparse/__init__.py", line 340, in docstringrunner
    run_args(pkgname,funcs,args)
  File "/data/nfs/OriginTools/pcj/python3/miniconda3/envs/dictys/lib/python3.9/site-packages/docstring2argparse/__init__.py", line 330, in run_args
    return func(*a,**ka)
  File "/data/nfs/OriginTools/pcj/python3/miniconda3/envs/dictys/lib/python3.9/site-packages/dictys/chromatin.py", line 248, in homer
    return _motif_postproc(d2,fi_exp,fo_bed,fo_wellington,fo_homer)
  File "/data/nfs/OriginTools/pcj/python3/miniconda3/envs/dictys/lib/python3.9/site-packages/dictys/chromatin.py", line 192, in _motif_postproc
    raise ValueError('Found non-unique motif name suffices. Each motif name is recommended to contain a unique suffix.')
ValueError: Found non-unique motif name suffices. Each motif name is recommended to contain a unique suffix.

As described above, the problem should be in your motif file. Could you add _0, _1, ... to the end of the name of each motif so each has a unique suffix? After that if the same error occurs, you can upload the motif file here so I can have a look.

Lingfei
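For readers following along, the renaming suggested above can be sketched in a few lines of Python. This is only a minimal illustration, not part of Dictys: it assumes the motif file is in HOMER format, where each motif starts with a tab-separated header line beginning with ">" and the second field is the motif name. The function names add_unique_suffixes and motif_names are hypothetical helpers introduced here, and how Dictys itself parses the suffix is not verified.

```python
def add_unique_suffixes(lines):
    """Append _0, _1, ... to the name field of each HOMER motif header line.

    Assumption: header lines start with '>' and are tab-separated, with the
    motif name in the second field. All other lines (the matrix rows) are
    passed through unchanged.
    """
    out = []
    index = 0
    for line in lines:
        if line.startswith(">"):
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 2:
                raise ValueError(f"Header without a name field: {line!r}")
            fields[1] = f"{fields[1]}_{index}"
            index += 1
            out.append("\t".join(fields) + "\n")
        else:
            out.append(line)
    return out


def motif_names(lines):
    """Return the name field of every motif header line, in file order."""
    return [
        line.rstrip("\n").split("\t")[1]
        for line in lines
        if line.startswith(">")
    ]


if __name__ == "__main__":
    # Made-up two-motif example; real files have longer matrices.
    example = [
        ">ACGT\tAHR\t6.0\n",
        "0.25\t0.25\t0.25\t0.25\n",
        ">TTGA\tAHR\t5.5\n",
        "0.10\t0.20\t0.30\t0.40\n",
    ]
    renamed = add_unique_suffixes(example)
    names = motif_names(renamed)
    print(names)
    print("all names unique:", len(names) == len(set(names)))
```

To apply it to a real file, one could read the lines with open(path).readlines(), pass them through add_unique_suffixes, and write the result to a new file, keeping the original as a backup. Checking motif_names for duplicates before and after may help confirm whether the names are actually unique.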

Also, you can run makefile_check.py first as well. The new version should provide you with more information for diagnosis.

Dear Lingfei,

Thanks for your suggestion. I tried adding a number to each motif name to make it unique. However, the same error still occurred. I also ran the dictys_helper makefile_check.py -c command, and the results are as follows:
Joint profile: True
Found 7047 cells with RNA profile
Found 55095 genes with RNA profile
ERROR:root:2
WARNING:root:Using RNA cell names for ATAC cell names for validations below.
Found 7047 cells with ATAC profile
Found 356 motifs
Found 356 TFs
Found 311 TFs in current dataset
Missing 45 TFs in current dataset: ANDR,AP2A,AP2C,ARI5B,BHA15,BHE40,BMAL1,BRAC,COE1,COT1,COT2,DMRTB,EVI1,GCR,HEN1,HNF6,HTF4,ITF2,KAISO,NDF1,NDF2,NGN2,NKX2-8,PEBB,PKNX1,PRD14,PRD16,PRGR,RORG,SUH,TF2L1,TF65,TF7L1,TF7L2,TFE2,THA,THA11,ZBT17,ZBT7A,ZKSC1,ZN143,ZN281,ZN322,ZN335,ZN431
Found 268 genes with TSS information
WARNING:root:Cannot find dynamic.mk or traj_node.h5. Skipping dynamic network inference checks.
Traceback (most recent call last):
  File "/data/nfs/OriginTools/pcj/python3/miniconda3/envs/dictys/lib/python3.9/site-packages/dictys/scripts/helper/makefile_check.py", line 354, in <module>
    raise RuntimeError(f'Found {nerr} error(s) in total.')
RuntimeError: Found 1 error(s) in total.

The motif file that I used has been uploaded to https://figshare.com/articles/dataset/Mouse_TF_motif/24333844 on Figshare.

Could you take a look and help me resolve this issue?

Best,
Bob