morris-lab/CellOracle

Problems with reference genome for construction of base GRN from scATAC data

Hi, firstly thank you for this great GRN inference method and the extensive documentation that comes with it!

I'm currently running into issues constructing a base GRN from scATAC data. I'm on the third step of the tutorial, working with the '02_atac_peaks_to_TFinfo_with_celloracle_20200801' notebook.

Specifically, I'm having trouble with where CellOracle looks for my reference genome (hg38). Based on the default download location of genomepy.install_genome(), I suspect it checks ~/.local/share/genomes. Am I correct? Because I'm working on a remote server with very limited storage in my home directory, I was wondering whether there is a parameter I could set throughout the script to change the directory for my reference genome.

I have looked into this myself in detail but have not found a way to supply, e.g., celloracle.motif_analysis.TFinfo() with a custom reference genome location: it seems to take only a string with the name of the reference genome and looks for that genome in a default location.

How could I resolve this issue?
Thank you in advance!

Any update on this? I am also working on a remote server with unusual firewall settings, which makes genomepy return "| WARNING | UCSC appears to be offline." all the time.

@rickycolman, @DongzeHE

Thank you for the feedback, and sorry for the slow response.
I have updated celloracle and our tutorials. https://morris-lab.github.io/CellOracle.documentation/notebooks/02_motif_scan/02_atac_peaks_to_TFinfo_with_celloracle_20200801.html#3.-Instantiate-TFinfo-object-and-search-for-TF-binding-motifs

Please install celloracle version 0.14.0.

You can now specify a custom location for the reference genome data.

For example, suppose you have installed the reference genome data under CUSTOM_DIR. You need to pass this directory in several steps; for instance, pass it as genomes_dir when you create the TFinfo object:

tfi = ma.TFinfo(peak_data_frame=peaks, ref_genome=ref_genome, genomes_dir=CUSTOM_DIR)
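Because the same directory has to be passed at several steps, one way to keep a step from silently falling back to the default location is to hold the genome name and directory together and unpack them into each call. The sketch below is a hypothetical helper: GenomeSettings and the /scratch/genomes path are illustrative and not part of CellOracle; only the TFinfo call in the final comment comes from this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenomeSettings:
    """Hypothetical container so every step receives the same genome settings."""

    ref_genome: str   # e.g. "hg38"
    genomes_dir: str  # custom directory that holds the installed genome

    def as_kwargs(self):
        # Keyword names match the TFinfo call shown in this thread.
        return {"ref_genome": self.ref_genome, "genomes_dir": self.genomes_dir}


# Hypothetical path on a large scratch volume instead of the home directory.
settings = GenomeSettings(ref_genome="hg38", genomes_dir="/scratch/genomes")

# With celloracle 0.14.0 installed, the call from the thread could become:
# tfi = ma.TFinfo(peak_data_frame=peaks, **settings.as_kwargs())
```

Any other step that accepts these same keywords can reuse the object; check each function's signature before unpacking into it.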

I hope it helps.

@DongzeHE

  1. If your environment doesn't have internet access, please place the reference genome data there manually and specify its location.

  2. FYI, you may also get another error if you want to use the CellOracle default datasets, because some data-loading functions need internet access to download data. In that case, please git clone the repository and install celloracle from source with pip install -e . inside the cloned directory. This installs celloracle together with all default datasets, so you don't need internet access. You can also find more information about this internet issue in #98.
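For point 1, placing the data manually could look like the sketch below. It assumes genomepy's usual layout, in which a genome named hg38 lives at <genomes_dir>/hg38/hg38.fa; the helper name and paths are hypothetical, and genomepy may still need to create index files (such as the .fai) the first time the genome is loaded.

```python
import shutil
from pathlib import Path


def place_genome_manually(fasta_src, genomes_dir, name):
    """Copy a pre-downloaded FASTA into <genomes_dir>/<name>/<name>.fa.

    Hypothetical helper; the target layout is an assumption based on how
    genomepy usually organizes installed genomes.
    """
    target_dir = Path(genomes_dir).expanduser() / name
    target_dir.mkdir(parents=True, exist_ok=True)  # create <genomes_dir>/<name>/
    target = target_dir / f"{name}.fa"
    shutil.copyfile(fasta_src, target)             # copy the FASTA contents
    return target
```

For example, place_genome_manually("/scratch/downloads/hg38.fa", "/scratch/genomes", "hg38") would produce /scratch/genomes/hg38/hg38.fa, and "/scratch/genomes" is then the value to pass as genomes_dir.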

tfi = ma.TFinfo(peak_data_frame=peaks, ref_genome=ref_genome, genomes_dir=CUSTOM_DIR)

This works, but the tfi.scan() function does not take a custom directory, so it still looks for the genomes in the default directory.

I have the same question. Have you solved it?

I could not reproduce the issue.
It seems the function is working as expected in my environment.

Can you please check that your reference genome is installed correctly in your custom directory?
Please see the instructions in the tutorial on how to check the installation status.
Please specify the directory using the genomes_dir argument of the function.
https://morris-lab.github.io/CellOracle.documentation/notebooks/02_motif_scan/02_atac_peaks_to_TFinfo_with_celloracle_20200801.html#1.-Rerefence-genome-data-preparation
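Alongside the tutorial's installation check, a quick filesystem-only look can show whether the FASTA file is where genomepy would expect it. This is a hypothetical helper, not part of CellOracle or genomepy; it assumes the <genomes_dir>/<name>/<name>.fa layout and an uncompressed FASTA, so a bgzipped .fa.gz install would be reported as missing.

```python
from pathlib import Path


def genome_fasta_status(genomes_dir, name):
    """Report whether <genomes_dir>/<name>/<name>.fa exists and looks like FASTA.

    Hypothetical helper assuming genomepy's usual layout; it only inspects
    files and does not call genomepy or CellOracle.
    """
    fasta = Path(genomes_dir).expanduser() / name / f"{name}.fa"
    if not fasta.is_file():
        return f"missing: {fasta}"
    with fasta.open() as handle:
        first = handle.readline()
    # A FASTA file starts with a '>' header line.
    if not first.startswith(">"):
        return f"not FASTA (no '>' header): {fasta}"
    return f"ok: {fasta}"
```

If this reports "missing" for the directory you pass as genomes_dir, CellOracle will most likely not find the genome there either.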

If the reference genome is not installed in your custom directory, please install the reference genome as follows.

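import genomepy
# Install ref_genome (e.g., "hg38") from UCSC into GENOMES_DIR, the same directory you later pass as genomes_dir
genomepy.install_genome(name=ref_genome, provider="UCSC", genomes_dir=GENOMES_DIR)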

Also, please read the descriptions in our tutorial for more information: https://morris-lab.github.io/CellOracle.documentation/notebooks/02_motif_scan/02_atac_peaks_to_TFinfo_with_celloracle_20200801.html#1.-Rerefence-genome-data-preparation

If you still get an error, please copy and paste the whole error message and the celloracle version you used. This will help us identify the cause of the issue.