pinellolab/dictys

Can't update config.mk in the "2-makefile" notebook


Hello, I am trying to run the test dataset and am running into a lot of issues.

One of them seems to occur when running:

!dictys_helper makefile_update.py ../makefiles/config.mk '{"ENVMODE": "none", "NTH": "4", "DEVICE": "cpu", "GENOME_MACS2": "hs", "JOINT": "1"}'

Here, I only changed DEVICE to "cpu" in the notebook's command, since I am trying to run it on CPU.
This step is supposed to take a long time according to your note: "Note: using CPU may take days or over a week for this example."
For me, however, it finishes in 2-3 s without showing any error.

Nevertheless, in the third notebook I am getting an error on the second code chunk:

#Detect whether joint profiles from makefile
with open(pjoin(dirmakefiles,'config.mk'),'r') as f:
	s=f.readlines()
s=[x.strip() for x in s if x.startswith('JOINT=')][-1]
s=s[len('JOINT='):]
if s not in {'0','1'}:
	raise ValueError('Invalid JOINT variable in '+pjoin(dirmakefiles,'config.mk'))
isjoint=s=='1'
print(f'Joint profile: {isjoint}')

FileNotFoundError Traceback (most recent call last)
Cell In [4], line 2
1 #Detect whether joint profiles from makefile
----> 2 with open(pjoin(dirmakefiles,'config.mk'),'r') as f:
3 s=f.readlines()
4 s=[x.strip() for x in s if x.startswith('JOINT=')][-1]

FileNotFoundError: [Errno 2] No such file or directory: '../makefiles/config.mk'

Do you have any idea what may go wrong here?

Thanks for your help !

Hi JasonOSS.

Thanks for the question. I assume you are talking about the full-multiome tutorial?

First, 2-makefile.ipynb only prepares the Makefiles so it should finish in seconds.

Regarding the error, it suggests that some preceding steps went wrong. Could you first make sure all the preceding notebooks and cells completed once and in order? If not, can you redownload and rerun the whole tutorial in a different folder? If the problem persists, can you upload this and the preceding notebooks so I can diagnose it?
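A missing config.mk usually means an earlier notebook did not complete in this folder. As a minimal sketch (the helper name `read_joint_flag` is hypothetical and not part of Dictys), the JOINT check from the notebook cell above can be wrapped so that a missing file produces a more direct hint:

```python
from pathlib import Path


def read_joint_flag(config_path):
    """Return True if the last JOINT= line in a config.mk-style file is '1'.

    Hypothetical helper, not part of Dictys; it mirrors the notebook cell above.
    """
    path = Path(config_path)
    if not path.is_file():
        raise FileNotFoundError(
            f"{path} not found; check that the earlier notebooks ran in this folder"
        )
    # Keep only the values of lines that start with JOINT=, as the notebook does
    values = [
        line.strip()[len("JOINT="):]
        for line in path.read_text().splitlines()
        if line.startswith("JOINT=")
    ]
    if not values or values[-1] not in {"0", "1"}:
        raise ValueError(f"Invalid or missing JOINT variable in {path}")
    return values[-1] == "1"


# Example call (path taken from the traceback above):
# isjoint = read_joint_flag("../makefiles/config.mk")
```

If this raises FileNotFoundError, the file was never created in the current folder. That points to 2-makefile.ipynb or an earlier step rather than to the JOINT setting itself.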

Lingfei

Hi Lingfei,

Yes, I am talking about the full-multiome tutorial.

I tried to re-run the 1-data.ipynb in a new folder. It seems that I have an issue when running:

%%bash
dictys_helper genome_homer.sh hg38 ../data/genome

and then

%%bash
ls -h1s ../data/genome | head

Instead of a total of 4.4G, I get a total of 2.3G:

total 2.3G
512 annotations
12K chrom.sizes
1.8G genome.fa
512 hg38
1.7M hg38.aug
15M hg38.basic.annotation
209M hg38.full.annotation
83K hg38.miRNA
203M hg38.repeats

Hi JasonOSS,

Just based on your description, I do not find any issue here. The command completed without any error, right?

The file sizes are inconsistent with the tutorial, but the difference affects all files. Therefore it's very unlikely to be from an incomplete download.

Possibly you are using a slightly different version of homer or reference genome. It could be that your own homer is prioritized over the homer installed together with Dictys. It could also be that homer updated after the tutorial notebook was produced. It might also be a different sizing metric between the file systems, especially with whole file system compression.
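The last possibility can be checked directly. `ls -s` reports the space allocated on disk, which can differ from a file's byte length on compressed or sparse file systems. A minimal sketch, assuming a Unix-like system where `os.stat` exposes `st_blocks` (the function name is made up for illustration):

```python
import os


def apparent_and_allocated(path):
    """Return (apparent_bytes, allocated_bytes) for one file.

    allocated_bytes is None where the platform does not report st_blocks.
    """
    st = os.stat(path)
    blocks = getattr(st, "st_blocks", None)
    # Python documents st_blocks as the number of 512-byte blocks allocated
    allocated = blocks * 512 if blocks is not None else None
    return st.st_size, allocated


# Example call (path taken from the listing above):
# print(apparent_and_allocated("../data/genome/genome.fa"))
```

If the apparent size matches the tutorial while the allocated size is much smaller, the discrepancy likely comes from the file system rather than from the download.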

Therefore I do not consider this to be an issue for now, unless it causes some other issue. The initial issue you raised seems independent from it as well.

I suggest continuing with the tutorial to see if you can find or replicate any issue.

Thanks,

I continued the tutorial, and in the 3-static-inference

when running

%%bash
set -eo pipefail
cd ~/Dictys/Test_3
#Run CPU part of inference
make -f makefiles/static.mk -j 8 -k cpu || true

I get this error:

Traceback (most recent call last):
File "~/dictys2/python3.9/lib/python3.9/runpy.py", line 197, in _run_module_as_main
return _run_code(code, main_globals, None,
File "~/dictys2/python3.9/lib/python3.9/runpy.py", line 87, in _run_code
exec(code, run_globals)
File "~/dictys2/python3.9/lib/python3.9/site-packages/dictys/__main__.py", line 13, in <module>
docstringrunner(package)
File "~/dictys2/python3.9/lib/python3.9/site-packages/docstring2argparse/__init__.py", line 340, in docstringrunner
run_args(pkgname,funcs,args)
File "~/dictys2/python3.9/lib/python3.9/site-packages/docstring2argparse/__init__.py", line 330, in run_args
return func(*a,**ka)
File "~/dictys2/python3.9/lib/python3.9/site-packages/dictys/chromatin.py", line 67, in macs2
d2 = shell.cmdfile(cmd,[],infiles={'cellnames.txt': namestxt},quiet=False,cd=True)
File "~/dictys2/python3.9/lib/python3.9/site-packages/dictys/utils/shell.py", line 181, in cmdfile
raise RuntimeError('Command failed, possibly due to program error: ' + cmda)
RuntimeError: Command failed, possibly due to program error: ~/dictys2/python3.9/lib/python3.9/site-packages/dictys/scripts/chromatin_macs2.sh cellnames.txt ~/Dictys/Test_3/data/bams ~/Dictys/Test_3/tmp_static/Subset13/reads.bam ~/Dictys/Test_3/tmp_static/Subset13/reads.bai ~/Dictys/Test_3/tmp_static/Subset13/peaks.bed hs 0.05 4
[bam_sort_core] merging from 4 files and 4 in-memory blocks...
~/dictys2/python3.9/lib/python3.9/site-packages/dictys/scripts/chromatin_macs2.sh: line 37: macs2: command not found

make: *** [tmp_static/Subset13/peaks.bed] Error 1

This error does not stop the script, but it keeps recurring many times while the script is running.

Hi JasonOSS. It appears that either Dictys was not installed properly, or you did not enter the correct conda environment. Do you have the full installation log? What command did you use to enter the environment?
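One quick way to confirm this from inside the notebook is to ask which command line tools the kernel can actually see. A minimal sketch using the standard library's `shutil.which` (the function name `missing_tools` is illustrative; `macs2` comes from the error message, and `samtools` is assumed from the `bam_sort_core` output):

```python
import shutil


def missing_tools(tools, path=None):
    """Return the names in `tools` that cannot be found as executables.

    `path` is a PATH-style string; None means the current process's PATH.
    """
    return [name for name in tools if shutil.which(name, path=path) is None]


# An empty list means every listed tool is visible to this kernel
print(missing_tools(["macs2", "samtools"]))
```

If `macs2` appears in the printed list, the kernel's PATH does not include the directory where the tool was installed.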

Hi lingfeiwang,

to install the environment I did:

[user python]$ bash Miniconda3-py39_4.12.0-Linux-x86_64.sh -b -p $HOME/dictys/python3.9
[user python]$ cd $HOME/dictys/python3.9/bin
[user bin]$ ./conda install -c conda-forge jupyterlab
[user bin]$ ./conda install -c bioconda -c conda-forge -c pytorch pytorch torchvision torchaudio bedtools homer samtools macs2 ffmpeg

[user bin]$ ./pip install git+https://github.com/pinellolab/dictys.git

[user bin]$ ./pip uninstall pyDNase
[user bin]$ ./pip install -U matplotlib 
[user bin]$ ./pip install --no-deps pyDNase

Then I can load this environment on our server by simply specifying $HOME/dictys/python3.9 in our interface for launching JupyterLab.

Thank you for providing the information.

Actually, this installation is not the way we would recommend. It may end up working, but often requires some troubleshooting that can be avoided using our suggested installation commands.

Setting that aside, the issue seems to come from jupyterlab. In our past experience with jupyter and jupyterhub, a kernel added for this environment may not capture the full environment, such as the PATH variable. That can explain why macs2, a command line tool in the environment, cannot be found. This is an issue in jupyter kernel creation, not Dictys. We have not tried jupyterlab but I assume its underlying approach is similar.
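This situation can be detected from a notebook cell. In a conda environment on Linux, command line tools are typically installed in the same bin directory as the environment's Python, so a kernel whose interpreter directory is absent from PATH will not find them. A minimal sketch under that assumption (the function name is hypothetical):

```python
import os
import sys


def interpreter_bin_on_path(executable=None, path=None):
    """Return True if the directory holding the Python executable is on PATH."""
    executable = sys.executable if executable is None else executable
    path = os.environ.get("PATH", "") if path is None else path
    bin_dir = os.path.normpath(os.path.dirname(executable))
    entries = [os.path.normpath(p) for p in path.split(os.pathsep) if p]
    return bin_dir in entries


# Show which interpreter the kernel runs and whether its directory is on PATH
print(sys.executable)
print(interpreter_bin_on_path())
```

A result of False is consistent with the kernel running the environment's Python without inheriting the environment's PATH.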

That is why we recommend running jupyter servers within the conda environment for the tutorials. Is that possible for you?

We are running redhat 7. We cannot use Option 1: with Anaconda and Option 2: with bash script because your mamba version installs an incompatible glibc.

If you can provide the containers as described in Option 3, I can pull the container as singularity image.

Can we create a symlink from pkgs/macs2-2.2.7.1-py39hbf8eff0_4/bin/macs2 to the bin folder? The bin folder is already on the PATH.

A container is indeed on our todo list, among other things for better user experience. However, it will not be available within the next few days. Btw, Option 2 does not need mamba.

You can create symlinks for each program needed, but homer calls several perl scripts and I am unsure if that would work.

Can you first enter the conda environment, add the paths of all dependent software to PATH, and then run jupyterlab? See https://github.com/pinellolab/dictys/blob/master/INSTALL.md.

Alternatively you can try other places to modify PATH, such as .bashrc, .bash_profile, and export PATH=...; make ... in the notebook cell. This is only needed for network inference.
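For the notebook route, one option is to adjust PATH in the kernel process itself before the inference cells run. A minimal sketch, where `prepend_to_path` is a hypothetical helper and the example directory is only an assumption based on the install prefix mentioned in this thread:

```python
import os


def prepend_to_path(bin_dir, env=None):
    """Put bin_dir at the front of PATH in env (default: os.environ).

    Returns the new PATH string. An existing copy of bin_dir is moved
    to the front instead of being duplicated.
    """
    env = os.environ if env is None else env
    entries = [
        p for p in env.get("PATH", "").split(os.pathsep) if p and p != bin_dir
    ]
    env["PATH"] = os.pathsep.join([bin_dir] + entries)
    return env["PATH"]


# Example call; the directory is an assumption, adjust it to your installation:
# prepend_to_path(os.path.expanduser("~/dictys/python3.9/bin"))
```

Processes started from the kernel afterwards, such as %%bash cells, should inherit the modified PATH, since child processes receive the kernel's environment by default.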

Thanks for your help, I was able to run the tutorial on another server.