seqcode/multimds

Input BED files

audreypeng opened this issue · 1 comment

Hi!

Thank you for developing such a great tool! I have been trying to run the test data with test.sh on my Mac terminal and it downloads the GE files fine. However, it then produces the error:

Traceback (most recent call last):
  File "/Users/audreypeng/multimds/scripts/hic_data/../normalize.py", line 77, in <module>
    main()
  File "/Users/audreypeng/multimds/scripts/hic_data/../normalize.py", line 72, in main
    normalize_intra(args.hic_id, args.res, args.chrom1)
  File "/Users/audreypeng/multimds/scripts/hic_data/../normalize.py", line 61, in normalize_intra
    normalize(chromstring, chromstring, rawpath, krpath, None, res, outpath)
  File "/Users/audreypeng/multimds/scripts/hic_data/../normalize.py", line 11, in normalize
    kr1 = np.loadtxt(krpath1)
  File "/Users/audreypeng/miniconda3/lib/python3.9/site-packages/numpy/lib/npyio.py", line 1042, in loadtxt
    fh = np.lib._datasource.open(fname, 'rt', encoding=encoding)
  File "/Users/audreypeng/miniconda3/lib/python3.9/site-packages/numpy/lib/_datasource.py", line 193, in open
    return ds.open(path, mode, encoding=encoding, newline=newline)
  File "/Users/audreypeng/miniconda3/lib/python3.9/site-packages/numpy/lib/_datasource.py", line 532, in open
    raise FileNotFoundError(f"{path} not found.")
FileNotFoundError: GM12878_combined/100.0kb_resolution_intrachromosomal/chr1/MAPQGE30/chr1_100.0kb.KRnorm not found.

The issue might be that the file name it looks for contains "100.0kb" instead of "100kb", but I am not sure. Any help would be great!

Also, for normalized matrices generated by HiC-Pro, how would you recommend converting them to a suitable input format for MultiMDS?

Thank you,
Audrey

Thanks for pointing out this issue! I just changed the res variable to an int, so you can pull the latest version and let me know if that fixes it.
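For anyone hitting the same error, here is a minimal sketch of how a float resolution can end up in a file name. It is not the actual code in normalize.py; the helper names `kr_filename_float` and `kr_filename_int` are hypothetical and only illustrate the "100.0kb" versus "100kb" difference seen in the traceback.

```python
def kr_filename_float(chrom, res_bp):
    # Hypothetical helper: in Python 3, "/" is true division and always
    # returns a float, so 100000 / 1000 is 100.0 rather than 100.
    res_kb = res_bp / 1000
    return "{}_{}kb.KRnorm".format(chrom, res_kb)


def kr_filename_int(chrom, res_bp):
    # Hypothetical helper: forcing an integer keeps the ".0" out of the name.
    res_kb = int(res_bp) // 1000
    return "{}_{}kb.KRnorm".format(chrom, res_kb)


print(kr_filename_float("chr1", 100000))  # chr1_100.0kb.KRnorm
print(kr_filename_int("chr1", 100000))    # chr1_100kb.KRnorm
```

If the path in your error message still contains "100.0kb" after pulling, the resolution is probably still being handled as a float somewhere along the way.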

MultiMDS takes paired-end BED files, so hicpro2bedpe would work. The format is a seven-column file, where columns 1-3 are locus 1 (chromosome, start coordinate, end coordinate), columns 4-6 are locus 2 (chromosome, start coordinate, end coordinate), and column 7 is the count of interactions between locus 1 and locus 2.
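As a rough illustration of the seven-column layout, the sketch below joins a HiC-Pro style sparse matrix with its bin BED file. It assumes the matrix is tab-separated with three columns (bin index 1, bin index 2, count) and the BED file is tab-separated with four columns (chromosome, start, end, bin index); check these assumptions against your own files. The function names `read_bins` and `matrix_to_bedpe` are made up for this example and are not part of MultiMDS or HiC-Pro.

```python
import csv
import io


def read_bins(bed_handle):
    """Map bin index -> (chrom, start, end) from a 4-column bin BED (assumed layout)."""
    bins = {}
    for row in csv.reader(bed_handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        chrom, start, end, idx = row
        bins[int(idx)] = (chrom, int(start), int(end))
    return bins


def matrix_to_bedpe(matrix_handle, bins):
    """Yield seven-column records: locus 1 (cols 1-3), locus 2 (cols 4-6), count (col 7)."""
    for row in csv.reader(matrix_handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        i, j, count = row
        chrom1, start1, end1 = bins[int(i)]
        chrom2, start2, end2 = bins[int(j)]
        yield (chrom1, start1, end1, chrom2, start2, end2, float(count))


# Tiny in-memory stand-ins for the two input files (made-up values).
bed = io.StringIO("chr1\t0\t100000\t1\nchr1\t100000\t200000\t2\n")
matrix = io.StringIO("1\t1\t12.5\n1\t2\t3.0\n")

rows = list(matrix_to_bedpe(matrix, read_bins(bed)))
for record in rows:
    print("\t".join(str(value) for value in record))
```

For real data, replace the `io.StringIO` objects with open file handles and write the records to a tab-separated output file.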