CombineNearbyInteraction failing

Question

pna059 opened this issue 3 years ago · 3 comments

Hi,
I have analyzed Hi-C data using HiC-Pro followed by FitHiChIP with recommended settings.
There is a large number of interactions, but at the step of merging adjacent loops, I am getting this error message:

******* applying merge filtering on the FitHiChIP significant interactions ******

****** Merge filtering of adjacent loops is enabled *****
***** within function of merged filtering - printing the parameters ***
*** bin_size: 20000
*** headerInp: 1
*** connectivity_rule: 8
*** TopPctElem: 100
*** NeighborHoodBinThr: 40000
*** QValCol: 26
*** PValCol: 25
*** SortOrder: 0
OutDir: FitHiChIP_leaf1_20k/FitHiChIP_ALL2ALL_b20000_L10000_U2000000/ICE_Bias/FitHiC_BiasCorr/Merge_Nearby_Interactions
list of chromosomes: ['chr1H', 'chr2H', 'chr3H', 'chr4H', 'chr5H', 'chr6H', 'chr7H']
Processing the chromosome: chr1H
Traceback (most recent call last):
File "./src/CombineNearbyInteraction.py", line 638, in <module>
main()
File "./src/CombineNearbyInteraction.py", line 245, in main
CurrChrDict.setdefault(curr_key, Interaction(int(linecontents[CCCol-1]), float(linecontents[PValCol - 1]), float(linecontents[QValCol - 1])))
ValueError: invalid literal for int() with base 10: '11.757594'
----- Applied merged filtering (connected component model) on the adjacent loops of FitHiChIP
SORRY !!!!!!!! FitHiChIP could not find any statistically significant interactions after applying merge filtering on the generated set of loops !!
Option 1: use significant loops without merge filtering
What could be the problem? Is the chromosome format including "H" supported in this step?

Thank you
Pavla

Answer 1 · 2022-06-17T08:40:23.000Z

I solved the issue by editing line 245 of ./src/CombineNearbyInteraction.py, changing

CurrChrDict.setdefault(curr_key, Interaction(int(linecontents[CCCol-1]), float(linecontents[PValCol - 1]), float(linecontents[QValCol - 1])))

to

CurrChrDict.setdefault(curr_key, Interaction(float(int(linecontents[CCCol-1])), float(linecontents[PValCol - 1]), float(linecontents[QValCol - 1])))
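A note on the order of the casts: in Python, `int()` applied directly to the string '11.757594' raises the ValueError shown in the traceback, and an outer `float()` does not change that because the inner `int()` runs first. The working form is therefore presumably `int(float(...))` (truncates to a whole count) or plain `float(...)` (keeps the decimals). The sketch below only illustrates the four variants on the value from the error message; `cast_outcome` is a hypothetical helper, not part of FitHiChIP.

```python
# Value taken from the error message in the traceback.
RAW_COUNT = "11.757594"


def cast_outcome(cast, field):
    """Apply `cast` to `field`; return the result, or the string 'ValueError' on failure."""
    try:
        return cast(field)
    except ValueError:
        return "ValueError"


# The four ways the contact-count field could be converted.
outcomes = {
    "int(x)": cast_outcome(int, RAW_COUNT),
    "float(int(x))": cast_outcome(lambda s: float(int(s)), RAW_COUNT),
    "int(float(x))": cast_outcome(lambda s: int(float(s)), RAW_COUNT),
    "float(x)": cast_outcome(float, RAW_COUNT),
}

for name, result in outcomes.items():
    print(f"{name:15s} -> {result!r}")
```

Which of the two working variants is appropriate depends on whether the downstream merge step expects whole-number contact counts; that is not something this thread settles.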

I then got another, hopefully the last, error, this time about the tabix (.tbi) index (my genome is a large plant genome):

[E::hts_idx_check_range] Region 537189999..537190001 cannot be stored in a tbi index. Try using a csi index with min_shift = 14, n_lvls >= 6
tbx_index_build failed: /auto/budejovice1/home/pavlan/FitHiChIP_leaf1_20k/FitHiChIP_ALL2ALL_b20000_L10000_U2000000/ICE_Bias/FitHiC_BiasCorr/Merge_Nearby_Interactions/FitHiChIP_leaf1_20k.interactions_FitHiC_Q0.05_MergeNearContacts_WashU.bed.gz

It would be good to consider users working with such large genomes and to build the index with tabix's -C (CSI) option when a chromosome exceeds the length limit of the .tbi format.
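For reference, the numbers in the error message follow from the binning scheme: an index with a given min_shift and n_lvls can address positions up to 2^(min_shift + 3 * n_lvls). A .tbi index is fixed at min_shift = 14 and 5 levels, i.e. 2^29 = 536,870,912 bp, which is just below the 537,190,001 reported above. The following is a minimal sketch of that arithmetic; the function names are made up for illustration and are not part of htslib or FitHiChIP.

```python
# Fixed parameters of the .tbi format.
TBI_MIN_SHIFT = 14
TBI_N_LVLS = 5


def max_indexable_position(min_shift, n_lvls):
    """Largest coordinate addressable with the given binning parameters."""
    return 1 << (min_shift + 3 * n_lvls)


def required_levels(end_pos, min_shift=TBI_MIN_SHIFT):
    """Smallest n_lvls whose addressable range covers end_pos."""
    n_lvls = 0
    while max_indexable_position(min_shift, n_lvls) < end_pos:
        n_lvls += 1
    return n_lvls


tbi_limit = max_indexable_position(TBI_MIN_SHIFT, TBI_N_LVLS)
region_end = 537190001  # end coordinate from the error message

print("tbi limit:", tbi_limit)
print("fits in tbi:", region_end <= tbi_limit)
print("n_lvls needed:", required_levels(region_end))
```

This reproduces the hint in the message (n_lvls >= 6 at min_shift = 14), so any chromosome longer than about 537 Mb needs a CSI index.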

Answer 2 · 2022-06-17T18:08:04.000Z

Hi @pna059
Thanks for your suggestions.

Regarding the int/float conversion, did you provide floating-point numbers as the contact counts?
The .tbi index file is generated so that the loops can be visualized in the WashU browser. Using the -C option is fine, but the resulting index may not be compatible with the WashU browser; I need to check this further. In any case, the index file is generated at the last step, so you can use all the other output files for subsequent analysis.

Answer 3 · 2022-06-17T18:58:37.000Z

Thank you for your reply. I am analyzing Hi-C data with the FitHiChIP pipeline.

All values were passed from the HiC-Pro result or generated by the pipeline itself; I only saw from the error message that the value was a number with decimal places. So I just added "float()", though maybe it should replace the "int()" instead?
And yes, I don't really need the index file, but others might in the future.
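On the question of whether the contact counts are floating-point: one way to check is to scan the contact-count column of the interaction file for values that `int()` cannot parse. Below is a minimal sketch, assuming a tab-separated file with one header line; `find_non_integer_counts`, the column number, and the sample rows are all hypothetical and only illustrate the idea.

```python
def find_non_integer_counts(lines, count_col, has_header=True, sep="\t"):
    """Return (line_number, value) pairs whose count field is not a plain integer.

    count_col is 1-based, matching the column convention in the log output.
    """
    offenders = []
    for line_no, line in enumerate(lines, start=1):
        if has_header and line_no == 1:
            continue  # skip the header row
        fields = line.rstrip("\n").split(sep)
        value = fields[count_col - 1]
        try:
            int(value)
        except ValueError:
            offenders.append((line_no, value))
    return offenders


# Made-up rows for illustration only (chr, start, end, chr, start, end, count).
sample = [
    "chr1\ts1\te1\tchr2\ts2\te2\tcc\n",
    "chr1H\t0\t20000\tchr1H\t40000\t60000\t12\n",
    "chr1H\t0\t20000\tchr1H\t60000\t80000\t11.757594\n",
]

bad_rows = find_non_integer_counts(sample, count_col=7)
print(bad_rows)
```

With a real file, the same function can be called on an open file handle, e.g. `find_non_integer_counts(open(path), count_col=7)`, with the column number adjusted to wherever the contact count sits.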