ay-lab/dcHiC

question about input/sample subtypes

Closed this issue · 5 comments

Hello, thank you for developing this tool!

I am wondering: if I want to run a differential analysis on three tumor subtypes (with ~10-20 samples per subtype), would I list those three as different samples, so that there are three sample names with multiple replicates each?
I assume dcHiC would then ignore any differential compartments within a "sample" (tumor subtype).

ay-lab commented

Hi,

Here is an example of the input file you're looking for:

ES_1_100Kb.matrix     ES_1_100Kb_abs.bed     ES_1_100Kb     ES
ES_2_100Kb.matrix     ES_2_100Kb_abs.bed     ES_2_100Kb     ES
ES_3_100Kb.matrix     ES_3_100Kb_abs.bed     ES_3_100Kb     ES
ES_4_100Kb.matrix     ES_4_100Kb_abs.bed     ES_4_100Kb     ES
NPC_2_100Kb.matrix    NPC_2_100Kb_abs.bed    NPC_2_100Kb    NPC
NPC_3_100Kb.matrix    NPC_3_100Kb_abs.bed    NPC_3_100Kb    NPC
NPC_4_100Kb.matrix    NPC_4_100Kb_abs.bed    NPC_4_100Kb    NPC
CN_1_100Kb.matrix     CN_1_100Kb_abs.bed     CN_1_100Kb     CN
CN_2_100Kb.matrix     CN_2_100Kb_abs.bed     CN_2_100Kb     CN

Here, the third column is the replicate name and the fourth is the sample name.
The final differential analysis will be done among the unique names provided in the 4th column.
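
For anyone assembling this file by script, here is a minimal Python sketch of the four-column layout shown above. The helper names (`build_input_rows`, `format_input`) are made up for illustration, and the tab separator is an assumption; check the dcHiC README for the exact delimiter and any header requirements.

```python
from collections import Counter


def build_input_rows(replicates_by_sample, resolution="100Kb"):
    """Return (matrix, bed, replicate_name, sample_name) rows.

    Hypothetical helper: file names follow the pattern used in the
    example above, e.g. ES_1_100Kb.matrix / ES_1_100Kb_abs.bed.
    """
    rows = []
    for sample, replicate_ids in replicates_by_sample.items():
        for rep in replicate_ids:
            prefix = f"{sample}_{rep}_{resolution}"
            rows.append((f"{prefix}.matrix", f"{prefix}_abs.bed", prefix, sample))
    return rows


def format_input(rows):
    """Join columns with tabs (assumed delimiter), one replicate per line."""
    return "\n".join("\t".join(row) for row in rows) + "\n"


rows = build_input_rows({"ES": [1, 2, 3, 4], "NPC": [2, 3, 4], "CN": [1, 2]})
print(format_input(rows), end="")

# The unique values in column 4 are the groups that get compared.
groups = Counter(row[3] for row in rows)
print(dict(groups))
```

Counting the fourth column before running the tool is a cheap way to confirm that every sample has the number of replicates you expect.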

Assuming the replicates are perfect, there should not be any differential regions between replicates of the same sample. If we do see differences between replicates, they point to technical issues in the experiment. Hence, when doing the differential analysis with replicates, dcHiC measures how consistent the replicates of each sample are with one another. This produces a weight, which is then used in Independent Hypothesis Weighting (IHW) to adjust the p-values. As a result, a region that differs strongly between samples can still get a poor (non-significant) p-value if the replicates of one of the samples show very high variability.
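
To illustrate how a weight can change the adjusted p-value, here is a small Python sketch of weighted Benjamini-Hochberg, the procedure that IHW builds on. It is not dcHiC's code: real IHW learns the weights from a covariate, whereas the weights below are hand-picked for illustration, and `weighted_bh` is a hypothetical helper name.

```python
def weighted_bh(pvalues, weights):
    """Weighted Benjamini-Hochberg adjustment (illustrative only).

    Each p-value is divided by its weight (weights rescaled to mean 1),
    then the usual BH step-up adjustment is applied.
    """
    m = len(pvalues)
    mean_w = sum(weights) / m
    norm_w = [w / mean_w for w in weights]
    # Weighted p-values, capped at 1; a zero weight means "never reject".
    q = [min(1.0, p / w) if w > 0 else 1.0 for p, w in zip(pvalues, norm_w)]
    order = sorted(range(m), key=lambda i: q[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest weighted p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, q[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Regions 0 and 1 have the same raw p-value, but region 1 gets a low
# weight (standing in for noisy replicates), so it is penalized.
pvalues = [0.001, 0.001, 0.2, 0.5]
weights = [2.0, 0.2, 1.0, 0.8]
adjusted = weighted_bh(pvalues, weights)
print(adjusted)
```

The two regions with identical raw p-values end up with different adjusted values, which is the effect described above for regions whose replicates disagree.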

If you think your sub-type replicates have valid biological differential regions, then I would suggest you split those replicates and treat them separately.

Hello! Thank you so much for the response. I apologize if my question was not clear. My replicates are not actual technical replicates: I have 10 samples (from different patients) for each of the three tumor subtypes I am looking at, so 3 x 10 in total, with each subtype as a "sample". In that case, is it still appropriate to list them as "replicates"? Or would you recommend a multivariate analysis? Thank you!

ay-lab commented

Sorry for misunderstanding the structure. You indeed have very rich data.

I would not recommend treating them as replicates. Each one can have its own subtype-specific structural variation that can influence the Hi-C data.

I can think of two different ways to deal with the data. First, you can run a subtype-specific differential compartment analysis across patients. At the end, you will have three differential-compartment result files, one for each of the three tumor subtypes. This will show you how the 3D compartments of a given subtype vary across patients (inter-patient).
Next, you can perform a subtype-wise differential compartment analysis for each patient. Ultimately, you will get 'N' (number of patients) dcHiC results. These will reflect how the compartments of the subtypes vary within a patient (intra-patient).
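
As a rough Python sketch of the two designs described above, the snippet below builds one input table per subtype (patients as the sample names in column 4) and one per patient (subtypes as the sample names). The subtype labels, patient IDs, file-name pattern, and helper names are all hypothetical, and the per-patient design assumes each patient contributes a Hi-C map for every subtype.

```python
def make_row(subtype, patient, sample_name, resolution="100Kb"):
    """One input line: matrix file, bed file, replicate name, sample name."""
    prefix = f"{subtype}_{patient}_{resolution}"  # hypothetical naming scheme
    return (f"{prefix}.matrix", f"{prefix}_abs.bed", prefix, sample_name)


def inter_patient_tables(subtypes, patients):
    """One table per subtype; each patient is its own sample (column 4)."""
    return {s: [make_row(s, p, sample_name=p) for p in patients] for s in subtypes}


def intra_patient_tables(subtypes, patients):
    """One table per patient; each subtype is its own sample (column 4)."""
    return {p: [make_row(s, p, sample_name=s) for s in subtypes] for p in patients}


subtypes = ["subtypeA", "subtypeB", "subtypeC"]  # placeholder labels
patients = ["P01", "P02", "P03"]                 # placeholder patient IDs

inter = inter_patient_tables(subtypes, patients)
intra = intra_patient_tables(subtypes, patients)

print(f"{len(inter)} inter-patient runs, {len(intra)} intra-patient runs")
for row in intra["P01"]:
    print("\t".join(row))
```

The first design yields as many runs as there are subtypes, the second as many as there are patients, matching the counts of result files mentioned above.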

The two analyses provide different sets of results. For example, a region that is non-differential for a subtype in the inter-patient analysis may show up as a differential compartment in the intra-patient analysis. This perhaps means that the region is highly subtype-specific.

Hope my points are clear.

ay-lab commented

It all depends on what question you want to ask. If your hypothesis is that there are recurrent compartmental differences that separate one tumor type from the other(s), it is perfectly fine to use the different samples as "replicates" for that purpose. You will likely miss differences that are specific to a subset of tumors within a tumor type, but that is OK if those are not what interests you most.
As a general guideline, you can think of this similarly to differential gene expression analysis. All types of comparisons are OK as long as you have a clear hypothesis and you explain what you did when discussing your results.

  • Ferhat

That all sounds good. Thank you for the help!