Z score
Hello all,
I am getting Z scores of 300 and above for my co-occurring pairs from TF-COMB, and I have the following questions:
- What is a Z score in this context?
- What does Z score 300 or above represent?
Thanks,
Rajashree
Hi @Rajashree93,
thank you for your question.
1.) In our context, the z-score is the observed count for a pair, minus the mean count from the background estimation, divided by the standard deviation of the background counts.
2.) A z-score of 300 means the observed count lies 300 standard deviations above the background mean.
So either you have a very small standard deviation, which can lead to high ratios, or the observed counts are outstandingly high. One could say that with a z-score of 300 it is very unlikely that the observed counts fall into the distribution of background scores by chance. However, this needs to be interpreted in the context of your experiment and the z-score distribution within it.
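To make the definition concrete, here is a minimal sketch of the calculation in plain Python. It is not TF-COMB's actual code: the function name `z_score`, the use of the population standard deviation, and the numbers are all assumptions made up for illustration.

```python
import statistics

def z_score(observed, background_counts):
    """Hypothetical helper: (observed - background mean) / background std."""
    mean = statistics.mean(background_counts)
    std = statistics.pstdev(background_counts)  # population std (an assumption)
    if std == 0:
        # No spread in the background: the ratio is undefined.
        return float("nan")
    return (observed - mean) / std

# Made-up numbers, not real TF-COMB output.
background = [10, 12, 8, 10]   # counts for one pair across background runs
z = z_score(40, background)    # observed count of 40 for the same pair
print(round(z, 2))
```

Note how a tight background (small standard deviation) inflates the ratio: the same observed count against a more variable background would give a much smaller z-score.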
I hope this answers your questions. If not, please let me know and I'm happy to assist you further.
Kind regards,
Vanessa
Thanks, can you please expand on how I can do this "However this need to be interpreted in the context of your experiment and the z-score distribution within."?
Also, I was curious about how much a cosine value of 0.5 differs in effect from 0.6. Is a 0.1 difference in cosine similarity significant?
Do you see my image? The left table is treatment data and the right table is control. I am trying to understand how large the effect of cosine similarity is on these. Most differences are 0.1, 0.2, or 0.3. Are those significant? Thanks!
Hi @Rajashree93,
yes I see your image.
Regarding your question: it is not possible to calculate a p-value here, so we infer importance from the distribution of all counts instead. This means, for example, that we take the top 10% of scores as possible candidates.
Since you have differential data (control vs. treatment), it may be more convenient for you to use the differential analysis methods offered by TF-COMB instead of comparing cosine scores manually. You can find an example here: https://tf-comb.readthedocs.io/en/latest/examples/Differential_analysis.html
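As a rough illustration of the "top 10%" idea, the sketch below ranks pairs by score and keeps the highest-scoring tenth. It is plain Python, not a TF-COMB function: the helper `top_fraction`, the pair names, and the scores are hypothetical, and the 10% cutoff is only the example value mentioned above.

```python
import math

def top_fraction(scores, fraction=0.1):
    """Hypothetical helper: return (pair, score) entries in the top `fraction`."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    n_keep = max(1, math.ceil(fraction * len(ranked)))  # keep at least one pair
    return ranked[:n_keep]

# Made-up cosine scores for ten TF pairs, for illustration only.
scores = {
    "TF1-TF2": 0.61, "TF1-TF3": 0.12, "TF2-TF3": 0.33, "TF2-TF4": 0.08,
    "TF3-TF4": 0.27, "TF3-TF5": 0.19, "TF4-TF5": 0.41, "TF1-TF4": 0.05,
    "TF1-TF5": 0.22, "TF2-TF5": 0.15,
}
candidates = top_fraction(scores, fraction=0.1)
print(candidates)
```

The cutoff is relative: whether a score counts as a candidate depends on where it sits in the distribution of all scores in that dataset, not on a fixed value.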
Kind regards
Vanessa
What do you mean by "infer importance from the distribution of all counts instead."? Can you simplify your reasoning with my data?
Why would you take the top 10% as possible candidates? What is the justification? What score determines possible candidates?
Can we have a zoom meeting? This is part of my PhD thesis.
I will close this for now.
Please feel free to reopen the Issue anytime.