Bioinformatics A-Z
This repository holds my personal notes. I plan to organize what I learned in graduate school.
- Bioinformatics
- Interdisciplinary field that combines biology, computer science, statistics, etc.
- Introduction to probability distributions
- Random variable
- Discrete random variable
- Continuous random variable
- Probability functions
- 0 ≤ p(x) ≤ 1.0
- Area under a probability function is always 1 (for a pmf, the probabilities sum to 1)
- Probability mass function (pmf)
- discrete probability distribution
- e.g., P(X = 1) = 1/6 for a fair six-sided die
- Probability density function (pdf)
- continuous probability distribution
- Cumulative distribution function (cdf)
- e.g., P(X ≤ 1) = 1/6
- Expected value and variance
- Expected value (mean)
- mean of the random variable X
- E(X) = µ
- Variance (standard deviation squared)
- σ^2 = Var(X) = E[(X - µ)^2]
- expected (or average) squared distance (or deviation) from the mean
- Var(X) = σ^2
- SD(X) = σ
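The definitions above can be checked on a small discrete distribution. The sketch below assumes a fair six-sided die (an illustrative choice, consistent with the 1/6 probabilities used in these notes); the helper names `expected_value` and `variance` are made up for this example.

```python
from fractions import Fraction

# Assumption for illustration: a fair six-sided die, p(x) = 1/6 for x = 1..6.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def expected_value(pmf):
    """E(X): sum of x * p(x) over all outcomes."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Var(X) = E[(X - mu)^2]: average squared deviation from the mean."""
    mu = expected_value(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

mu = expected_value(pmf)          # E(X)
var = variance(pmf)               # Var(X) = sigma^2
sd = float(var) ** 0.5            # SD(X) = sigma
cdf_at_1 = sum(p for x, p in pmf.items() if x <= 1)  # cdf: P(X <= 1)

print("E(X) =", mu, " Var(X) =", var, " SD(X) =", sd, " P(X<=1) =", cdf_at_1)
```

Exact fractions are used so that the pmf sums to exactly 1 and the results are not affected by floating-point rounding.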
- Binomial probability distribution
- n: number of observations
- binary outcome
- constant probability for each observation
- X ~ Binom(n, p)
- E(X) = np
- Var(X) = np(1-p)
- SD(X) = sqrt(np(1 - p))
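As a quick sanity check of the binomial formulas, the sketch below builds the pmf directly and compares the mean and variance computed from it with np and np(1-p). The values `n = 10` and `p = 0.3` are arbitrary illustrative choices, and `binom_pmf` is a helper written for this example.

```python
from math import comb, isclose, sqrt

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binom(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, 0.3  # arbitrary illustrative values

pmf = [binom_pmf(k, n, p) for k in range(n + 1)]
mean = sum(k * pk for k, pk in enumerate(pmf))
var = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))

print("sum of pmf:", sum(pmf))
print("mean:", mean, "vs np =", n * p)
print("variance:", var, "vs np(1-p) =", n * p * (1 - p))
print("sd:", sqrt(var), "vs sqrt(np(1-p)) =", sqrt(n * p * (1 - p)))
```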
- Normal distribution
- N(µ, σ^2)
- Standard normal distribution
- Z ~ N(0, 1)
- t-distribution
- looks like the normal distribution, but with slightly heavier tails
- arises when the mean and variance of a distribution are estimated from data
- degrees of freedom depend on the sample size used for the estimation
- when d.f. is large, t converges to the normal distribution
- Chi-square distribution
- Z^2 follows χ^2(1) (chi-square distribution with 1 d.f.)
- Z1^2 + Z2^2 follows χ^2(2) (d.f. = 2) for independent Z1 and Z2
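The link between the standard normal and the chi-square distribution with 1 d.f. can be illustrated numerically. The sketch below relies on the identity P(χ^2(1) > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x/2)); the helper name `chi2_sf_1df`, the value z = 3.2, the cutoff 3.84, and the simulation size are choices made for this example only.

```python
import random
from math import erfc, sqrt
from statistics import NormalDist

def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 d.f."""
    return erfc(sqrt(x / 2.0))

z = 3.2  # illustrative z-score

# Tail of Z^2 under chi-square(1) equals the two-sided normal tail of Z.
p_chi2 = chi2_sf_1df(z**2)
p_two_sided = 2 * (1 - NormalDist().cdf(z))
print(p_chi2, p_two_sided)

# Simulation: squares of standard normal draws behave like chi-square(1).
rng = random.Random(0)
draws = [rng.gauss(0, 1) ** 2 for _ in range(100_000)]
mean_sq = sum(draws) / len(draws)                    # should be close to 1
frac_above = sum(d > 3.84 for d in draws) / len(draws)  # should be close to 0.05
print(mean_sq, frac_above, chi2_sf_1df(3.84))
```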
- Basic statistics for BI
- P-value
- probability of observing the same or a more extreme result under the null hypothesis
- null hypothesis (H0)
- uninteresting situation
- alternative hypothesis (H1)
- interesting situation
- Properties of an easy-to-use statistic
- designed to be zero under H0 and non-zero under H1
- it follows a known distribution (normal, t, ..) under H0
- z-score
- used if a statistic follows N(0, 1) under H0
- Central limit theorem
- sample is large --> the sample mean is approximately normally distributed
- sample is small --> the statistic often follows a t-distribution
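A small simulation can make the central limit theorem concrete. The sketch below assumes samples drawn from Uniform(0, 1), which is clearly not normal, and checks that the sample means have the spread and the coverage a normal distribution would predict. The sample size `n = 30`, the number of repetitions, and the seed are arbitrary choices.

```python
import random
from math import sqrt
from statistics import mean, stdev

rng = random.Random(42)
n, reps = 30, 5000  # arbitrary illustrative sizes

# Mean of n Uniform(0, 1) draws, repeated many times.
sample_means = [mean([rng.random() for _ in range(n)]) for _ in range(reps)]

# Uniform(0, 1) has mean 1/2 and variance 1/12, so the mean of n draws
# should have standard deviation sqrt(1/12) / sqrt(n).
expected_sd = sqrt(1 / 12) / sqrt(n)
observed_sd = stdev(sample_means)

# For a normal distribution, about 95% of values lie within 1.96 SD of the mean.
within = sum(abs(m - 0.5) <= 1.96 * expected_sd for m in sample_means) / reps

print(mean(sample_means), observed_sd, expected_sd, within)
```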
- Normal vs chi-square distribution
- z follows N(0,1) --> z^2 follows chi-square distribution with d.f. 1
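- in R: pchisq(3.2^2, df=1, lower.tail=F) gives the two-sided P-value for z = 3.2
- Statistical power
- chance that data will be significant if H1 is true
- counterpart of the P-value: power is computed under H1, the P-value under H0
- a function of sample size and effect size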
- probability of accepting the null hypothesis even though the alternative hypothesis is true: type II error (β error)
- statistical power = 1 - β
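To see how power depends on sample size and effect size, here is a minimal sketch for one specific case: a two-sided one-sample z-test with known variance. That test, the function name `z_test_power`, and the standardized effect size (difference in means divided by σ) are assumptions made for this example; other tests need their own power formulas.

```python
from statistics import NormalDist

def z_test_power(effect_size, n, alpha=0.05):
    """Power of a two-sided one-sample z-test.

    effect_size: (mu1 - mu0) / sigma, the standardized difference under H1.
    n: sample size.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Under H1 the z-statistic follows N(shift, 1).
    shift = effect_size * n ** 0.5
    # beta: probability that |z| stays below the critical value (type II error).
    beta = std_normal.cdf(z_crit - shift) - std_normal.cdf(-z_crit - shift)
    return 1 - beta

for n in (10, 20, 40, 80):
    print(n, round(z_test_power(0.5, n), 3))
```

With an effect size of zero the function returns alpha, which matches the idea that under H0 the only way to be "significant" is a false positive.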
- Permutation test
- repeatedly shuffle data to impose null hypothesis
- useful if statistic doesn't have known distribution, or if sample size is too small for CLT to work
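The shuffling idea can be sketched as a two-group permutation test on the difference in means. The statistic, the function name `permutation_test`, the number of permutations, and the toy data are all illustrative assumptions; the +1 in the numerator and denominator is a common convention that keeps the estimated P-value above zero.

```python
import random
from statistics import mean

def permutation_test(group_a, group_b, n_perm=2000, seed=0):
    """Two-sided permutation P-value for the difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_perm):
        # Shuffling the pooled data breaks any link between value and group
        # label, which is what the null hypothesis claims.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    return (extreme + 1) / (n_perm + 1)

# Toy data (made up for illustration): two clearly separated groups.
p_separated = permutation_test([11, 12, 13, 14, 15], [1, 2, 3, 4, 5])
# Identical groups: the observed difference is 0, so every shuffle is "as extreme".
p_identical = permutation_test([1, 2, 3, 4], [1, 2, 3, 4])
print(p_separated, p_identical)
```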
- 2x2 table analysis
- chi-square test formula
- Fisher's exact test
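Both 2x2 tests can be sketched with the standard library. The example assumes a table with cells a, b (first row) and c, d (second row), uses the Pearson chi-square statistic without continuity correction, and computes the two-sided Fisher P-value by summing hypergeometric probabilities no larger than that of the observed table. The function names and the toy counts are made up for illustration.

```python
from math import comb, erfc, sqrt

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) and P-value, 1 d.f."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of chi-square with 1 d.f.: erfc(sqrt(x / 2)).
    return stat, erfc(sqrt(stat / 2.0))

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test P-value with all margins held fixed."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(k):
        # Hypergeometric probability that the top-left cell equals k.
        return comb(row1, k) * comb(row2, col1 - k) / comb(n, col1)

    p_obs = prob(a)
    k_min, k_max = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(k) for k in range(k_min, k_max + 1)
               if prob(k) <= p_obs * (1 + 1e-9))

# Toy counts, for illustration only.
stat, p_chi2 = chi_square_2x2(3, 1, 1, 3)
p_fisher = fisher_exact_2x2(3, 1, 1, 3)
print(stat, p_chi2, p_fisher)
```

With counts this small the chi-square approximation is unreliable, which is the usual reason to prefer Fisher's exact test for sparse tables.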
- t-test
- ANOVA
- Analysis of Variance
- tests whether the means of more than two groups are equal
- the test statistic follows an F-distribution under H0
- Log-rank test
- for survival analysis
- Kaplan-Meier curve (Visualization)
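The values plotted in a Kaplan-Meier curve come from a simple product formula: at each event time, the survival estimate is multiplied by (1 - deaths / number at risk). The sketch below is a minimal version of that estimator; the function name `kaplan_meier`, the 1 = event / 0 = censored coding, and the toy data are assumptions for this example, and it does not compute confidence intervals or the log-rank statistic.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one event.

    times: follow-up time per subject.
    events: 1 if the event was observed at that time, 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Collect all subjects sharing this time point.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both events and censored subjects leave the risk set after time t.
        at_risk -= removed
    return curve

# Toy data, for illustration only.
curve = kaplan_meier([1, 2, 3], [1, 0, 1])
print(curve)
```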
- Linear regression
- Logistic regression
- P-value