Bayesian Statistics in simple English

The article I read is Bayesian Statistics explained to Beginners in Simple English.

I found this article while looking up what Bayesian inference is. (How does it differ from variational inference?)

1. Frequentist Statistics

2. The Inherent Flaws in Frequentist Statistics

The p-value and the C.I. (confidence interval) come up here, and the article's explanation of these basic terms is excellent. I am setting these aside separately as well.

3. Bayesian Statistics

3.1 Conditional Probability

Bayes' theorem is built on top of conditional probability and lies at the heart of Bayesian inference.

3.2 Bayes Theorem

$\Large P(A_i|B) = \frac{\Large P(B|A_i) P(A_i) }{\Large \sum_{j=1}^n P(B|A_j)P(A_j)} $
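As a small numerical sketch of the formula above, assume a partition of three events $A_1, A_2, A_3$. The priors and likelihoods below are made-up numbers chosen only for illustration, and `bayes_posterior` is a hypothetical helper name, not something from the article.

```python
def bayes_posterior(priors, likelihoods):
    """Return P(A_i | B) for every i, given P(A_i) and P(B | A_i)."""
    # Numerator of Bayes' theorem for each i: P(B | A_i) * P(A_i)
    joint = [p * l for p, l in zip(priors, likelihoods)]
    # Denominator: sum over the whole partition
    evidence = sum(joint)
    return [j / evidence for j in joint]


# Hypothetical numbers, for illustration only.
priors = [0.5, 0.3, 0.2]       # P(A_1), P(A_2), P(A_3)
likelihoods = [0.1, 0.4, 0.8]  # P(B | A_1), P(B | A_2), P(B | A_3)

posterior = bayes_posterior(priors, likelihoods)
print(posterior)
```

The denominator is the same for every $i$, which is why the posterior values always sum to 1.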

4. Bayesian Inference

prior × likelihood = posterior × evidence

cf. Variational Inference (pdf)

4.1. Bernoulli likelihood function

$\Large P(y|θ)=θ^y(1-θ)^{1-y}$

$ y \in \{0, 1\}, θ \in (0, 1) $

$y=1$ means the coin landed heads; $θ$ is the probability of heads, i.e. it describes the fairness of the coin.

so,

$$ P(y_1, y_2, \ldots , y_n | θ) = \prod_{i=1}^n P(y_i|θ) $$

$$ P(z,N|θ) = θ^z (1-θ)^{N-z} $$

where $z$ is the number of heads and $N$ is the number of flips.
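A minimal sketch of the two likelihood expressions, assuming a made-up sequence of five flips. The function names are hypothetical; the point is only that the per-flip product and the count form $θ^z(1-θ)^{N-z}$ give the same number.

```python
def bernoulli_likelihood(flips, theta):
    """Product of theta^y * (1 - theta)^(1 - y) over all flips (1 = head, 0 = tail)."""
    result = 1.0
    for y in flips:
        result *= theta ** y * (1 - theta) ** (1 - y)
    return result


def count_likelihood(z, n, theta):
    """Same likelihood written with z heads out of n flips."""
    return theta ** z * (1 - theta) ** (n - z)


# Made-up data, for illustration only: 3 heads in 5 flips.
flips = [1, 0, 1, 1, 0]
z, n = sum(flips), len(flips)

for theta in (0.2, 0.5, 0.6, 0.8):
    print(theta, bernoulli_likelihood(flips, theta), count_likelihood(z, n, theta))
```

Among the values tried, the likelihood is largest at $θ = 0.6 = z/N$.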

$P(z,N|\theta)$ is the likelihood. The prior, $P(\theta)$, comes next.

4.2. Prior Belief Distribution

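It would probably be good to read the Wikipedia article on prior probability too, but it is long, so I am putting it off ;;

The beta distribution ($P(p;α,β)=\frac{p^{α-1} (1-p)^{β-1}}{B(α,β)}$, where $B$ is just a normalizer) is also the conjugate prior of the binomial.

In the Wikipedia article on the beta distribution, the key sentence is the one below.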

Beta distributions provide a family of prior probability distributions for binomial distributions in Bayesian inference.

In other words, the prior for the binomial is the beta. (The derivation is here; I copied it below. Since it is a conjugate prior, the posterior is again in the beta family.)

$${\displaystyle {\begin{aligned}P(s,f\mid q=x)&={s+f \choose s}x^{s}(1-x)^{f},\\P(x)&={x^{\alpha -1}(1-x)^{\beta -1} \over \mathrm {B} (\alpha ,\beta )},\\P(q=x\mid s,f)&={\frac {P(s,f\mid x)P(x)}{\int P(s,f\mid x)P(x)dx}}\\&={{{s+f \choose s}x^{s+\alpha -1}(1-x)^{f+\beta -1}/\mathrm {B} (\alpha ,\beta )} \over \int _{y=0}^{1}\left({s+f \choose s}y^{s+\alpha -1}(1-y)^{f+\beta -1}/\mathrm {B} (\alpha ,\beta )\right)dy}\\&={x^{s+\alpha -1}(1-x)^{f+\beta -1} \over \mathrm {B} (s+\alpha ,f+\beta )},\\\end{aligned}}}$$

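In $P(p;α,β)$, $α$ plays the role of the number of heads and $β$ the number of tails.

Now plug more or less arbitrary values into $α,β$ and plot the result, and you get the prior (= the distribution of P(head)).

As the number of trials ($N=α+β$) grows (with the number of heads close to $N/2$), the distribution tends to become sharply peaked around the center ($=1/2$). A common-sense result.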
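The sharpening can be checked numerically without plotting. The sketch below evaluates the beta density at $p = 1/2$ and the beta variance $\frac{αβ}{(α+β)^2(α+β+1)}$ for a few symmetric, arbitrarily chosen $(α, β)$ pairs. It uses only the standard library, and `beta_pdf` assumes $0 < p < 1$.

```python
import math


def beta_pdf(p, a, b):
    """Beta density p^(a-1) (1-p)^(b-1) / B(a, b), for 0 < p < 1."""
    # log B(a, b) via log-gamma, to avoid overflow for large a and b
    log_b = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1) * math.log(p) + (b - 1) * math.log(1 - p) - log_b)


def beta_variance(a, b):
    """Variance of Beta(a, b)."""
    return a * b / ((a + b) ** 2 * (a + b + 1))


# Arbitrary symmetric choices: heads and tails are balanced, N = a + b grows.
for a, b in [(2, 2), (10, 10), (50, 50)]:
    print(a, b, beta_pdf(0.5, a, b), beta_variance(a, b))
```

The density at the center grows and the variance shrinks as $α + β$ increases, which matches the peaked shape described above.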

4.3. Posterior Belief Distribution

In the derivation copied above, the last expression ($P(q=x \mid s, f)$) is the posterior belief.

When the prior belief is $P(\theta | \alpha, \beta) $, the posterior belief becomes $P(\theta | z + \alpha , N - z + \beta)$.
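The update rule is just addition on the two parameters. Below is a minimal sketch, assuming a Beta(2, 2) prior and made-up data of 7 heads in 10 flips; `update_beta` and `beta_mean` are hypothetical helper names.

```python
def update_beta(alpha, beta, z, n):
    """Beta(alpha, beta) prior plus z heads in n flips gives Beta(z + alpha, n - z + beta)."""
    return alpha + z, beta + (n - z)


def beta_mean(alpha, beta):
    """Mean of Beta(alpha, beta)."""
    return alpha / (alpha + beta)


# Assumed prior and made-up observations, for illustration only.
prior = (2, 2)
post = update_beta(*prior, z=7, n=10)

print("prior mean    :", beta_mean(*prior))
print("posterior     :", post)
print("posterior mean:", beta_mean(*post))
```

The posterior mean lands between the prior mean (0.5) and the observed proportion of heads (0.7).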

Since $\alpha, \beta$ can be determined from just the mean and variance[1][2], we can estimate our belief about the model parameter ($θ$) from the observed events.
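One way to recover $\alpha, \beta$ from a mean $\mu$ and variance $\sigma^2$ is to invert the standard beta moment formulas, which gives $\alpha = \mu\,k$ and $\beta = (1-\mu)\,k$ with $k = \frac{\mu(1-\mu)}{\sigma^2} - 1$. The sketch below assumes $0 < \mu < 1$ and $\sigma^2 < \mu(1-\mu)$. `beta_from_moments` is a hypothetical name, and this is not necessarily the method used in the cited references.

```python
def beta_from_moments(mean, var):
    """Return (alpha, beta) of the beta distribution with the given mean and variance."""
    if not 0 < mean < 1:
        raise ValueError("mean must lie in (0, 1)")
    if not 0 < var < mean * (1 - mean):
        raise ValueError("variance must lie in (0, mean * (1 - mean))")
    common = mean * (1 - mean) / var - 1  # equals alpha + beta
    return mean * common, (1 - mean) * common


# Example: mean 0.5 and variance 0.05 correspond to Beta(2, 2).
alpha, beta = beta_from_moments(0.5, 0.05)
print(alpha, beta)
```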

5. Test for Significance – Frequentist vs Bayesian

5.1. p-value

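Taking the example of a p-value of 0.02 for a distribution with mean 100:

= If the true mean were 100, there would be a 2% probability of observing a sample mean at least as extreme as the one actually obtained.

That is, it is a probability computed under the assumption that $H_0$ (= no difference) is true; it is not the probability that $H_0$ itself is true.
More loosely, it is often described as 'the probability that the experimental result happened by chance' (what the odds are that your results could have happened by chance [3]).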

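Accordingly, there is no single general formula; it is defined to suit the situation. The population is usually assumed to be normally distributed, so one computes a score for the observed group (for example a t-score^[1] or a z-score) and then reads the tail area off the corresponding distribution table. Because the population standard deviation is unknown in most cases, the p-value is computed from a t-score and is therefore affected by the sample size, which is the drawback of the p-value.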
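The sample-size dependence can be seen directly from the t-score, $t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$. The sketch below uses made-up measurements and a hypothesized mean of 100. It repeats the same values four times so that the sample mean stays fixed while $n$ grows. It computes only the t-score; turning it into a p-value would still need a t-distribution table or a statistics library.

```python
import math
import statistics


def t_score(sample, mu0):
    """One-sample t-score: (sample mean - mu0) / (s / sqrt(n))."""
    n = len(sample)
    xbar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 in the denominator)
    return (xbar - mu0) / (s / math.sqrt(n))


# Made-up data, for illustration only.
small = [102, 98, 105, 101, 99, 104]
large = small * 4  # same values repeated: same mean, larger n

print(len(small), t_score(small, 100))
print(len(large), t_score(large, 100))
```

The difference between the sample mean and 100 is identical in both cases, yet the larger sample produces a larger t-score and hence a smaller p-value.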

5.2. Confidence Intervals

Has the same defect as the p-value.

5.3. Bayes Factor

Plays the same role in the Bayesian framework that the p-value plays in the frequentist one.

The null hypothesis in the Bayesian framework: the prior is a point mass, with $P(θ)=\infty$ at one fixed value of $θ$ and $P(θ)=0$ elsewhere.

The Bayes factor is defined as the ratio of the posterior odds to the prior odds,

$$\begin{equation} \Large \text{ BF}= \frac{\text{posterior odds}}{\text{prior odds}} = \frac{\frac{ P(M=null | z, N)}{P(M=alt | z, N)}}{ \frac{P(M=null)}{P(M=alt)}} \end{equation}$$

To reject a null hypothesis, a BF <1/10 is preferred.

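There is an excellent Korean-language resource on this. Rewriting the definition accordingly,

$$ \large \text{(Posterior odds)} = \text{(Bayes Factor)} \times \text{(Prior odds)} $$

$$ \begin{equation} \large \frac{P(H_1|D)}{P(H_0|D)} = \frac{P(D|H_1)}{P(D|H_0)} \times \frac{P(H_1)}{P(H_0)} \end{equation} $$

Equations (1) and (2) describe the same relationship, but (1) puts the null hypothesis in the numerator, so its Bayes factor is the reciprocal of the one in (2). Expression (2) seems to be the one generally used.[4]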
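For the coin example, the Bayes factor in the form of (2) can be computed in closed form. The sketch assumes a point null $H_0: θ = 0.5$ and an alternative $H_1$ in which $θ$ has a Beta($α$, $β$) prior; the uniform default Beta(1, 1) and the data are my own choices, not from the article. Under these assumptions $P(D|H_1) \propto B(z+α, N-z+β)/B(α, β)$ and $P(D|H_0) \propto 0.5^N$, with the same binomial coefficient on both sides, so it cancels.

```python
import math


def log_beta_fn(a, b):
    """log B(a, b) computed with log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def bayes_factor_10(z, n, alpha=1.0, beta=1.0, theta0=0.5):
    """P(D | H1) / P(D | H0) for z heads in n flips.

    H0: theta is fixed at theta0 (point null).
    H1: theta ~ Beta(alpha, beta)  (an assumed prior for the alternative).
    """
    log_m1 = log_beta_fn(z + alpha, n - z + beta) - log_beta_fn(alpha, beta)
    log_m0 = z * math.log(theta0) + (n - z) * math.log(1 - theta0)
    return math.exp(log_m1 - log_m0)


# Made-up data: 9 heads in 10 flips, with equal prior odds.
bf10 = bayes_factor_10(z=9, n=10)
prior_odds = 1.0
posterior_odds = bf10 * prior_odds

print("BF (H1 over H0):", bf10)
print("BF (H0 over H1):", 1 / bf10)  # orientation used in equation (1)
print("posterior odds :", posterior_odds)
```

Printing both orientations makes the reciprocal relationship between (1) and (2) concrete.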

5.4. High Density Interval (HDI)

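Since the HDI is formed from the posterior probability distribution, the 95% HDI contains the 95% most credible values of the parameter. Unlike a C.I., it also guarantees that the parameter lies in this interval with 95% probability, given the data and the prior.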
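A rough way to get an HDI for a beta posterior is a grid approximation: evaluate the density on a fine grid, keep the highest-density cells until they hold 95% of the mass, and report the range they cover. The sketch below assumes a unimodal posterior with both parameters greater than 1 and uses Beta(9, 5) as an arbitrary example. It is an approximation under those assumptions, not a reference implementation.

```python
import math


def beta_pdf(theta, a, b):
    """Beta density for 0 < theta < 1."""
    log_b = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1) * math.log(theta) + (b - 1) * math.log(1 - theta) - log_b)


def hdi_grid(a, b, mass=0.95, points=10001):
    """Approximate highest density interval of Beta(a, b) on an even grid."""
    step = 1.0 / points
    grid = [(i + 0.5) * step for i in range(points)]  # cell midpoints, avoids 0 and 1
    dens = [beta_pdf(t, a, b) for t in grid]
    # Visit cells from highest to lowest density and accumulate probability mass.
    order = sorted(range(points), key=lambda i: dens[i], reverse=True)
    total = 0.0
    kept = []
    for i in order:
        kept.append(i)
        total += dens[i] * step
        if total >= mass:
            break
    # For a unimodal density the kept cells form one contiguous interval.
    return grid[min(kept)], grid[max(kept)]


low, high = hdi_grid(9, 5)
print(low, high)
```

Every point inside the returned interval has higher posterior density than any point outside it, which is what distinguishes the HDI from an equal-tailed interval.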

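[1] Reportedly used mainly when the population standard deviation is unknown and the sample has fewer than 15 observations. The formula is $ t = \frac{\bar{x} - \mu_0}{ s / \sqrt{n}} $, where $\mu_0$ is the population mean and $s$ is the sample standard deviation.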
