Dirichlet process

(Almost all of this is from Wikipedia.)

Definition

A Dirichlet process is a probability distribution whose range is itself a set of probability distributions.
(‘process is a distribution’)

The Dirichlet process can be seen as the infinite-dimensional generalization of the Dirichlet distribution.

Properties

Just as the Dirichlet distribution is the conjugate prior for the categorical distribution, the Dirichlet process is the conjugate prior for infinite, nonparametric discrete distributions.

A typical application is as the prior probability distribution in infinite mixture models.

The visualization[2] of the Chinese Restaurant Process (CRP)^[1] is really impressive, and the explanation that goes with it is good too.

[Figure: CRP visualization. Truncated caption: "... samples (from the base measure H) yields a random sample of the Dirichlet process DP(0.5,H)"]

Sampling proceeds in a ‘rich get richer’ fashion. The process can be simulated as follows.

Input: \(H\) (a probability distribution called the base distribution), \(\alpha\) (a positive real number called the scaling parameter)

Draw \(X_{1}\) from the distribution \(H\).
for \(n>1\):
1. With probability \(\displaystyle{\frac {\alpha }{\alpha +n-1}} \) draw \(X_n\) from \(H\).
2. With probability \(\displaystyle{\frac {n_{x}}{\alpha +n-1}} \) set \(X_n = x\), where \(n_x\) is the number of previous observations \(X_j\, ,\, j<n\) , such that \(X_j = x\).

Here \(x\) can be thought of as a (cluster) label.
Since \(\sum_x n_x = n - 1\), the probabilities of all the cases sum to one: \( \displaystyle \frac{\alpha}{\alpha+n-1} + \sum_x \frac{n_{x}}{\alpha+n-1} = 1\).
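
The steps above can be sketched in Python as follows. This is only a minimal sketch: the names `sample_dp_sequence` and `base_draw` are made up for illustration, and the standard normal used as \(H\) in the usage part is an arbitrary choice, not something the original text specifies.

```python
import random
from collections import Counter


def sample_dp_sequence(n, alpha, base_draw, rng=None):
    """Draw X_1, ..., X_n by the sequential scheme described above.

    base_draw(rng) is a hypothetical callable returning one draw from H.
    alpha must be a positive real number.
    """
    if rng is None:
        rng = random.Random()
    draws = []
    for i in range(1, n + 1):
        # With probability alpha / (alpha + i - 1), draw a fresh value from H.
        # For i == 1 this probability is 1, so X_1 always comes from H.
        if rng.random() < alpha / (alpha + i - 1):
            draws.append(base_draw(rng))
        else:
            # Otherwise copy one of the earlier draws chosen uniformly.
            # Value x is picked with probability n_x / (i - 1), which gives
            # n_x / (alpha + i - 1) overall, as in step 2.
            draws.append(rng.choice(draws))
    return draws


if __name__ == "__main__":
    rng = random.Random(0)
    # Assumption for illustration: H is a standard normal distribution.
    xs = sample_dp_sequence(1000, alpha=0.5,
                            base_draw=lambda r: r.gauss(0.0, 1.0), rng=rng)
    counts = Counter(xs)
    print(len(counts), "distinct values; largest cluster sizes:",
          [c for _, c in counts.most_common(5)])
```

Running it with a small \(\alpha\) typically shows a handful of distinct values with a few of them taking most of the draws, which is the ‘rich get richer’ behaviour.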

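The \(X_i\) depend on the earlier draws (i.e., they are not independent). They are, however, exchangeable. (This can reportedly be shown by calculation, but the wiki does not show it and I have not done it myself.) Because of this, de Finetti's theorem lets one show that the \(X_i\) are conditionally independent given some distribution \(P\). The process above is therefore equivalent to the following.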

Draw a distribution \(P\) from DP\((H, \alpha) \)
Draw observations \(X_1, X_2, \cdots\) independently from \(P\).
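
As a small sanity check of the exchangeability claim (a special case, not a proof), take three draws and assume \(H\) has no atoms, so that fresh draws from \(H\) are distinct with probability 1. Writing \(a, b\) for two distinct values, the three orderings in which one value appears twice and the other once have the following pattern probabilities under the sequential scheme:

\[
\begin{aligned}
P(a, a, b) &= 1 \cdot \frac{1}{\alpha+1} \cdot \frac{\alpha}{\alpha+2}, \\
P(a, b, a) &= 1 \cdot \frac{\alpha}{\alpha+1} \cdot \frac{1}{\alpha+2}, \\
P(b, a, a) &= 1 \cdot \frac{\alpha}{\alpha+1} \cdot \frac{1}{\alpha+2}.
\end{aligned}
\]

Each product equals \(\frac{\alpha}{(\alpha+1)(\alpha+2)}\), so in this case the probability depends only on how many times each value occurs, not on the order in which the values appear.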

Miscellaneous

The external links section at the very end of the wiki page seems to link to many good documents.

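A Korean-language blog (Mad for Simplicity) has a post with example code, but it looks a bit off. I doubt that starting from a randomly chosen initial configuration is right; it differs from what the English wiki describes. It also throws some errors, possibly because of the MATLAB version, so it needs a few fixes before it will run.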

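[1] Maybe this could be translated into Korean as ‘중국집 분포’ (‘Chinese restaurant distribution’). ‘중국집 다중 분포’ (‘Chinese restaurant multi-distribution’) seems poor, because it suggests that ‘중국집 분포’ and ‘중국집 다중분포’ exist as separate things. A multivariate distribution seems to be commonly rendered as ‘다변수 분포’.
On second thought this does not work: for Dirichlet, translating the Dirichlet distribution as ‘디리클레 분포’ and the Dirichlet process as ‘디리클레 다중분포’ is even more confusing, because the Dirichlet distribution is itself a ‘다중분포’ (distribution over distributions[1]). So I will just call it a process. For Dirichlet, I wonder whether ‘finite Dirichlet distribution’ and ‘infinite Dirichlet distribution’ would have been better names.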

[1]