Difference between two revisions of "Dirichlet process"

Latest revision as of 10:36, 21 June 2017 (Wed)

(Almost all of this is from Wikipedia.)

Definition

A Dirichlet process is a probability distribution whose range is itself a set of probability distributions.
(‘process is a distribution’)

The Dirichlet process can be seen as the infinite-dimensional generalization of the Dirichlet distribution.

Properties

Just as the Dirichlet distribution is the conjugate prior of the categorical distribution, the Dirichlet process is the conjugate prior of infinite, nonparametric discrete distributions.
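To make the conjugacy concrete, the two standard posterior updates can be written side by side. The symbols here are introduced only for illustration: \(a_k\) are the Dirichlet parameters, \(c_k\) is the observed count of category \(k\), \(\delta_x\) is a point mass at \(x\), \(H\) is the base distribution and \(\alpha\) the scaling parameter. The argument order DP(base, scaling) is an assumption about notation.

```latex
% Finite case: Dirichlet prior, categorical observations with counts c_1, ..., c_K
p \sim \mathrm{Dir}(a_1, \dots, a_K)
\;\Longrightarrow\;
p \mid \text{data} \sim \mathrm{Dir}(a_1 + c_1, \dots, a_K + c_K)

% Infinite case: Dirichlet process prior, observations X_1, ..., X_n drawn from P
P \sim \mathrm{DP}(H, \alpha)
\;\Longrightarrow\;
P \mid X_1, \dots, X_n \sim
\mathrm{DP}\!\left( \frac{\alpha H + \sum_{i=1}^{n} \delta_{X_i}}{\alpha + n},\; \alpha + n \right)
```

In both cases the posterior stays in the same family as the prior, and the observations only add counts (or point masses) to the parameters.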

A typical application is as the prior probability distribution of an infinite mixture model.

The visualization[2] of the Chinese Restaurant Process (CRP)[1] is really striking, and the explanation that comes with it is good too.

samples (from the base measure H) yields a random sample of the Dirichlet process DP(0.5,H)

When you sample, values are drawn in a ‘rich get richer’ fashion. The process can be simulated as follows.

input: \(H\) (a probability distribution called the base distribution), \(\alpha\) (a positive real number called the scaling parameter)

Draw \(X_{1}\) from the distribution \(H\).
for \(n>1\):
1. With probability \(\displaystyle{\frac {\alpha }{\alpha +n-1}} \) draw \(X_n\) from \(H\).
2. With probability \(\displaystyle{\frac {n_{x}}{\alpha +n-1}} \) set \(X_n = x\), where \(n_x\) is the number of previous observations \(X_j\), \(j<n\), such that \(X_j = x\).

Here \(x\) can be thought of as a (cluster) label.
Since \(\sum_x n_x = n - 1\), the probabilities of the two cases add up to one: \( \displaystyle \frac{\alpha }{\alpha +n-1} + \sum_x \frac{n_{x}}{\alpha +n-1} = 1\)
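The sequential scheme above can be sketched in a few lines of Python. This is a minimal illustration, not code from the wiki page. The function name `sample_dp_sequence`, the standard normal base distribution and the seed are all assumptions made for the example.

```python
import random
from collections import Counter


def sample_dp_sequence(n, alpha, draw_from_h, rng):
    """Draw X_1..X_n sequentially: a fresh value from H with probability
    alpha/(alpha+i-1), otherwise a copy of an earlier value."""
    xs = []
    for i in range(1, n + 1):
        # For i = 1 this probability is alpha/alpha = 1, so X_1 always comes from H.
        if rng.random() < alpha / (alpha + i - 1):
            xs.append(draw_from_h(rng))
        else:
            # Choosing uniformly among the i-1 earlier draws picks the value x
            # with probability n_x/(i-1), i.e. n_x/(alpha+i-1) overall.
            xs.append(rng.choice(xs))
    return xs


rng = random.Random(0)  # fixed seed, only to make the run repeatable
# Assumed base distribution H: standard normal.
xs = sample_dp_sequence(1000, alpha=0.5, draw_from_h=lambda r: r.gauss(0.0, 1.0), rng=rng)
counts = Counter(xs)
print(len(counts), counts.most_common(3))
```

The printed counts show the ‘rich get richer’ effect: a handful of distinct values account for almost all of the 1000 draws.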

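Each \(X_i\) depends on the earlier draws (that is, the draws are not independent), but they are exchangeable. (Apparently this can be shown by direct calculation, but the wiki page doesn't show it and I haven't done it myself.) Because of this, de Finetti's theorem lets us show that the \(X_i\) are conditionally independent of one another given some distribution \(P\). The process above is therefore equivalent to the following.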

Draw a distribution \(P\) from DP\((H, \alpha) \)
Draw observations \(X_1, X_2, \cdots\) independently from \(P\).
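One way to try the two-step version numerically is the stick-breaking construction. It is not described in these notes, so treat it as a swapped-in technique: weights are built from Beta(1, \(\alpha\)) draws and the atoms are drawn from \(H\). The sketch below truncates the infinite sum at a hypothetical `num_atoms` and puts the leftover mass on the last atom. Both are simplifying assumptions, as are the function name and the standard normal base distribution.

```python
import random


def draw_truncated_dp(alpha, draw_from_h, rng, num_atoms=200):
    """Approximate one draw P ~ DP(H, alpha) by truncated stick-breaking.
    Returns the atoms of P and their weights."""
    atoms, weights = [], []
    remaining = 1.0  # length of the stick not yet handed out
    for _ in range(num_atoms):
        beta = rng.betavariate(1.0, alpha)  # fraction of the remaining stick
        atoms.append(draw_from_h(rng))      # atom location drawn from H
        weights.append(remaining * beta)
        remaining *= 1.0 - beta
    # Assumption: assign the (tiny) leftover mass to the last atom so the
    # truncated weights sum to 1.
    weights[-1] += remaining
    return atoms, weights


rng = random.Random(0)  # fixed seed, only to make the run repeatable
atoms, weights = draw_truncated_dp(0.5, lambda r: r.gauss(0.0, 1.0), rng)
# Step 2: given P, the observations are independent draws from P.
xs = rng.choices(atoms, weights=weights, k=1000)
print(len(set(xs)), max(weights))
```

Because \(P\) is discrete, the independent draws repeat values, which is the same clustering behaviour the sequential scheme produces.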

Miscellaneous

The external links section at the very end of the Wikipedia page seems to link to many good documents.

I also found a promising-looking pdf (11 pages excluding references), but will the day ever come when I actually read it?

A post on a Korean-language blog (Mad for Simplicity) has DPGMM (Dirichlet Process Gaussian Mixture Models) example code. The differences from DP are:

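DP assigns \(x\) while feeding in the \(X_i\) one at a time, whereas here every \(X_i\) is given an initial value up front. (To start with a single cluster (k), set them all to 1. Otherwise distribute the initial cluster labels at random. For example, to start from k=3, assign 1, 2, 3 at random.)
The probability of creating a new cluster is set to \(\displaystyle \max[\text{p}(j)] \times \frac{\alpha}{n -1 + \alpha} \) where \(\displaystyle \text{p}(j) = \text{p}'(j) \frac{n_j}{n -1 + \alpha} \). Here \(\text{p}'(j)\), written as matlab code, is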
```
mvnpdf(cur_pt, center{j}, diag([sigma2_x, sigma2_y]))
```
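That is, the final probability is the probability used in basic DP multiplied by a suitable scaling factor. Because the existing data change every time a new point is assigned, several iterations are needed (unlike basic DP, where an assignment is fixed once it is made).
I do wonder whether probabilities defined this way sum to 1, but I haven't looked into it.
For mvnpdf, refer to the MathWorks documentation.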
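The assignment probabilities described above can be sketched as follows. This is a reading of the formulas in these notes, not the blog's code. The names `diag_gaussian_pdf` and `assignment_weights` and the toy inputs are hypothetical. The density is written out by hand for a diagonal covariance, which is what the `mvnpdf` call with `diag([sigma2_x, sigma2_y])` computes. Since \(\text{p}(j)\) multiplies a density by \(n_j/(n-1+\alpha)\), the raw values are not guaranteed to sum to 1, so the sketch normalizes them explicitly. Whether the blog does the same is not checked here.

```python
import math


def diag_gaussian_pdf(pt, center, sigma2):
    """Normal density with diagonal covariance diag(sigma2): the product of
    one-dimensional normal densities, one per coordinate."""
    density = 1.0
    for value, mean, var in zip(pt, center, sigma2):
        density *= math.exp(-((value - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)
    return density


def assignment_weights(cur_pt, centers, counts, alpha, sigma2):
    """Normalized weights for each existing cluster j, followed by the weight
    for opening a new cluster."""
    n = sum(counts) + 1  # counts exclude cur_pt, so they sum to n - 1
    # p(j) = p'(j) * n_j / (n - 1 + alpha)
    p = [diag_gaussian_pdf(cur_pt, c, sigma2) * n_j / (n - 1 + alpha)
         for c, n_j in zip(centers, counts)]
    # new-cluster weight = max_j p(j) * alpha / (n - 1 + alpha)
    p_new = max(p) * alpha / (n - 1 + alpha)
    total = sum(p) + p_new  # assumption: normalize so the weights sum to 1
    return [w / total for w in p + [p_new]]


weights = assignment_weights(
    cur_pt=(0.1, -0.2),
    centers=[(0.0, 0.0), (5.0, 5.0)],
    counts=[6, 3],
    alpha=1.0,
    sigma2=(1.0, 1.0),
)
print(weights)
```

In this toy case the point sits next to the first center, so the first cluster gets most of the weight, the new-cluster option gets a fraction of it, and the distant second cluster gets almost none.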

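[1] Maybe this could be translated as ‘중국집 분포’ (‘Chinese restaurant distribution’). Calling it ‘중국집 다중 분포’ (‘Chinese restaurant multi-distribution’) doesn't seem great, because it suggests that a ‘중국집 분포’ and a ‘중국집 다중분포’ exist as separate things. Multivariate distribution seems to be commonly rendered as ‘다변수 분포’.
On second thought this doesn't work. For Dirichlet, translating the pair as ‘디리클레 분포’ (Dirichlet distribution) and ‘디리클레 다중분포’ (Dirichlet process) would be even more confusing, because the ‘디리클레분포’ is itself a ‘다중분포 (distribution over distributions[1])’. So I'll just go with ‘process’, haha. For Dirichlet, I wonder whether ‘finite Dirichlet distribution’ and ‘infinite Dirichlet distribution’ would have been better names.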


@@ Line 35: / Line 35: @@
 ===Miscellaneous===
 [https://en.wikipedia.org/wiki/Dirichlet_process#External_links The external links section at the very end of the Wikipedia page] seems to link to many good documents.
+I also found [http://www.gatsby.ucl.ac.uk/~ywteh/research/npbayes/dp.pdf a promising-looking pdf] (11 pages excluding references), but will the day ever come when I actually read it?
 [http://enginius.tistory.com/397 A post on a Korean-language blog (Mad for Simplicity)] has [[DPGMM]] (Dirichlet Process Gaussian Mixture Models) example code. The differences from DP are: