Dirichlet distribution

The Dirichlet distribution, written \(\operatorname {Dir} ({\boldsymbol {\alpha }})\), is a family of continuous multivariate probability distributions parameterized by a vector \(\boldsymbol {\alpha }\) of positive reals.

If \(\alpha\) is given as a scalar rather than a vector, all the \(\alpha_i\) are equal. This case is called the symmetric Dirichlet distribution[1], and \(\alpha\) is then called the concentration parameter. When the concentration parameter is 1, the distribution is uniform over the open standard \(K-1\) simplex and is called the flat Dirichlet distribution. Values of the concentration parameter above 1 prefer variates that are dense, evenly distributed distributions, i.e. all the values within a single sample are similar to each other. Values of the concentration parameter below 1 prefer sparse distributions, i.e. most of the values within a single sample will be close to 0, and the vast majority of the mass will be concentrated in a few of the values.
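The effect of the concentration parameter can be seen in a small simulation. The sketch below is not from the notes above: it assumes the standard construction of a Dirichlet sample as independent Gamma(\(\alpha\), 1) draws divided by their sum, and the function names are made up for illustration. It compares the average largest component of a sample for \(\alpha\) below, equal to, and above 1.

```python
import random


def sample_symmetric_dirichlet(alpha, k, rng):
    """Draw one point on the (k-1)-simplex from a symmetric Dirichlet.

    Assumption: uses the normalized-Gamma construction, i.e. k independent
    Gamma(alpha, 1) variates divided by their sum.
    """
    gammas = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(gammas)
    return [g / total for g in gammas]


def mean_largest_component(alpha, k=5, n=2000, seed=0):
    """Average of max(x_1, ..., x_k) over n samples; larger means sparser samples."""
    rng = random.Random(seed)
    return sum(max(sample_symmetric_dirichlet(alpha, k, rng)) for _ in range(n)) / n


sparse = mean_largest_component(0.1)   # alpha < 1: one component tends to dominate
flat = mean_largest_component(1.0)     # alpha = 1: uniform on the simplex
dense = mean_largest_component(10.0)   # alpha > 1: components are similar
print(sparse, flat, dense)
```

The largest component can never be smaller than \(1/K\), so values near \(1/K\) indicate evenly spread samples, while values near 1 indicate that almost all of the mass sits in a single component.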

The probability density function \(f\) is

\(\displaystyle f\left(x_{1},\ldots ,x_{K-1};\alpha _{1},\ldots ,\alpha _{K}\right)={\frac {1}{\mathrm {B} ({\boldsymbol {\alpha }})}}\prod _{i=1}^{K}x_{i}^{\alpha _{i}-1},\) where \(\displaystyle x_{K}=1-\sum \limits _{i=1}^{K-1}x_{i}\)
and \(\displaystyle (x_{1},\ldots ,x_{K})\) belongs to the standard \(K-1\) simplex, i.e. \(x_{i}\geq 0\) and \(\sum _{i=1}^{K}x_{i}=1\).

The normalizing constant is the multivariate Beta function, \(\displaystyle \mathrm {B} ({\boldsymbol {\alpha }})={\frac {\prod _{i=1}^{K}\Gamma (\alpha _{i})}{\Gamma \left(\sum _{i=1}^{K}\alpha _{i}\right)}},\qquad {\boldsymbol {\alpha }}=(\alpha _{1},\ldots ,\alpha _{K}).\)
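As a minimal sketch of how the density and its normalizing constant can be evaluated numerically, the code below works in log space with `math.lgamma` to avoid overflow in the Gamma functions. The function names are illustrative, and returning 0 for points outside the open simplex is a simplifying assumption of this sketch.

```python
import math


def log_multivariate_beta(alpha):
    """log B(alpha) = sum_i log Gamma(alpha_i) - log Gamma(sum_i alpha_i)."""
    return sum(math.lgamma(a) for a in alpha) - math.lgamma(sum(alpha))


def dirichlet_pdf(x, alpha):
    """Density at x = (x_1, ..., x_K), given with all K components.

    Assumption: points outside the open simplex (a non-positive component,
    or components not summing to 1) are assigned density 0.
    """
    if len(x) != len(alpha):
        raise ValueError("x and alpha must have the same length")
    if any(xi <= 0.0 for xi in x) or abs(sum(x) - 1.0) > 1e-9:
        return 0.0
    log_density = sum((a - 1.0) * math.log(xi) for a, xi in zip(alpha, x))
    return math.exp(log_density - log_multivariate_beta(alpha))


# Flat Dirichlet with K = 3: B(1, 1, 1) = 1 / Gamma(3) = 1/2, so the density is 2.
flat_density = dirichlet_pdf([0.2, 0.3, 0.5], [1.0, 1.0, 1.0])

# K = 2 reduces to a Beta(2, 3) density: 12 * 0.4 * 0.6**2 = 1.728.
beta_density = dirichlet_pdf([0.4, 0.6], [2.0, 3.0])
```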

Deriving this normalizing constant does not appear to be straightforward.

For the symmetric Dirichlet distribution,

\( f(x_{1},\dots ,x_{K-1};\alpha )={\frac {\Gamma (\alpha K)}{\Gamma (\alpha )^{K}}}\prod _{i=1}^{K}x_{i}^{\alpha -1} \)

The marginal distributions are beta distributions:
\(\displaystyle X_{i}\sim \operatorname {Beta} (\alpha _{i},\alpha _{0}-\alpha _{i}),\) where \(\displaystyle \alpha _{0}=\sum _{i=1}^{K}\alpha _{i}.\)
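The marginal claim can be checked numerically. The sketch below is an illustration, not part of the original notes: it draws Dirichlet samples with the normalized-Gamma construction (an assumption of this example) and compares the empirical mean and variance of the first component with the known moments of \(\operatorname{Beta}(a, b)\), namely \(a/(a+b)\) and \(ab/((a+b)^2(a+b+1))\).

```python
import random


def sample_dirichlet(alpha, rng):
    """One Dirichlet draw via independent Gamma(alpha_i, 1) variates, normalized."""
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(gammas)
    return [g / total for g in gammas]


def beta_mean_var(a, b):
    """Mean and variance of a Beta(a, b) distribution."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1.0))
    return mean, var


alpha = [2.0, 3.0, 5.0]          # illustrative parameter vector
alpha0 = sum(alpha)
rng = random.Random(0)

# Empirical moments of the first component X_1.
draws = [sample_dirichlet(alpha, rng)[0] for _ in range(20000)]
emp_mean = sum(draws) / len(draws)
emp_var = sum((d - emp_mean) ** 2 for d in draws) / len(draws)

# Moments predicted by X_1 ~ Beta(alpha_1, alpha_0 - alpha_1).
theory_mean, theory_var = beta_mean_var(alpha[0], alpha0 - alpha[0])
print(emp_mean, theory_mean, emp_var, theory_var)
```

Agreement of the first two moments does not prove that the marginal is a beta distribution, but a mismatch would indicate an error in the parameters.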

The Dirichlet distribution relates to the multinomial distribution in the same way that the beta distribution relates to the binomial distribution.

Because the Dirichlet distribution is conjugate to the categorical/multinomial distribution,[2]
given a model
\({\displaystyle {\begin{array}{rcccl}{\boldsymbol {\alpha }}&=&\left(\alpha _{1},\ldots ,\alpha _{K}\right)&=&{\text{concentration hyperparameter}}\\\mathbf {p} \mid {\boldsymbol {\alpha }}&=&\left(p_{1},\ldots ,p_{K}\right)&\sim &\operatorname {Dir} (K,{\boldsymbol {\alpha }})\\\mathbb {X} \mid \mathbf {p} &=&\left(\mathbf {x} _{1},\ldots ,\mathbf {x} _{N}\right)&\sim &\operatorname {Cat} (K,\mathbf {p} )\end{array}}}\)
then, for \(N\) observations, the following holds,
\({\displaystyle {\begin{array}{rcccl}\mathbf {c} &=&\left(c_{1},\ldots ,c_{K}\right)&=&{\text{number of occurrences of category }}i\\\mathbf {p} \mid \mathbb {X} ,{\boldsymbol {\alpha }}&\sim &\operatorname {Dir} (K,\mathbf {c} +{\boldsymbol {\alpha }})&=&\operatorname {Dir} \left(K,c_{1}+\alpha _{1},\ldots ,c_{K}+\alpha _{K}\right)\end{array}}}\)
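The update rule amounts to adding the observed category counts to the prior parameters. The following is a minimal sketch under the assumption that categories are encoded as integer indices 0 to K-1; the function names and the example data are hypothetical. The posterior mean uses the standard Dirichlet fact \(E[p_i] = \alpha_i / \alpha_0\).

```python
from collections import Counter


def dirichlet_posterior(alpha, observations):
    """Posterior parameters c + alpha for categorical observations.

    alpha: prior concentration parameters, one per category.
    observations: iterable of category indices in range(len(alpha)).
    """
    counts = Counter(observations)
    for category in counts:
        if not 0 <= category < len(alpha):
            raise ValueError(f"category index out of range: {category}")
    return [a + counts.get(i, 0) for i, a in enumerate(alpha)]


def posterior_mean(alpha_post):
    """Mean of Dir(alpha_post): each parameter divided by their sum."""
    total = sum(alpha_post)
    return [a / total for a in alpha_post]


prior = [1.0, 1.0, 1.0]        # flat prior over K = 3 categories
data = [0, 2, 2, 1, 2, 0]      # counts c = (2, 1, 3)
post = dirichlet_posterior(prior, data)
print(post)                    # [3.0, 2.0, 4.0]
print(posterior_mean(post))
```

Because only the counts enter the update, observations can be processed in any order or in batches, with each batch's posterior serving as the prior for the next.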