0811 Affinity Propagation

message passing 방식

“Affinity Propagation attempts to find the exemplars that maximize the net similarity, i.e. the overall sum of similarities between all exemplars and their member data points.”^[1]

한문장으로 AP를 잘 설명한것 같아서 퍼왔다. Sparse matrix를 다루는 페이지에도 AP의 간략한 설명이 있으니 참고.

해당 논문^[2]을 보면, 내용 자체는 어렵지 않고, 실제로 사용할 수 있는 MATLAB코드도 Frey Lab에 공개되어 있다. 아래 코드와 수식은 대부분 해당 논문과 Frey Lab이 출처이고 이해를 돕기 위해 약간 수정하거나 간략한 주석을 추가했다. (링크한 것은 이해하기 쉬운 코드이고, Frey Lab에 가보면 더 깔끔하게 동작하는 코드를 얻을 수 있다.)

AP는 다음 세 식에 따라 동작한다.

$$\begin{equation} r(i,k) \leftarrow s(i,k) - \max_{k'\neq k} \{ a(i, k') + s(i, k') \} \end{equation}$$ $$\begin{equation} a(i,k) \leftarrow \min \Big\{0, r(k,k) + \sum_{i' \not\in \{i, k\}} \max\{0, r(i', k)\} \Big\} \end{equation}$$ $$\begin{equation} a(k,k) \leftarrow \sum_{i' \neq k} \max \{0, r(i', k) \} \end{equation}$$

(similiarity를 실수로 distance를 넣었을 때 결과가 상당히 특이했다. 좀 더 실험해봐야 할것 같다.)

AP는 $N^2$ size의 availability, responsibility matrix를 모두 메모리에 들고 있어야 해서, 아주 큰 데이터에 대해서는 돌려볼 수가 없다($N=$1M이고 similiarity하나당 4byte라고 했을 때, 1M×1M×4byte$\approx$232TB가 필요하다). 메모리문제를 약간 완화시키는 방법으로 <a href='leveraged_ap.html'>leveraged ap</a>가 있는데, 완화효과가 그리 크지 않고, cluster center의 개수에 영향을 줄 수 있어서 별로인것 같다. 예를 들어, frac을 작게 잡아서 column을 10개밖에 쓰지 못하게 되면 center의 개수도 10개 이하가 된다. row가 수백만인 경우도 흔하기 때문에 결국 좋은 방법이라고 하기는 힘들다. 이렇게 했을 때 결국 AP에 수렴하는지 보인 문서도 없는 듯 보인다.(대강 봐서는 수렴할것 같기는 하다)

Similiarity matirx가 sparse할 때도 사용할 수 있다. <a href='apcluster_sparse.html'>별도로 정리한 것</a>이 있다.

Fujiwara¹⁾에 위 식들의 간략한 버전이 나온다. 이해하기 더 쉬운것 같다. 표기만 약간 바꾸어 다시 적으면 다음과 같다.

$$ \begin{align} &r(i,j) = s(i,j) - \max_{k\neq j}\{a(i,k) + s(i,k)\} & (i \neq j) \\ &r(i,i) = s(i,i) - \max_{k\neq i}\{s(i,k)\} & \\ &a(i,j) = \min\left\{ 0, r(j,j) + \sum_{k\neq i,j} \max \{ 0, r(k,j) \} \right\} & (i \neq j ) \\ &a(i,i) = \sum_{k\neq i} \max\{ 0, r(k,i) \} \end{align} $$

좌변의 $a$, $r$과 우변의 $a$, $r$은 의미가 달라서(damping을 고려하지 않는다면 그냥 iteration전/후차이) Fujiwara에는 좌변이 $\rho$, $\alpha$로 표기되어 있다.

‘responsibility’와 ‘availability’에 관한 상세한 설명이 Frey에 나오지만, Fujiwara에 나온 간략한 설명도 좋은것 같다. $r(i,j)$는 $i$가 $j$에게 보내는 메시지이고, $a(i,j)$는 $j$가 $i$에게 보내는 메시지인데 의미는 모두 ‘$j$가 exemplar로 얼마나 적당하냐’ 하는 것이다($ \definecolor{blue}{RGB}{0,0,200} \color{blue} r(i\rightarrow j), a(i\leftarrow j) $로 적어도 될듯. s, r로 해서 send, receive가 더 나으려나 생각도 해봤는데 send, receive는 누구 기준해서 주고받는거냐가 헷갈려서 별로 도움 안될것 같다. $\overset{i\rightarrow j}M, \overset{i\leftarrow j}M$도 괜찮을것 같다). 그래서 나중에 exemplar를 고를 때도 둘의 합이 최대가 되는 곳($\arg\max \{ r(i,j)+a(i,j):j=1,2,\dots,N\} $)이 선택된다.

식(5)는 Frey 읽을때도 조금 이해가 안갔었다. Frey에는 식(1)아래에 다음과 같이 실려있다.

For $k = i$, the responsibility $r(k,k)$ is set to the input preference that point $k$ be chosen as an exemplar, $s(k,k)$, minus the largest of the similarities between point $i$ and all other candidate exemplars. This “self-responsibility” reflects accumulated evidence that point $k$ is an exemplar, based on its input preference tempered by how ill-suited it is to be assigned to another exemplar.

식(5)는 이 말을 식으로 옮겨둔 것으로 보이는데, 정작 Frey Lab의 코드(<a href='ap_code.html'>위의 링크</a>에 있는 코드)를 보면 이 부분이 없다. 즉 $r(i,k)$는 $i$와 $k$가 같든 다르든 식(1)에 따라 갱신된다. 만약 $r$이 매번 저렇게 갱신되면 $\lambda$가 0일 때 $r(k,k)$가 변화가 없어야 하지만 실제로는 그렇지 않다. 따라서 (5)는 $a(i,j)=0$인 초기조건에서 식(4)의 $k=i$일 때인것 같다. Fray의 논문에 나온 부분도 초기조건 설명이 아닌가 한다. (억지로 이렇게 끼워맞춰 생각해보지만 속시원한 느낌이 없기는 매한가지)

(작성중)

spiral에 대한 성능은?

↑ Fujiwara, Yasuhiro, Go Irie, and Tomoe Kitahara. “Fast algorithm for affinity propagation.” IJCAI Proceedings-International Joint Conference on Artificial Intelligence. Vol. 22. No. 3. 2011.
↑ Frey, Brendan J., and Delbert Dueck. “Clustering by passing messages between data points.” science 315.5814 (2007): 972-976.

[1] Fujiwara, Yasuhiro, Go Irie, and Tomoe Kitahara. “Fast algorithm for affinity propagation.” IJCAI Proceedings-International Joint Conference on Artificial Intelligence. Vol. 22. No. 3. 2011.

[2] Frey, Brendan J., and Delbert Dueck. “Clustering by passing messages between data points.” science 315.5814 (2007): 972-976.

[1]

[2]