Jaccard vs cityblock
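>>> import numpy as np
>>> import scipy.spatial as ss
>>> # 5 observations (rows) with 2 binary features (columns)
>>> b = np.array([[1, 0, 1, 1, 1], [1, 0, 0, 1, 1]]).T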
>>> b
array([[1, 1],
       [0, 0],
       [1, 0],
       [1, 1],
       [1, 1]])
>>> # pdist defaults to the Euclidean metric
>>> ss.distance.squareform(ss.distance.pdist(b))
array([[ 0.        ,  1.41421356,  1.        ,  0.        ,  0.        ],
       [ 1.41421356,  0.        ,  1.        ,  1.41421356,  1.41421356],
       [ 1.        ,  1.        ,  0.        ,  1.        ,  1.        ],
       [ 0.        ,  1.41421356,  1.        ,  0.        ,  0.        ],
       [ 0.        ,  1.41421356,  1.        ,  0.        ,  0.        ]])
>>> ss.distance.squareform(ss.distance.pdist(b, 'cityblock'))
array([[ 0.,  2.,  1.,  0.,  0.],
       [ 2.,  0.,  1.,  2.,  2.],
       [ 1.,  1.,  0.,  1.,  1.],
       [ 0.,  2.,  1.,  0.,  0.],
       [ 0.,  2.,  1.,  0.,  0.]])
>>> ss.distance.squareform(ss.distance.pdist(b, 'jaccard'))
array([[ 0. ,  1. ,  0.5,  0. ,  0. ],
       [ 1. ,  0. ,  1. ,  1. ,  1. ],
       [ 0.5,  1. ,  0. ,  0.5,  0.5],
       [ 0. ,  1. ,  0.5,  0. ,  0. ],
       [ 0. ,  1. ,  0.5,  0. ,  0. ]])
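Jaccard ignores positions where both values are 0, because the intersection is a logical AND and the union is a logical OR. Jaccard similarity = intersection / union; the 'jaccard' metric in pdist is the corresponding distance, 1 - intersection / union.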
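
To make the AND/OR reading concrete, here is a minimal sketch that computes the Jaccard distance by hand and compares it with pdist. The helper name jaccard_distance is hypothetical (it is not part of SciPy), and returning 0 for two all-zero vectors is an assumption made here to avoid dividing by zero.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform


def jaccard_distance(u, v):
    """Jaccard distance between two binary vectors (hypothetical helper)."""
    u = np.asarray(u, dtype=bool)
    v = np.asarray(v, dtype=bool)
    # Union = positions where at least one vector is 1 (logical OR).
    union = np.logical_or(u, v).sum()
    if union == 0:
        # Assumption: two all-zero vectors are treated as identical.
        return 0.0
    # Intersection = positions where both vectors are 1 (logical AND).
    # Positions where both are 0 appear in neither count, so they are ignored.
    intersection = np.logical_and(u, v).sum()
    return 1.0 - intersection / union


# Same data as in the session above: 5 rows, 2 binary columns.
b = np.array([[1, 0, 1, 1, 1], [1, 0, 0, 1, 1]]).T

# Pairwise distances computed by hand, row against row.
manual = np.array([[jaccard_distance(r, s) for s in b] for r in b])

# Reference result from SciPy for comparison.
reference = squareform(pdist(b.astype(bool), 'jaccard'))

print(np.allclose(manual, reference))
```

For rows [1, 1] and [1, 0] the intersection has one element and the union has two, which gives the 0.5 seen in the matrix above, whereas cityblock simply counts the one differing position without normalising by the union.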