Difference between revisions of "Numpy"

Latest revision as of 17:58, 21 February 2018 (Wed)

bincount

Count number of occurrences of each value in array of non-negative ints.

numpy.bincount(x, weights=None, minlength=0)

https://docs.scipy.org/doc/numpy/reference/generated/numpy.bincount.html
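
A minimal sketch of how the three arguments behave. The sample arrays `x` and `w` are made up for illustration.

```python
import numpy as np

x = np.array([0, 1, 1, 3, 2, 1, 7])

# counts[i] is the number of times the value i occurs in x
counts = np.bincount(x)

# minlength pads the result with zero-count bins up to the given length
padded = np.bincount(x, minlength=10)

# with weights, each bin holds the sum of the weights instead of a count
w = np.array([0.5, 1.0, 1.0, 2.0, 0.25, 1.0, 3.0])
weighted = np.bincount(x, weights=w)

print(counts)
print(padded)
print(weighted)
```

The result always has `max(x) + 1` entries unless `minlength` asks for more.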

loadtxt

numpy.loadtxt(fname, dtype=float, comments='#', delimiter=None, converters=None, skiprows=0, usecols=None, unpack=False, ndmin=0)

https://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html
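
A short sketch of the most commonly used parameters. The in-memory text stands in for a file on disk, and its contents are invented for the example.

```python
from io import StringIO

import numpy as np

# Lines starting with '#' are skipped; whitespace is the default delimiter
text = StringIO("# x y z\n1 2 3\n4 5 6\n")
data = np.loadtxt(text)

# CSV with a header row: skip it, and keep only the first and third columns
csv = StringIO("a,b,c\n1,2,3\n4,5,6\n")
cols = np.loadtxt(csv, delimiter=",", skiprows=1, usecols=(0, 2))

print(data)
print(cols)
```
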
cf. fromstring

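  • When using fromstring, note that if nothing is passed for the sep argument, the input is treated as binary data. For tab-separated input and the like, passing just whitespace, as in sep=' ', is enough.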
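
As a small illustration of the note above, the sketch below parses a tab-separated string with `sep=" "`. The input string is hypothetical.

```python
import numpy as np

tabbed = "1\t2\t3"

# A whitespace separator matches any run of whitespace, including tabs.
# Omitting sep would select binary mode, which is not what text input needs.
values = np.fromstring(tabbed, sep=" ")

print(values)
```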

histogram

numpy.histogram(a, bins=10, range=None, density=None, weights=None)
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> rng = np.random.RandomState(10)  # deterministic random data
>>> a = np.hstack((rng.normal(size=1000),
...                rng.normal(loc=5, scale=2, size=1000)))
>>> plt.hist(a, bins='auto')  # plt.hist passes its arguments to np.histogram
>>> plt.title("Histogram with 'auto' bins")
>>> plt.show()

https://docs.scipy.org/doc/numpy/reference/generated/numpy.histogram.html
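
Without plotting, np.histogram can be called directly. This sketch uses a tiny made-up array and explicit bin edges to show what the two return values contain.

```python
import numpy as np

data = np.array([1, 2, 1])

# counts[i] is the number of samples in [edges[i], edges[i+1]);
# the last bin also includes its right edge
counts, edges = np.histogram(data, bins=[0, 1, 2, 3])

# density=True normalises so that the histogram integrates to 1
dens, _ = np.histogram(data, bins=[0, 1, 2, 3], density=True)

print(counts, edges)
print(dens)
```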

Array to column vector

>>> a = np.array([1, 2, 3])
>>> a
array([1, 2, 3])
>>> a[:, np.newaxis]
array([[1],
       [2],
       [3]])
>>> a[np.newaxis, :]
array([[1, 2, 3]])
>>> np.newaxis is None # so you can use None instead of np.newaxis
True

http://stackoverflow.com/a/17428859/766330

>>> np.array([a]).T
array([[1],
       [2],
       [3]])
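
For comparison, here is a sketch that puts the variants above side by side and adds `reshape(-1, 1)`, which is not in the original notes but gives the same result.

```python
import numpy as np

a = np.array([1, 2, 3])

col_newaxis = a[:, np.newaxis]    # insert a new axis after the existing one
col_none = a[:, None]             # same thing, since np.newaxis is None
col_transpose = np.array([a]).T   # wrap into a 1x3 row, then transpose
col_reshape = a.reshape(-1, 1)    # -1 lets NumPy infer the row count

# Transposing the 1-D array itself changes nothing: there is only one axis
still_flat = a.T

print(col_reshape.shape, still_flat.shape)
```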

Get a distance matrix

scipy.spatial.distance.pdist(X, metric='euclidean', p=None, w=None, V=None, VI=None)
X : ndarray

X is an m-by-n matrix whose rows are observations, so X holds m observations in n dimensions.
pdist stands for pairwise distance; it returns the distances in condensed (1-D) form. Passing that result to scipy.spatial.distance.squareform produces the full m-by-m distance matrix.[1]
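
A minimal sketch of the pdist and squareform round trip, using three made-up 2-D points chosen so the distances are easy to check by hand.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Three observations (rows) in two dimensions
X = np.array([[0.0, 0.0],
              [3.0, 4.0],
              [6.0, 8.0]])

# Condensed form: one entry per unordered pair, m*(m-1)/2 values in total
condensed = pdist(X, metric="euclidean")

# Square form: symmetric m-by-m matrix with zeros on the diagonal
D = squareform(condensed)

print(condensed)
print(D)
```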

set_printoptions

Use this to print a matrix in full instead of having the middle elided with "...".

>>> np.random.rand(100,100)
array([[ 0.46154546,  0.12353798,  0.27590724, ...,  0.24265687,
         0.84255677,  0.95283526],
       [ 0.73838516,  0.47949374,  0.23105863, ...,  0.08543431,
         0.91986747,  0.14417515],
       [ 0.31065035,  0.28328507,  0.29925302, ...,  0.79512885,
         0.09237567,  0.49872117],
       ..., 
       [ 0.63830484,  0.53113463,  0.22787907, ...,  0.41847976,
         0.42330993,  0.78735475],
       [ 0.94555611,  0.68517865,  0.82703527, ...,  0.84290377,
         0.75802783,  0.20678318],
       [ 0.42103587,  0.43982509,  0.42412681, ...,  0.04823858,
         0.94207207,  0.46931123]])
>>> np.set_printoptions(threshold=100000)  # see also the linewidth option
>>> np.random.rand(100,100)
array([[  8.62450822e-02,   3.64229303e-01,   5.15339939e-01,
          4.24720591e-01,   4.27696324e-02,   6.75689424e-01,
          2.69844754e-01,   2.78414489e-01,   5.24304684e-01,
          blablablabla...
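
If the setting should apply only temporarily, one option is the `np.printoptions` context manager (available in NumPy 1.15 and later). The sketch below is an illustration under that assumption, with `sys.maxsize` as a threshold that is never reached.

```python
import sys

import numpy as np

a = np.arange(2000)

# Default threshold is 1000 elements, so this array is summarised with "..."
short = np.array2string(a)

# Inside the context manager the threshold is lifted; it is restored on exit
with np.printoptions(threshold=sys.maxsize):
    full = np.array2string(a)

print(len(short), len(full))
```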

numpy manual

array of arbitrary length strings

Pass dtype=object when creating the array.

np.array([['']*3 for _ in range(wc)], dtype=object)
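
The reason this matters: without dtype=object NumPy picks a fixed-width string type, and longer strings assigned later are silently truncated. The sketch below shows the difference; `wc` is the row count from the line above, set here to an arbitrary value.

```python
import numpy as np

wc = 2  # hypothetical row count, just for the example

# Fixed-width unicode dtype inferred from the empty strings
fixed = np.array([[''] * 3 for _ in range(wc)])
fixed[0, 0] = 'hello'   # does not fit the inferred width, so it is cut short

# Object dtype stores references to Python str objects of any length
obj = np.array([[''] * 3 for _ in range(wc)], dtype=object)
obj[0, 0] = 'hello'

print(fixed.dtype, repr(fixed[0, 0]))
print(obj.dtype, repr(obj[0, 0]))
```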

from string data to one hot vector

[2]

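from sklearn.feature_extraction import DictVectorizer
import pandas as pd

def to_numeric(col):  # convert_objects() was removed in pandas 1.0
    try:
        return pd.to_numeric(col)
    except (ValueError, TypeError):
        return col  # keep categorical columns as strings

dv = DictVectorizer(sparse=False)
df = pd.DataFrame(M).apply(to_numeric).rename(columns=str)  # M: mixed numeric/string rows
dv.fit_transform(df.to_dict(orient='records'))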

array([[ 5. ,  0.2,  1. ,  0. ,  1. ,  0. ],
       [ 2. ,  1.3,  1. ,  0. ,  0. ,  1. ],
       [ 1. ,  2.3,  0. ,  1. ,  0. ,  1. ]])
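
To see which output column corresponds to which input, it can help to feed DictVectorizer a few records directly. This is a minimal sketch with invented field names (`size`, `kind`), not the data behind the output above.

```python
from sklearn.feature_extraction import DictVectorizer

# Hypothetical records: one numeric field and one categorical (string) field
records = [
    {"size": 5.0, "kind": "chair"},
    {"size": 2.0, "kind": "table"},
    {"size": 1.0, "kind": "chair"},
]

dv = DictVectorizer(sparse=False)
X = dv.fit_transform(records)

# String values become one "name=value" indicator column each;
# numeric values are passed through under their original name
names = dv.feature_names_

print(names)
print(X)
```

Feature names are sorted, so the indicator columns for `kind` come before `size` here.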