Difference between revisions of "Numpy"
Latest revision as of 17:58, 21 February 2018 (Wed)
bincount
Count number of occurrences of each value in array of non-negative ints.
numpy.bincount(x, weights=None, minlength=None)
https://docs.scipy.org/doc/numpy/reference/generated/numpy.bincount.html
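As a quick illustration of the signature above, here is a minimal sketch of how bincount behaves. The input arrays are made-up sample data, not taken from this page.

```python
import numpy as np

# Illustrative sample data (an assumption, not from the page).
x = np.array([0, 1, 1, 3, 2, 1, 7])

# counts[i] is the number of times the value i appears in x.
counts = np.bincount(x)

# With weights, each occurrence adds its weight instead of 1.
weighted = np.bincount(np.array([0, 1, 1, 2]),
                       weights=np.array([0.5, 1.0, 1.0, 2.0]))

# minlength pads the result with zeros up to the given length.
padded = np.bincount(np.array([0, 1, 1]), minlength=5)
```

The result always has length max(x) + 1 unless minlength asks for more, which is why negative values are not allowed.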
loadtxt
numpy.loadtxt(fname, dtype=<type 'float'>, comments='#', delimiter=None, converters=None, skiprows=0, usecols=None, unpack=False, ndmin=0)
https://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html
cf. fromstring
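- When using fromstring, note that if nothing is passed for the sep argument, the input is treated as binary. For tab-separated data and the like, passing just a space, as in sep=' ', is enough.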
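The two text readers above can be compared side by side. This is a small sketch under assumed inputs: the in-memory text and the tab-separated string are invented for illustration, and io.StringIO stands in for a real file name.

```python
import io
import numpy as np

# Assumed sample text; the first line is a comment that loadtxt skips
# because comments='#' is the default.
text = "# x y\n1 2\n3 4\n"
table = np.loadtxt(io.StringIO(text))

# fromstring in text mode: a non-empty sep is required, otherwise the
# input is interpreted as raw binary data.
# A single space as sep also matches tabs and other whitespace.
flat = np.fromstring("1\t2\t3", sep=" ")
```

loadtxt keeps the row and column structure, while fromstring always returns a flat 1-D array.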
histogram
numpy.histogram(a, bins=10, range=None, normed=False, weights=None, density=None)
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> rng = np.random.RandomState(10)  # deterministic random data
>>> a = np.hstack((rng.normal(size=1000),
...                rng.normal(loc=5, scale=2, size=1000)))
>>> plt.hist(a, bins='auto')  # plt.hist passes its arguments to np.histogram
>>> plt.title("Histogram with 'auto' bins")
>>> plt.show()
https://docs.scipy.org/doc/numpy/reference/generated/numpy.histogram.html
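The plot above hides what np.histogram itself returns. The sketch below calls it directly on tiny made-up arrays (an assumption for illustration) so the counts and bin edges are easy to read.

```python
import numpy as np

# Explicit bin edges: [0, 1), [1, 2), [2, 3] (the last bin is closed).
data = np.array([1, 2, 1])
counts, edges = np.histogram(data, bins=[0, 1, 2, 3])

# density=True normalizes so that the integral over all bins is 1.
dens, dens_edges = np.histogram(np.arange(4), bins=np.arange(5), density=True)
```

Note that the edges array is always one element longer than the counts array.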
Array to column vector
>>> a = np.array([1, 2, 3])
>>> a
array([1, 2, 3])
>>> a[:, np.newaxis]
array([[1],
       [2],
       [3]])
>>> a[np.newaxis, :]
array([[1, 2, 3]])
>>> np.newaxis is None  # so you can use None instead of np.newaxis
True
http://stackoverflow.com/a/17428859/766330
>>> np.array([a]).T
array([[1],
       [2],
       [3]])
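Another common spelling, not mentioned in the original notes, is reshape with -1. The sketch below checks that it gives the same column vector as the indexing forms shown above.

```python
import numpy as np

a = np.array([1, 2, 3])

# -1 lets NumPy infer the number of rows; 1 fixes a single column.
col = a.reshape(-1, 1)

# Equivalent forms from the notes above, for comparison.
col_newaxis = a[:, np.newaxis]
col_none = a[:, None]
col_transpose = np.array([a]).T
```

All four results have shape (3, 1). The indexing and reshape forms return views of a, while np.array([a]).T copies the data first.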
Get a distance matrix
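scipy.spatial.distance.pdist(X, metric='euclidean', p=None, w=None, V=None, VI=None)
X : ndarray
X is an m-by-n matrix whose rows are observations, so X holds m observations.
pdist stands for pairwise distance. It returns the distances in condensed (flat) form; scipy.spatial.distance.squareform turns that result into the square distance matrix. https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.squareform.html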
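The two-step route from observations to a distance matrix can be sketched as follows. The three points are assumed sample data, chosen so the Euclidean distances come out as whole numbers.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Three observations (rows) in two dimensions; illustrative data only.
X = np.array([[0.0, 0.0],
              [3.0, 4.0],
              [6.0, 8.0]])

# Condensed form: one entry per pair, ordered (0,1), (0,2), (1,2).
condensed = pdist(X, metric='euclidean')

# Square form: symmetric m-by-m matrix with zeros on the diagonal.
D = squareform(condensed)
```

For m observations the condensed vector has m * (m - 1) / 2 entries, which is why squareform is needed before indexing as D[i, j].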
set_printoptions
Use this to print a large matrix in full instead of having the middle replaced by "...".
>>> np.random.rand(100,100)
array([[ 0.46154546,  0.12353798,  0.27590724, ...,  0.24265687,
         0.84255677,  0.95283526],
       [ 0.73838516,  0.47949374,  0.23105863, ...,  0.08543431,
         0.91986747,  0.14417515],
       [ 0.31065035,  0.28328507,  0.29925302, ...,  0.79512885,
         0.09237567,  0.49872117],
       ...,
       [ 0.63830484,  0.53113463,  0.22787907, ...,  0.41847976,
         0.42330993,  0.78735475],
       [ 0.94555611,  0.68517865,  0.82703527, ...,  0.84290377,
         0.75802783,  0.20678318],
       [ 0.42103587,  0.43982509,  0.42412681, ...,  0.04823858,
         0.94207207,  0.46931123]])
>>> np.set_printoptions(threshold=100000)  # see `linewidth' also
>>> np.random.rand(100,100)
array([[  8.62450822e-02,   3.64229303e-01,   5.15339939e-01,
          4.24720591e-01,   4.27696324e-02,   6.75689424e-01,
          2.69844754e-01,   2.78414489e-01,   5.24304684e-01,
blablablabla...
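set_printoptions changes the setting globally. If the full output is only needed in one place, a scoped variant may be more convenient. The sketch below uses the np.printoptions context manager (available in NumPy 1.15 and later) on an assumed 2000-element array.

```python
import sys
import numpy as np

a = np.arange(2000)  # longer than the default threshold of 1000 elements

# With default options the middle of the array is summarized as "...".
summarized = np.array2string(a)

# Inside the context manager the threshold is lifted; it is restored on exit.
with np.printoptions(threshold=sys.maxsize):
    full = np.array2string(a)

# Back to the default behaviour here.
summarized_again = np.array2string(a)
```

The threshold counts total elements, not rows, so a 100-by-100 matrix needs a threshold of more than 10000 to print in full.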
array of arbitrary length strings
Pass dtype=object when creating the array.
np.array([['']*3 for _ in range(wc)], dtype=object)
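The reason dtype=object matters is that a plain string array has a fixed width and silently truncates longer assignments. The sketch below shows the difference; the strings are made up, and the row count 2 stands in for the undefined wc from the line above.

```python
import numpy as np

# Fixed-width unicode array: the width is taken from the longest initial string.
fixed = np.array(["ab", "cd"])
fixed[0] = "abcdef"          # silently cut down to the 2-character width

# Object array: each cell holds a reference to an ordinary Python str.
flexible = np.array(["ab", "cd"], dtype=object)
flexible[0] = "abcdef"       # stored in full

# Same pattern as the line above, with 2 rows as an assumed value for wc.
grid = np.array([[""] * 3 for _ in range(2)], dtype=object)
grid[0, 0] = "a much longer string"
```

The trade-off is that object arrays lose vectorized string handling and use more memory per element.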
from string data to one hot vector
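https://stackoverflow.com/a/33010943/766330
from sklearn.feature_extraction import DictVectorizer
import pandas as pd

# M is the input data; it is not defined in this snippet.
dv = DictVectorizer(sparse=False)
# convert_objects was deprecated and later removed from pandas; on current
# versions, convert the numeric columns with pd.to_numeric instead.
df = pd.DataFrame(M).convert_objects(convert_numeric=True)
dv.fit_transform(df.to_dict(orient='records'))

array([[ 5. ,  0.2,  1. ,  0. ,  1. ,  0. ],
       [ 2. ,  1.3,  1. ,  0. ,  0. ,  1. ],
       [ 1. ,  2.3,  0. ,  1. ,  0. ,  1. ]])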
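For a single column of string labels, a NumPy-only alternative is possible. This is a different technique from the DictVectorizer approach above, sketched here on made-up labels: np.unique assigns each label an integer code, and indexing an identity matrix with those codes yields the one-hot rows.

```python
import numpy as np

# Illustrative labels (an assumption, not data from the page).
labels = np.array(["red", "green", "red", "blue"])

# categories is sorted; codes[i] is the index of labels[i] in categories.
categories, codes = np.unique(labels, return_inverse=True)

# Row k of the identity matrix is the one-hot vector for category k.
one_hot = np.eye(len(categories))[codes]
```

The column order follows the sorted categories, so it is worth keeping the categories array around to decode the vectors later.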