Difference between revisions of "Numpy"
Latest revision as of 17:58, 21 February 2018 (Wed)
bincount
Count number of occurrences of each value in array of non-negative ints.
numpy.bincount(x, weights=None, minlength=None)
https://docs.scipy.org/doc/numpy/reference/generated/numpy.bincount.html
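A small sketch of the three parameters in the signature above. The input arrays are made-up examples; the output length is x.max() + 1 unless minlength asks for more.

```python
import numpy as np

x = np.array([0, 1, 1, 3, 2, 1, 7])
counts = np.bincount(x)
# counts[i] is the number of times value i occurs in x; length is x.max() + 1
print(counts)  # [1 3 1 1 0 0 0 1]

# minlength pads the result with zeros up to at least that length
padded = np.bincount(np.array([1, 1]), minlength=5)
print(padded)  # [0 2 0 0 0]

# weights: sum the weights attached to each value instead of counting
w = np.array([0.5, 0.25, 0.25])
totals = np.bincount(np.array([0, 2, 2]), weights=w)
# totals[0] = 0.5, totals[1] = 0.0, totals[2] = 0.25 + 0.25 = 0.5
```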
loadtxt
numpy.loadtxt(fname, dtype=<type 'float'>, comments='#', delimiter=None, converters=None, skiprows=0, usecols=None, unpack=False, ndmin=0)
https://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html
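A minimal loadtxt sketch. To keep it self-contained, io.StringIO stands in for a real file name (loadtxt also accepts file-like objects); the CSV text itself is invented for illustration.

```python
import io
import numpy as np

# Hypothetical file contents: a comment header and two comma-separated rows.
csv_text = "# x,y,label\n1.0,2.0,0\n3.5,4.5,1\n"

# comments='#' (the default) skips the header, delimiter=',' splits on commas,
# usecols=(0, 1) keeps only the first two columns.
xy = np.loadtxt(io.StringIO(csv_text), delimiter=',', usecols=(0, 1))
print(xy.shape)  # (2, 2)

# unpack=True transposes the result so each column can be bound to its own name.
x, y = np.loadtxt(io.StringIO(csv_text), delimiter=',', usecols=(0, 1), unpack=True)
```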
cf. fromstring
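- When using fromstring, note that if you pass nothing for the sep argument, the data is treated as binary. For tab-separated data and the like, simply passing a single space, as in sep=' ', is enough.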
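A sketch of the note above on invented tab-separated text. A separator made only of spaces matches any run of whitespace, so tabs and newlines also split values. Calling fromstring without sep (binary mode) is deprecated in newer NumPy releases in favor of np.frombuffer.

```python
import numpy as np

tab_text = "1\t2\t3\n4\t5\t6"

# sep=' ' matches tabs and newlines too; the result is always 1-D.
values = np.fromstring(tab_text, dtype=float, sep=' ')
print(values)  # [1. 2. 3. 4. 5. 6.]

# fromstring does not keep the row layout, so restore it yourself if needed.
table = values.reshape(2, 3)
```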
histogram
numpy.histogram(a, bins=10, range=None, normed=False, weights=None, density=None)
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> rng = np.random.RandomState(10)  # deterministic random data
>>> a = np.hstack((rng.normal(size=1000),
...                rng.normal(loc=5, scale=2, size=1000)))
>>> plt.hist(a, bins='auto')  # plt.hist passes its arguments to np.histogram
>>> plt.title("Histogram with 'auto' bins")
>>> plt.show()
https://docs.scipy.org/doc/numpy/reference/generated/numpy.histogram.html
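The plotting example above hides what np.histogram itself returns. This sketch calls it directly: it returns a pair (counts, bin edges), and there is always one more edge than there are bins.

```python
import numpy as np

# Explicit edges [0, 1, 2, 3] define bins [0, 1), [1, 2) and [2, 3];
# only the last bin is closed on the right.
hist, edges = np.histogram([1, 2, 1], bins=[0, 1, 2, 3])
print(hist)   # [0 2 1]
print(edges)  # [0 1 2 3]

# With the default bins=10, np.histogram returns 10 counts and 11 edges.
rng = np.random.RandomState(10)
counts, bin_edges = np.histogram(rng.normal(size=1000))
print(len(counts), len(bin_edges))  # 10 11
```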
Array to column vector
>>> import numpy as np
>>> a = np.array([1, 2, 3])
>>> a
array([1, 2, 3])
>>> a[:, np.newaxis]
array([[1],
       [2],
       [3]])
>>> a[np.newaxis, :]
array([[1, 2, 3]])
>>> np.newaxis is None  # so you can use None instead of np.newaxis
True
http://stackoverflow.com/a/17428859/766330
>>> np.array([a]).T
array([[1],
       [2],
       [3]])
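A side-by-side shape check of the approaches above. a.reshape(-1, 1) is not from the original page; it is added as one more common way to get a column vector. The last line shows a frequent pitfall: .T alone does nothing to a 1-D array.

```python
import numpy as np

a = np.array([1, 2, 3])

col_newaxis = a[:, np.newaxis]   # insert a length-1 axis after the data axis
col_none = a[:, None]            # np.newaxis is None, so this is identical
col_transpose = np.array([a]).T  # wrap into a (1, 3) row, then transpose
col_reshape = a.reshape(-1, 1)   # -1 lets NumPy infer the number of rows

for col in (col_newaxis, col_none, col_transpose, col_reshape):
    assert col.shape == (3, 1)

# Transposing a 1-D array returns it unchanged.
print(a.T.shape)  # (3,)
```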
Get a distance matrix
scipy.spatial.distance.pdist(X, metric='euclidean', p=None, w=None, V=None, VI=None)
X : ndarray
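X is an m-by-n array whose rows are observations, i.e. m observations in an n-dimensional space.
pdist stands for "pairwise distances". It returns a condensed distance matrix, a 1-D array holding the m(m-1)/2 distances between every pair of rows; scipy.spatial.distance.squareform converts that result into the full m-by-m distance matrix.[1]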
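A minimal pdist plus squareform sketch on three made-up points (m = 3, n = 2) chosen so the Euclidean distances are whole numbers.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# 3 observations (rows) in 2-D space
X = np.array([[0.0, 0.0],
              [3.0, 4.0],
              [6.0, 8.0]])

# Condensed form: distances for pairs (0, 1), (0, 2), (1, 2), i.e. 5, 10, 5
condensed = pdist(X, metric='euclidean')

# Full symmetric m-by-m matrix with zeros on the diagonal
D = squareform(condensed)
print(D.shape)  # (3, 3)
```

squareform also works in reverse: given a square symmetric distance matrix with a zero diagonal, it returns the condensed vector.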
set_printoptions
Use it to print a large array in full instead of having the middle summarized with "...".
>>> import numpy as np
>>> np.random.rand(100,100)
array([[ 0.46154546, 0.12353798, 0.27590724, ..., 0.24265687, 0.84255677, 0.95283526],
       [ 0.73838516, 0.47949374, 0.23105863, ..., 0.08543431, 0.91986747, 0.14417515],
       [ 0.31065035, 0.28328507, 0.29925302, ..., 0.79512885, 0.09237567, 0.49872117],
       ...,
       [ 0.63830484, 0.53113463, 0.22787907, ..., 0.41847976, 0.42330993, 0.78735475],
       [ 0.94555611, 0.68517865, 0.82703527, ..., 0.84290377, 0.75802783, 0.20678318],
       [ 0.42103587, 0.43982509, 0.42412681, ..., 0.04823858, 0.94207207, 0.46931123]])
>>> np.set_printoptions(threshold=100000)  # see also `linewidth`
>>> np.random.rand(100,100)
array([[ 8.62450822e-02, 3.64229303e-01, 5.15339939e-01,
         4.24720591e-01, 4.27696324e-02, 6.75689424e-01,
         2.69844754e-01, 2.78414489e-01, 5.24304684e-01,
         ... (all 10000 elements are printed)
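set_printoptions changes the setting for the rest of the session. If you only need the full output once, the np.printoptions context manager (NumPy 1.15 and later) restores the previous settings on exit. This sketch uses np.array2string to capture the text instead of printing it, and sys.maxsize as a threshold that can never be exceeded.

```python
import sys
import numpy as np

big = np.arange(2000)  # more elements than the default threshold of 1000

truncated = np.array2string(big)
assert '...' in truncated  # summarized: the middle is elided

# Disable summarization only inside the with-block.
with np.printoptions(threshold=sys.maxsize):
    full = np.array2string(big)
assert '...' not in full
```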
array of arbitrary length strings
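Pass dtype=object when creating the array. Without it, NumPy infers a fixed-width string dtype from the initial values, and longer strings assigned later are silently truncated. Here wc is the number of rows.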
np.array([['']*3 for _ in range(wc)], dtype=object)
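A sketch of why dtype=object matters. n_rows is a hypothetical stand-in for wc; the same empty-string array is built with and without dtype=object, then a longer string is assigned to each.

```python
import numpy as np

n_rows = 2  # hypothetical row count, playing the role of wc above

fixed = np.array([[''] * 3 for _ in range(n_rows)])              # fixed-width unicode dtype
obj = np.array([[''] * 3 for _ in range(n_rows)], dtype=object)  # holds Python str objects

fixed[0, 0] = 'hello'  # silently cut down to the fixed width
obj[0, 0] = 'hello'    # stored in full

print(fixed.dtype, repr(fixed[0, 0]))
print(obj.dtype, repr(obj[0, 0]))  # object 'hello'
```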
from string data to one hot vector
https://stackoverflow.com/a/33010943/766330
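from sklearn.feature_extraction import DictVectorizer
import pandas as pd

# M: 2-D array of strings (defined elsewhere); some columns hold numbers, some categories.
# DataFrame.convert_objects has been removed from pandas, so convert column by column.
def to_numeric_if_possible(col):
    try:
        return pd.to_numeric(col)
    except (ValueError, TypeError):
        return col  # leave categorical columns as strings

dv = DictVectorizer(sparse=False)
df = pd.DataFrame(M).apply(to_numeric_if_possible)
df.columns = df.columns.astype(str)  # DictVectorizer sorts feature names; mixing int and str fails in Python 3
dv.fit_transform(df.to_dict(orient='records'))

array([[ 5. , 0.2, 1. , 0. , 1. , 0. ],
       [ 2. , 1.3, 1. , 0. , 0. , 1. ],
       [ 1. , 2.3, 0. , 1. , 0. , 1. ]])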
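The DictVectorizer snippet depends on an M that the page does not show. The sketch below swaps in a different technique, pandas.get_dummies, on a hypothetical M made up so that the result matches the array printed above (two numeric columns, two categorical ones). The column names c0 to c3 are also invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical input: numeric strings in the first two columns, categories in the last two.
M = np.array([['5', '0.2', 'A', 'x'],
              ['2', '1.3', 'A', 'y'],
              ['1', '2.3', 'B', 'y']])

df = pd.DataFrame(M, columns=['c0', 'c1', 'c2', 'c3'])
df[['c0', 'c1']] = df[['c0', 'c1']].astype(float)  # numeric strings become floats

# get_dummies keeps c0, c1 and appends one 0/1 column per category value.
onehot = pd.get_dummies(df, columns=['c2', 'c3'], dtype=float)
X = onehot.to_numpy()
print(list(onehot.columns))  # ['c0', 'c1', 'c2_A', 'c2_B', 'c3_x', 'c3_y']
```

Unlike DictVectorizer, get_dummies does not remember the category vocabulary, so for train/test splits a fitted encoder is usually the safer choice.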