First Step in Building Models: All the NumPy Basics You Need to Review

2018-07-04 Source: raincent

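NumPy is a scientific computing library that provides Python with high-performance vectors, matrices, and higher-dimensional data structures. Its core is implemented in C and Fortran, so numerical computations formulated in terms of vectors and matrices perform very well. NumPy underlies most Python frameworks and packages for numerical computing, and libraries such as TensorFlow and PyTorch closely follow its array conventions. Learning to express a computation in NumPy is therefore one of the most basic skills for building machine learning models.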

The Basics

NumPy's main object is the homogeneous multidimensional array: a table of elements (usually numbers), all of the same type, indexed by a tuple of non-negative integers. In NumPy, dimensions are also called axes.

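For example, the coordinates of a point, [1, 2, 1], form an array with one axis. That axis holds 3 elements, so we say it has a length of 3. The array below has 2 axes: the first has a length of 2 and the second has a length of 3.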

[[ 1., 0., 0.],
[ 0., 1., 2.]]

NumPy's array class is called ndarray, and we often simply call it an array. Note that numpy.array is not the same as the Standard Python Library class array.array, which only handles one-dimensional arrays and offers less functionality. An ndarray has the following important attributes:

• ndarray.ndim: the number of axes (dimensions) of the array.

• ndarray.shape: the size of the array in each dimension. For a matrix with n rows and m columns, shape is (n,m).

>>> b = np.array([[1,2,3],[4,5,6]])
>>> b.shape
(2, 3)

• ndarray.size: the total number of elements in the array. This equals the product of the entries of shape; for a matrix, it is the number of rows times the number of columns.

>>> b = np.array([[1,2,3],[4,5,6]])
>>> b.size
6

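• ndarray.dtype: an object describing the type of the elements in the array. A dtype can be created or specified using standard Python types, and NumPy also provides types of its own, such as numpy.int32, numpy.int16, and numpy.float64. Here "int" and "float" say whether the data are integers or floating-point numbers, and "32" and "16" give the number of bits (not bytes) used to store each element.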

• ndarray.itemsize: the size in bytes of each element of the array. For example, an array of elements of type float64 has itemsize 8 (=64/8).

>>> import numpy as np
>>> a = np.arange(15).reshape(3, 5)
>>> a
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14]])
>>> a.shape
(3, 5)
>>> a.ndim
2
>>> a.dtype.name
'int64'
>>> a.itemsize
8
>>> a.size
15
>>> type(a)
<class 'numpy.ndarray'>
>>> b = np.array([6, 7, 8])
>>> b
array([6, 7, 8])
>>> type(b)
<class 'numpy.ndarray'>
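
The attributes above are related to one another, and a short sketch can make the relationships explicit. The array below mirrors the example (with the dtype fixed to int64 so the byte counts do not depend on the platform); the variable names are illustrative only.

```python
import numpy as np

a = np.arange(15, dtype=np.int64).reshape(3, 5)

# size is the product of the entries of shape
total_elements = int(np.prod(a.shape))

# itemsize is the number of bytes per element: 64 bits / 8 = 8 bytes
bytes_per_element = a.dtype.itemsize

# nbytes is the total size of the array data in bytes
total_bytes = a.size * a.itemsize

print(a.ndim, a.shape, a.size, a.itemsize, a.nbytes)
```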

Creating Arrays

NumPy offers many ways to create arrays. For example, you can create a NumPy array from a regular Python list. The type of the resulting array is deduced from the type of the elements in the sequence.

>>> import numpy as np
>>> a = np.array([2,3,4])
>>> a
array([2, 3, 4])
>>> a.dtype
dtype('int64')
>>> b = np.array([1.2, 3.5, 5.1])
>>> b.dtype
dtype('float64')

A frequent error is to call array with multiple numeric arguments rather than with a single list of numbers (written with "[]") as one argument.

>>> a = np.array(1,2,3,4) # WRONG
>>> a = np.array([1,2,3,4]) # RIGHT

array transforms sequences of sequences into two-dimensional arrays, sequences of sequences of sequences into three-dimensional arrays, and so on.

>>> b = np.array([(1.5,2,3), (4,5,6)])
>>> b
array([[ 1.5, 2. , 3. ],
[ 4. , 5. , 6. ]])

The type of the array can also be specified explicitly at creation time:

>>> b = np.array([(1.5,2,3), (4,5,6)])
>>> c = np.array( [ [1,2], [3,4] ], dtype=complex )
>>> c
array([[ 1.+0.j, 2.+0.j],
[ 3.+0.j, 4.+0.j]])

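Often, the elements of an array are not known in advance, but its size is. NumPy therefore offers several functions that create arrays with initial placeholder content. These reduce the need to grow arrays, which is an expensive operation.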

The function zeros creates an array full of zeros, the function ones creates an array full of ones, and the function empty creates an array whose initial content is uninitialized: the values are whatever happens to be in memory, not deliberately generated random numbers. By default, the dtype of the created array is float64.

>>> np.zeros( (3,4) )
array([[ 0., 0., 0., 0.],
[ 0., 0., 0., 0.],
[ 0., 0., 0., 0.]])
>>> np.ones( (2,3,4), dtype=np.int16 ) # dtype can also be specified
array([[[ 1, 1, 1, 1],
[ 1, 1, 1, 1],
[ 1, 1, 1, 1]],
[[ 1, 1, 1, 1],
[ 1, 1, 1, 1],
[ 1, 1, 1, 1]]], dtype=int16)
>>> np.empty( (2,3) ) # uninitialized, output may vary
array([[ 3.73603959e-262, 6.02658058e-154, 6.55490914e-260],
[ 5.30498948e-313, 3.14673309e-307, 1.00000000e+000]])

To create sequences of numbers, NumPy provides arange, a function analogous to range that returns arrays.

>>> np.arange( 10, 30, 5 )
array([10, 15, 20, 25])
>>> np.arange( 0, 2, 0.3 ) # it accepts float arguments
array([ 0. , 0.3, 0.6, 0.9, 1.2, 1.5, 1.8])

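When arange is used with floating point arguments, it is generally not possible to predict the number of elements obtained, because of finite floating point precision. In that case it is usually better to use linspace, which takes the desired number of elements as an argument instead of the step.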

>>> from numpy import pi
>>> np.linspace( 0, 2, 9 ) # 9 numbers from 0 to 2
array([ 0. , 0.25, 0.5 , 0.75, 1. , 1.25, 1.5 , 1.75, 2. ])
>>> x = np.linspace( 0, 2*pi, 100 ) # useful to evaluate function at lots of points
>>> f = np.sin(x)
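
As a minimal sketch of the difference, the snippet below builds the same interval both ways. With linspace the element count is an explicit argument; with arange it is derived from a floating point division. The variable names are illustrative only.

```python
import numpy as np

# linspace: the number of samples is an explicit argument, endpoints included
grid = np.linspace(0.0, 1.0, 11)

# arange with a float step: the length comes from ceil((stop - start) / step),
# so it depends on how the floating-point division rounds
stepped = np.arange(0.0, 1.0, 0.1)

# endpoint=False gives the half-open interval of arange with an exact count
half_open = np.linspace(0.0, 1.0, 10, endpoint=False)

print(len(grid), grid[0], grid[-1])
print(len(stepped), len(half_open))
```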

See also: array, zeros, zeros_like, ones, ones_like, empty, empty_like, arange, linspace, numpy.random.rand, numpy.random.randn, fromfunction, fromfile (these functions also create arrays and are worth trying out when you have time).

Printing Arrays

When you print an array, NumPy displays it in a similar way to nested lists, but with the following layout:

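• the last axis is printed from left to right

• the second-to-last axis is printed from top to bottom

• the remaining axes are also printed from top to bottom, with each slice separated from the next by an empty line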

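As shown below, one-dimensional arrays are printed as rows, two-dimensional arrays as matrices, and three-dimensional arrays as lists of matrices.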

>>> a = np.arange(6) # 1d array
>>> print(a)
[0 1 2 3 4 5]
>>>
>>> b = np.arange(12).reshape(4,3) # 2d array
>>> print(b)
[[ 0 1 2]
[ 3 4 5]
[ 6 7 8]
[ 9 10 11]]
>>>
>>> c = np.arange(24).reshape(2,3,4) # 3d array
>>> print(c)
[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]

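The reshape function used above sets the number of rows and columns and arranges all elements according to the specified dimensions; it is covered in detail in a later section. When an array is too large to print, NumPy automatically skips the central part of the array and prints only the corners.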

>>> print(np.arange(10000))
[ 0 1 2 ..., 9997 9998 9999]
>>>
>>> print(np.arange(10000).reshape(100,100))
[[ 0 1 2 ..., 97 98 99]
[ 100 101 102 ..., 197 198 199]
[ 200 201 202 ..., 297 298 299]
...,
[9700 9701 9702 ..., 9797 9798 9799]
[9800 9801 9802 ..., 9897 9898 9899]
[9900 9901 9902 ..., 9997 9998 9999]]

To force NumPy to print the entire array, change the printing options with set_printoptions.

>>> import sys
>>> np.set_printoptions(threshold=sys.maxsize) # threshold=np.nan is rejected by recent NumPy versions
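
If the full output is only needed temporarily, one option is the np.printoptions context manager (available in NumPy 1.15 and later), which restores the previous settings on exit. The sketch below assumes the default summarization threshold of 1000 elements; the variable names are illustrative.

```python
import sys
import numpy as np

big = np.arange(2000)

# Default settings summarize arrays with more than 1000 elements
summarized = str(big)

# Inside the context manager the threshold is raised, so nothing is elided;
# the previous settings are restored automatically on exit
with np.printoptions(threshold=sys.maxsize):
    full = str(big)

print("..." in summarized, "..." in full)
```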

Basic Operations

Arithmetic operators on arrays apply elementwise, and the result is a new array. Below, subtraction, squaring, multiplication by a scalar, sin, and comparison are all elementwise operations.

>>> a = np.array( [20,30,40,50] )
>>> b = np.arange( 4 )
>>> b
array([0, 1, 2, 3])
>>> c = a-b
>>> c
array([20, 29, 38, 47])
>>> b**2
array([0, 1, 4, 9])
>>> 10*np.sin(a)
array([ 9.12945251, -9.88031624, 7.4511316 , -2.62374854])
>>> a<35
array([ True, True, False, False])

Unlike in many scientific computing languages, the product operator * (or the multiply function) operates elementwise on NumPy arrays. The matrix product can be performed with the dot function or method.

>>> A = np.array( [[1,1],
... [0,1]] )
>>> B = np.array( [[2,0],
... [3,4]] )
>>> A*B # elementwise product
array([[2, 0],
[0, 4]])
>>> A.dot(B) # matrix product
array([[5, 4],
[3, 4]])
>>> np.dot(A, B) # another matrix product
array([[5, 4],
[3, 4]])
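
In Python 3.5 and later, the @ operator is another way to write the matrix product. A minimal sketch with the same A and B as above:

```python
import numpy as np

A = np.array([[1, 1],
              [0, 1]])
B = np.array([[2, 0],
              [3, 4]])

elementwise = A * B        # same as np.multiply(A, B)
matrix_product = A @ B     # same as A.dot(B) or np.dot(A, B) for 2-D arrays

print(elementwise)
print(matrix_product)
```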

Some operations, such as += and *=, modify an existing array in place rather than creating a new one as the operations above do.

>>> a = np.ones((2,3), dtype=int)
>>> b = np.random.random((2,3))
>>> a *= 3
>>> a
array([[3, 3, 3],
[3, 3, 3]])
>>> b += a
>>> b
array([[ 3.417022 , 3.72032449, 3.00011437],
[ 3.30233257, 3.14675589, 3.09233859]])
>>> a += b # b is not automatically converted to integer type
Traceback (most recent call last):
...
TypeError: Cannot cast ufunc add output from dtype('float64') to dtype('int64') with casting rule 'same_kind'
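
When a float result really does need to end up in an integer array, the conversion has to be requested explicitly. The sketch below shows three possible ways to handle it; the input values are made up for illustration, and which option is appropriate depends on whether truncation is acceptable.

```python
import numpy as np

a = np.full((2, 3), 3, dtype=np.int64)
b = np.array([[0.4, 0.7, 0.0],
              [0.3, 0.1, 0.9]])

# Option 1: build a new float array and leave `a` untouched
upcast_sum = a + b

# Option 2: truncate b toward zero explicitly, then add in place
a_truncated = a.copy()
a_truncated += b.astype(np.int64)

# Option 3: let the ufunc write into the int array, allowing the lossy cast
a_forced = a.copy()
np.add(a_forced, b, out=a_forced, casting="unsafe")

print(upcast_sum.dtype, a_truncated.dtype, a_forced.dtype)
```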

When operating on arrays of different types, the type of the resulting array corresponds to the more general or more precise one (a behavior known as upcasting).

>>> a = np.ones(3, dtype=np.int32)
>>> b = np.linspace(0,pi,3)
>>> b.dtype.name
'float64'
>>> c = a+b
>>> c
array([ 1. , 2.57079633, 4.14159265])
>>> c.dtype.name
'float64'
>>> d = np.exp(c*1j)
>>> d
array([ 0.54030231+0.84147098j, -0.84147098+0.54030231j,
-0.54030231-0.84147098j])
>>> d.dtype.name
'complex128'

Many unary operations, such as computing the sum of all the elements in the array, are implemented as methods of the ndarray class.

>>> a = np.random.random((2,3))
>>> a
array([[ 0.18626021, 0.34556073, 0.39676747],
[ 0.53881673, 0.41919451, 0.6852195 ]])
>>> a.sum()
2.5718191614547998
>>> a.min()
0.1862602113776709
>>> a.max()
0.6852195003967595

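By default, these operations treat the array as a flat list of numbers regardless of its shape. By specifying the axis parameter, however, you can apply an operation along a chosen axis. With axis=0 the operation runs down each column: for example, b.sum(axis=0) adds up the elements of each column of b and returns one sum per column.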

>>> b = np.arange(12).reshape(3,4)
>>> b
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
>>>
>>> b.sum(axis=0) # sum of each column
array([12, 15, 18, 21])
>>>
>>> b.min(axis=1) # min of each row
array([0, 4, 8])
>>>
>>> b.cumsum(axis=1) # cumulative sum along each row
array([[ 0, 1, 3, 6],
[ 4, 9, 15, 22],
[ 8, 17, 27, 38]])
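
One way to think about axis is that it names the dimension that gets collapsed. The sketch below reuses the array b from above and also shows the keepdims argument, which keeps the collapsed axis with length 1 so the result can be combined with the original array.

```python
import numpy as np

b = np.arange(12).reshape(3, 4)

col_sums = b.sum(axis=0)                    # collapses axis 0 -> shape (4,)
row_sums = b.sum(axis=1)                    # collapses axis 1 -> shape (3,)
row_sums_2d = b.sum(axis=1, keepdims=True)  # keeps a length-1 axis -> shape (3, 1)

# keepdims lets the result line up against the original array row by row
row_fractions = b / row_sums_2d

print(col_sums, row_sums, row_sums_2d.shape)
```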

Indexing, Slicing and Iterating

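One-dimensional arrays can be indexed, sliced, and iterated over, much like Python lists and tuples. Note that a[0:6:2] selects the elements from index 0 up to, but not including, index 6, taking every second element (indices 0, 2, and 4).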

>>> a = np.arange(10)**3
>>> a
array([ 0, 1, 8, 27, 64, 125, 216, 343, 512, 729])
>>> a[2]
8
>>> a[2:5]
array([ 8, 27, 64])
>>> a[:6:2] = -1000 # equivalent to a[0:6:2] = -1000; from start to position 6, exclusive, set every 2nd element to -1000
>>> a
array([-1000, 1, -1000, 27, -1000, 125, 216, 343, 512, 729])
>>> a[ : :-1] # reversed a
array([ 729, 512, 343, 216, 125, -1000, 27, -1000, 1, -1000])
>>> for i in a:
... print(i**(1/3.))
...
nan
1.0
nan
3.0
nan
5.0
6.0
7.0
8.0
9.0

Multidimensional arrays can have one index per axis. These indices are given in a tuple separated by commas:

>>> def f(x,y):
... return 10*x+y
...
>>> b = np.fromfunction(f,(5,4),dtype=int)
>>> b
array([[ 0, 1, 2, 3],
[10, 11, 12, 13],
[20, 21, 22, 23],
[30, 31, 32, 33],
[40, 41, 42, 43]])
>>> b[2,3]
23
>>> b[0:5, 1] # each row in the second column of b
array([ 1, 11, 21, 31, 41])
>>> b[ : ,1] # equivalent to the previous example
array([ 1, 11, 21, 31, 41])
>>> b[1:3, : ] # each column in the second and third row of b
array([[10, 11, 12, 13],
[20, 21, 22, 23]])

When fewer indices are provided than the number of axes, the missing indices are treated as complete slices.

>>> b[-1] # the last row. Equivalent to b[-1,:]
array([40, 41, 42, 43])

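Because the second dimension is omitted above, b[i] returns row i. The omitted dimension can also be written with ":", so b[i] is equivalent to b[i, :]. NumPy also allows dots (...) to stand for as many colons as are needed to produce a complete indexing tuple.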

For example, if x is an array with 5 axes:

• x[1,2,...] is equivalent to x[1,2,:,:,:],
• x[...,3] is equivalent to x[:,:,:,:,3]
• x[4,...,5,:] is equivalent to x[4,:,:,5,:]

>>> c = np.array( [[[ 0, 1, 2], # a 3D array (two stacked 2D arrays)
... [ 10, 12, 13]],
... [[100,101,102],
... [110,112,113]]])
>>> c.shape
(2, 2, 3)
>>> c[1,...] # same as c[1,:,:] or c[1]
array([[100, 101, 102],
[110, 112, 113]])
>>> c[...,2] # same as c[:,:,2]
array([[ 2, 13],
[102, 113]])
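
The equivalences for a 5-dimensional array can be checked by comparing result shapes. The array shape (5, 3, 4, 6, 7) below is an arbitrary choice for illustration, picked so that every index used is in range.

```python
import numpy as np

x = np.zeros((5, 3, 4, 6, 7))  # hypothetical 5-D array

shape_a = x[1, 2, ...].shape      # same as x[1, 2, :, :, :]
shape_b = x[..., 3].shape         # same as x[:, :, :, :, 3]
shape_c = x[4, ..., 5, :].shape   # same as x[4, :, :, 5, :]

print(shape_a, shape_b, shape_c)
```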

Iterating over multidimensional arrays is done with respect to the first axis; each pass of the loop below yields one b[i]:

>>> for row in b:
... print(row)
...
[0 1 2 3]
[10 11 12 13]
[20 21 22 23]
[30 31 32 33]
[40 41 42 43]

However, to perform an operation on each element of the array, you can use the flat attribute, which is an iterator over all the elements of the array:

>>> for element in b.flat:
... print(element)
...
0
1
2
3
10
11
12
13
20
21
22
23
30
31
32
33
40
41
42
43

Shape Manipulation

Changing the Shape of an Array

The shape of an array is given by the number of elements along each axis. It is usually represented as a tuple of integers, each giving the number of elements in the corresponding dimension.

>>> a = np.floor(10*np.random.random((3,4)))
>>> a
array([[ 2., 8., 0., 6.],
[ 4., 5., 1., 1.],
[ 8., 9., 3., 6.]])
>>> a.shape
(3, 4)

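The shape of an array can be changed in many ways. The three operations below all return an array with a modified shape and do not change the shape of the original array. reshape is used especially often in practice, since different computations need the data arranged in different dimensions.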

>>> a.ravel() # returns the array, flattened
array([ 2., 8., 0., 6., 4., 5., 1., 1., 8., 9., 3., 6.])
>>> a.reshape(6,2) # returns the array with a modified shape
array([[ 2., 8.],
[ 0., 6.],
[ 4., 5.],
[ 1., 1.],
[ 8., 9.],
[ 3., 6.]])
>>> a.T # returns the array, transposed
array([[ 2., 4., 8.],
[ 8., 5., 9.],
[ 0., 1., 3.],
[ 6., 1., 6.]])
>>> a.T.shape
(4, 3)
>>> a.shape
(3, 4)

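ravel() and flatten() both flatten a multidimensional array to one dimension. flatten() always returns a copy, so modifying it does not affect the original array. ravel() returns a view whenever possible, so modifying its result can change the original array.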
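
A small sketch of this difference, using np.shares_memory to check whether two arrays use the same data. The array and variable names are illustrative. Note that ravel() cannot return a view in every case: for a transposed array it has to copy.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)

flat_copy = a.flatten()        # always a copy
flat_view = a.ravel()          # a view here, because `a` is C-contiguous
transposed_flat = a.T.ravel()  # a copy: a.T is not contiguous in C order

flat_view[0] = 100             # writes through to `a`
flat_copy[1] = -1              # does not touch `a`

print(a)
```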

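Transposing a matrix swaps its row and column dimensions, reflecting every element across the main diagonal. Furthermore, reshape returns an array with a modified shape, whereas the resize method, shown below, modifies the shape of the array itself.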

>>> a
array([[ 2., 8., 0., 6.],
[ 4., 5., 1., 1.],
[ 8., 9., 3., 6.]])
>>> a.resize((2,6))
>>> a
array([[ 2., 8., 0., 6., 4., 5.],
[ 1., 1., 8., 9., 3., 6.]])

If a dimension is given as -1 in a reshaping operation, its size is calculated automatically. Below, a has 12 elements in total; once 3 rows are requested, -1 works out that 4 columns are needed to hold all the elements.

>>> a.reshape(3,-1)
array([[ 2., 8., 0., 6.],
[ 4., 5., 1., 1.],
[ 8., 9., 3., 6.]])

Stacking Arrays

Arrays can be stacked together along different axes. As shown below, vstack joins two arrays along the first axis (vertically), while hstack joins them along the second axis (horizontally).

>>> a = np.floor(10*np.random.random((2,2)))
>>> a
array([[ 8., 8.],
[ 0., 0.]])
>>> b = np.floor(10*np.random.random((2,2)))
>>> b
array([[ 1., 8.],
[ 0., 4.]])
>>> np.vstack((a,b))
array([[ 8., 8.],
[ 0., 0.],
[ 1., 8.],
[ 0., 4.]])
>>> np.hstack((a,b))
array([[ 8., 8., 1., 8.],
[ 0., 0., 0., 4.]])

The column_stack function stacks 1D arrays as columns into a 2D array. It is equivalent to hstack only for 2D arrays.

>>> from numpy import newaxis
>>> np.column_stack((a,b)) # with 2D arrays
array([[ 8., 8., 1., 8.],
[ 0., 0., 0., 4.]])
>>> a = np.array([4.,2.])
>>> b = np.array([3.,8.])
>>> np.column_stack((a,b)) # returns a 2D array
array([[ 4., 3.],
[ 2., 8.]])
>>> np.hstack((a,b)) # the result is different
array([ 4., 2., 3., 8.])
>>> a[:,newaxis] # this allows to have a 2D columns vector
array([[ 4.],
[ 2.]])
>>> np.column_stack((a[:,newaxis],b[:,newaxis]))
array([[ 4., 3.],
[ 2., 8.]])
>>> np.hstack((a[:,newaxis],b[:,newaxis])) # the result is the same
array([[ 4., 3.],
[ 2., 8.]])

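Similarly to column_stack, the row_stack function is equivalent to vstack for 2D arrays. In general, for arrays with more than two dimensions, hstack stacks along the second axis and vstack stacks along the first axis, while concatenate goes further and joins arrays along any given axis, provided the lengths of the other axes all match. concatenate appears in many deep learning models, for example when stacking weight matrices or the feature maps of DenseNet.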
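
As a rough sketch of how concatenate generalizes vstack and hstack, the example below joins two 2D arrays along each axis and then joins two 3D arrays along their last axis. The feature-map shapes and the (height, width, channels) layout are assumptions made purely for illustration.

```python
import numpy as np

a = np.array([[8., 8.],
              [0., 0.]])
b = np.array([[1., 8.],
              [0., 4.]])

rows = np.concatenate((a, b), axis=0)   # shape (4, 2), same as np.vstack((a, b))
cols = np.concatenate((a, b), axis=1)   # shape (2, 4), same as np.hstack((a, b))

# For higher-dimensional data any axis can be chosen, as long as the other
# axes match. Two hypothetical feature maps laid out as
# (height, width, channels) are joined along the channel axis.
fmap1 = np.zeros((4, 4, 3))
fmap2 = np.ones((4, 4, 5))
joined = np.concatenate((fmap1, fmap2), axis=2)   # shape (4, 4, 8)

print(rows.shape, cols.shape, joined.shape)
```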

In complex cases, r_ and c_ are useful for creating arrays by stacking numbers along one axis. They also allow the range notation ":" to generate arrays.

>>> np.r_[1:4,0,4]
array([1, 2, 3, 0, 4])

When used with arrays as arguments, r_ and c_ behave like vstack and hstack by default, but like concatenate they accept an optional argument giving the axis along which to stack.

Splitting Arrays

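hsplit splits an array along its horizontal axis, either by specifying the number of equally shaped arrays to return or by specifying the columns after which the division should occur: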

>>> a = np.floor(10*np.random.random((2,12)))
>>> a
array([[ 9., 5., 6., 3., 6., 8., 0., 7., 9., 7., 2., 7.],
[ 1., 4., 9., 2., 2., 1., 0., 6., 2., 2., 4., 0.]])
>>> np.hsplit(a,3) # Split a into 3
[array([[ 9., 5., 6., 3.],
[ 1., 4., 9., 2.]]), array([[ 6., 8., 0., 7.],
[ 2., 1., 0., 6.]]), array([[ 9., 7., 2., 7.],
[ 2., 2., 4., 0.]])]
>>> np.hsplit(a,(3,4)) # Split a after the third and the fourth column
[array([[ 9., 5., 6.],
[ 1., 4., 9.]]), array([[ 3.],
[ 2.]]), array([[ 6., 8., 0., 7., 9., 7., 2., 7.],
[ 2., 1., 0., 6., 2., 2., 4., 0.]])]

vsplit splits along the vertical axis, and array_split lets you specify along which axis to split.
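
A minimal sketch of these functions on a small made-up array. It also shows one practical difference: array_split accepts a section count that does not divide the axis evenly, whereas split and its h/v variants require equal sections.

```python
import numpy as np

a = np.arange(24).reshape(4, 6)

top, bottom = np.vsplit(a, 2)            # two (2, 6) blocks, split along axis 0
left, right = np.split(a, 2, axis=1)     # two (4, 3) blocks, split along axis 1

# array_split tolerates sections that do not divide the axis evenly
pieces = np.array_split(np.arange(10), 3)
piece_lengths = [len(p) for p in pieces]

print(top.shape, left.shape, piece_lengths)
```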

Copies and Views

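When operating on and manipulating arrays, beginners often find it hard to tell whether data is copied into a new array or modified in place. The difference matters for subsequent computations, so sometimes we need to copy the contents into newly allocated memory rather than merely point a new variable at the existing memory. There are three cases: no copy at all, a shallow copy (view), and a deep copy.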

No Copy at All

Simple assignments make no copy of array objects or of their data. Below, a is assigned to b, and modifying b then also modifies a; plain assignment simply binds both names to the same object.

>>> a = np.arange(12)
>>> b = a # no new object is created
>>> b is a # a and b are two names for the same ndarray object
True
>>> b.shape = 3,4 # changes the shape of a
>>> a.shape
(3, 4)

Python passes mutable objects as references, so function calls make no copy: the object identifier does not change and no data is copied.

>>> def f(x):
... print(id(x))
...
>>> id(a) # id is a unique identifier of an object
148293216
>>> f(a)
148293216

View or Shallow Copy

Different array objects can share the same data. The view method creates a new array object that looks at the same data. Below, c and a are different objects, and changing the shape of one does not change the shape of the other. The two arrays share all their elements, however, so changing an element of one also changes the corresponding element of the other.

>>> c = a.view()
>>> c is a
False
>>> c.base is a # c is a view of the data owned by a
True
>>> c.flags.owndata
False
>>>
>>> c.shape = 2,6 # a's shape doesn't change
>>> a.shape
(3, 4)
>>> c[0,4] = 1234 # a's data changes
>>> a
array([[ 0, 1, 2, 3],
[1234, 5, 6, 7],
[ 8, 9, 10, 11]])

Slicing an array returns a view of it. Below, s is a slice of a and therefore a view of a, so modifying the elements of s also modifies the corresponding elements of a.

>>> s = a[ : , 1:3] # spaces added for clarity; could also be written "s = a[:,1:3]"
>>> s[:] = 10 # s[:] is a view of s. Note the difference between s=10 and s[:]=10
>>> a
array([[ 0, 10, 10, 3],
[1234, 10, 10, 7],
[ 8, 10, 10, 11]])

Deep Copy

The copy method makes a complete copy of the array and its data. The two variables then refer to different array objects that share no data.

>>> d = a.copy() # a new array object with new data is created
>>> d is a
False
>>> d.base is a # d doesn't share anything with a
False
>>> d[0,0] = 9999
>>> a
array([[ 0, 10, 10, 3],
[1234, 10, 10, 7],
[ 8, 10, 10, 11]])
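
The three cases can be told apart programmatically. The sketch below uses np.shares_memory as one possible check; the variable names are illustrative.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

alias = a            # no copy: a second name for the same object
view = a.view()      # shallow copy: new array object, same data buffer
sliced = a[:, 1:3]   # basic slicing also returns a view
deep = a.copy()      # deep copy: new object and new data

a[0, 1] = -1         # visible through alias, view and sliced, but not deep

print(alias is a, view is a, np.shares_memory(a, view), np.shares_memory(a, deep))
```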

NumPy in More Depth

Broadcasting

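Broadcasting is a very important feature of NumPy that extends arithmetic to arrays of different shapes. It implicitly stretches a dimension of length 1 in one array to match the corresponding dimension of the other operand so that the shapes become compatible. For example, adding a matrix of shape [3,2] to a matrix of shape [3,1] is legal: NumPy automatically expands the second matrix to the same shape.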

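To decide whether two shapes are compatible, NumPy compares their dimensions one by one, starting from the last and working forward. Two dimensions are compatible when they are equal or when one of them (or both) is 1, in which case the comparison moves on to the next dimension, until the first one is reached. If neither condition holds, an error is raised.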

The following shows a broadcasting operation:

>>> a = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0]).reshape(3, 2)
>>> b = np.array([3.0])
>>> a * b
array([[ 3., 6.],
[ 9., 12.],
[ 15., 18.]])
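
The [3,2] plus [3,1] case mentioned earlier, and a case that fails the compatibility rule, can be sketched as follows. np.broadcast_shapes requires NumPy 1.20 or later; the array values are made up for illustration.

```python
import numpy as np

a = np.arange(6.0).reshape(3, 2)          # shape (3, 2)
col = np.array([[10.0], [20.0], [30.0]])  # shape (3, 1)

# (3, 2) and (3, 1): trailing dims 2 vs 1 -> ok, leading dims 3 vs 3 -> ok
total = a + col

# The resulting shape can be checked without doing the arithmetic
result_shape = np.broadcast_shapes(a.shape, col.shape)

# (3, 2) and (3,): trailing dims 2 vs 3 differ and neither is 1 -> error
try:
    a + np.array([1.0, 2.0, 3.0])
    incompatible = False
except ValueError:
    incompatible = True

print(total)
print(result_shape, incompatible)
```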

Advanced Indexing

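NumPy offers more indexing facilities than regular Python sequences. In addition to indexing by integers and slices, as we saw before, arrays can be indexed by arrays of integers and arrays of booleans.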

Indexing with Arrays of Indices

Below, the arrays i and j are used to index elements of a; the output has the same shape as the index array.

>>> a = np.arange(12)**2 # the first 12 square numbers
>>> i = np.array( [ 1,1,3,8,5 ] ) # an array of indices
>>> a[i] # the elements of a at the positions i
array([ 1, 1, 9, 64, 25])

>>> j = np.array( [ [ 3, 4], [ 9, 7 ] ] ) # a bidimensional array of indices
>>> a[j] # the same shape as j
array([[ 9, 16],
[81, 49]])

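When the indexed array is multidimensional, a single array of indices refers to its first dimension: each index selects a whole sub-array, and the results are arranged in the shape of the index array. The code below illustrates this. palette can be seen as a simple color palette, and each element of image is the palette index of that pixel's color.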

>>> palette = np.array( [ [0,0,0], # black
... [255,0,0], # red
... [0,255,0], # green
... [0,0,255], # blue
... [255,255,255] ] ) # white
>>> image = np.array( [ [ 0, 1, 2, 0 ], # each value corresponds to a color in the palette
... [ 0, 3, 4, 0 ] ] )
>>> palette[image] # the (2,4,3) color image
array([[[ 0, 0, 0],
[255, 0, 0],
[ 0, 255, 0],
[ 0, 0, 0]],
[[ 0, 0, 0],
[ 0, 0, 255],
[255, 255, 255],
[ 0, 0, 0]]])

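We can also give indices for more than one dimension. The arrays of indices for each dimension must have the same shape. Below, i and j index the first and second dimension of a respectively: each element of a[i, j] is found by taking one element from i and the matching element from j as the row and column index.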

>>> a = np.arange(12).reshape(3,4)
>>> a
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
>>> i = np.array( [ [0,1], # indices for the first dim of a
... [1,2] ] )
>>> j = np.array( [ [2,1], # indices for the second dim
... [3,3] ] )
>>>
>>> a[i,j] # i and j must have equal shape
array([[ 2, 5],
[ 7, 11]])
>>>
>>> a[i,2]
array([[ 2, 6],
[ 6, 10]])
>>>
>>> a[:,j] # i.e., a[ : , j]
array([[[ 2, 1],
[ 3, 3]],
[[ 6, 5],
[ 7, 7]],
[[10, 9],
[11, 11]]])

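Similarly, we can put i and j in a sequence and then index with it. Use a tuple for this: recent NumPy versions treat a list of arrays as a single index array rather than as one index per dimension.

>>> l = (i, j)
>>> a[l] # equivalent to a[i,j]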
array([[ 2, 5],
[ 7, 11]])

However, we cannot do this by putting i and j into an array, because that array would be interpreted as indexing the first dimension of a.

>>> s = np.array( [i,j] )
>>> a[s] # not what we want
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: index 3 is out of bounds for axis 0 with size 3
>>>
>>> a[tuple(s)] # same as a[i,j]
array([[ 2, 5],
[ 7, 11]])

Another common use of indexing with arrays is searching for the maximum value of time-dependent series:

>>> time = np.linspace(20, 145, 5) # time scale
>>> data = np.sin(np.arange(20)).reshape(5,4) # 4 time-dependent series
>>> time
array([ 20. , 51.25, 82.5 , 113.75, 145. ])
>>> data
array([[ 0. , 0.84147098, 0.90929743, 0.14112001],
[-0.7568025 , -0.95892427, -0.2794155 , 0.6569866 ],
[ 0.98935825, 0.41211849, -0.54402111, -0.99999021],
[-0.53657292, 0.42016704, 0.99060736, 0.65028784],
[-0.28790332, -0.96139749, -0.75098725, 0.14987721]])
>>>
>>> ind = data.argmax(axis=0) # index of the maxima for each series
>>> ind
array([2, 0, 3, 1])
>>>
>>> time_max = time[ind] # times corresponding to the maxima
>>>
>>> data_max = data[ind, range(data.shape[1])] # => data[ind[0],0], data[ind[1],1]...
>>>
>>> time_max
array([ 82.5 , 20. , 113.75, 51.25])
>>> data_max
array([ 0.98935825, 0.84147098, 0.99060736, 0.6569866 ])
>>>
>>> np.all(data_max == data.max(axis=0))
True

You can also use indexing with arrays as a target to assign to:

>>> a = np.arange(5)
>>> a
array([0, 1, 2, 3, 4])
>>> a[[1,3,4]] = 0
>>> a
array([0, 0, 2, 0, 0])

However, when the list of indices contains repetitions, the assignment is done several times, leaving behind the last value.

>>> a = np.arange(5)
>>> a[[0,0,2]]=[1,2,3]
>>> a
array([2, 1, 3, 3, 4])

This is reasonable enough, but watch out if you use Python's += construct, as it may not do what you expect:

>>> a = np.arange(5)
>>> a[[0,0,2]]+=1
>>> a
array([1, 1, 3, 3, 4])

Even though 0 occurs twice in the list of indices, element 0 is incremented only once. This is because Python requires "a+=1" to be equivalent to "a = a + 1".
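
If the intent is to accumulate once per occurrence of an index, one option is the unbuffered ufunc method np.add.at. A minimal sketch contrasting the two behaviors:

```python
import numpy as np

a = np.arange(5)
a[[0, 0, 2]] += 1            # buffered: index 0 is incremented only once

b = np.arange(5)
np.add.at(b, [0, 0, 2], 1)   # unbuffered: index 0 is incremented twice

print(a, b)
```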

Indexing with Boolean Arrays

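When we index arrays with arrays of (integer) indices, we are providing the list of indices to pick. With boolean indices the approach is different: we explicitly choose which items in the array we want and which ones we don't.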

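The most natural form of boolean indexing uses a boolean array with the same shape as the original array. Below, the comparison yields True only where an element is greater than 4, and the resulting boolean array can be used as an index.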

>>> a = np.arange(12).reshape(3,4)
>>> b = a > 4
>>> b # b is a boolean with a's shape
array([[False, False, False, False],
[False, True, True, True],
[ True, True, True, True]])
>>> a[b] # 1d array with the selected elements
array([ 5, 6, 7, 8, 9, 10, 11])

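This property is very useful in assignments. The ReLU activation function, for example, passes a value through only where it is greater than 0 and outputs 0 elsewhere, so it can be implemented with the same kind of masked assignment (in that case by zeroing the elements below 0).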

>>> a[b] = 0 # All elements of 'a' higher than 4 become 0
>>> a
array([[0, 1, 2, 3],
[4, 0, 0, 0],
[0, 0, 0, 0]])
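
A minimal sketch of ReLU written with boolean-mask assignment. The function name relu_masked and the sample input are made up for illustration; np.maximum(x, 0) is a common alternative that gives the same result.

```python
import numpy as np

def relu_masked(x):
    """ReLU via boolean-mask assignment; returns a new array."""
    out = np.array(x, dtype=float)  # copy so the caller's array is untouched
    out[out < 0] = 0.0              # zero every element selected by the mask
    return out

x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
activated = relu_masked(x)

print(activated)
```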

The second way of indexing with booleans is more similar to integer indexing: for each dimension of the array we give a 1D boolean array selecting the slices we want:

>>> a = np.arange(12).reshape(3,4)
>>> b1 = np.array([False,True,True]) # first dim selection
>>> b2 = np.array([True,False,True,False]) # second dim selection
>>>
>>> a[b1,:] # selecting rows
array([[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
>>>
>>> a[b1] # same thing
array([[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
>>>
>>> a[:,b2] # selecting columns
array([[ 0, 2],
[ 4, 6],
[ 8, 10]])
>>>
>>> a[b1,b2] # a weird thing to do
array([ 4, 10])

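Note that the length of the 1D boolean array must match the length of the axis you want to slice. In the example above, b1 has length 3 and b2 has length 4, corresponding to the first and second dimensions of a.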

Linear Algebra

Simple Array Operations

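Only simple matrix operations are shown here; look up the API for more detailed methods as you need them in practice. The examples cover the transpose, inverse, identity matrix, matrix product, trace, solving a linear system, and computing eigenvectors: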

>>> import numpy as np
>>> a = np.array([[1.0, 2.0], [3.0, 4.0]])
>>> print(a)
[[ 1. 2.]
[ 3. 4.]]

>>> a.transpose()
array([[ 1., 3.],
[ 2., 4.]])

>>> np.linalg.inv(a)
array([[-2. , 1. ],
[ 1.5, -0.5]])

>>> u = np.eye(2) # unit 2x2 matrix; "eye" represents "I"
>>> u
array([[ 1., 0.],
[ 0., 1.]])
>>> j = np.array([[0.0, -1.0], [1.0, 0.0]])

>>> np.dot (j, j) # matrix product
array([[-1., 0.],
[ 0., -1.]])

>>> np.trace(u) # trace
2.0

>>> y = np.array([[5.], [7.]])
>>> np.linalg.solve(a, y)
array([[-3.],
[ 4.]])

>>> np.linalg.eig(j)
(array([ 0.+1.j, 0.-1.j]), array([[ 0.70710678+0.j , 0.70710678-0.j ],
[ 0.00000000-0.70710678j, 0.00000000+0.70710678j]]))

For np.linalg.eig:

Parameters:
a square matrix

Returns:
The eigenvalues, each repeated according to its multiplicity.
The normalized (unit "length") eigenvectors, such that the column v[:,i] is the eigenvector corresponding to the eigenvalue w[i].
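
The results of solve, inv, and eig can be sanity-checked against their defining equations. The sketch below reuses the matrices a, y, and j from the session above; the flag names are illustrative.

```python
import numpy as np

a = np.array([[1.0, 2.0], [3.0, 4.0]])
y = np.array([[5.0], [7.0]])
j = np.array([[0.0, -1.0], [1.0, 0.0]])

# solve: the returned x should satisfy a @ x == y
x = np.linalg.solve(a, y)
solve_ok = np.allclose(a @ x, y)

# inv: a @ inv(a) should be the identity
inv_ok = np.allclose(a @ np.linalg.inv(a), np.eye(2))

# eig: column v[:, i] pairs with eigenvalue w[i], so j @ v == v * w
w, v = np.linalg.eig(j)
eig_ok = np.allclose(j @ v, v * w)

print(solve_ok, inv_ok, eig_ok)
```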

Original documentation: https://docs.scipy.org/doc/numpy/user/quickstart.html
