NumPy Basics

Posted by 大师之路


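2 NumPy array basics

2.1 The NumPy array object

The ndarray in NumPy is a multidimensional array object made up of two parts:

  • the actual data
  • metadata that describes the data

Many array operations modify only the metadata and leave the underlying data unchanged.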
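
To make the "metadata only" point concrete, here is a minimal sketch. It assumes that reshape and transpose on a freshly created contiguous array return views, which is the usual NumPy behaviour; the variable names are illustrative only.

```python
import numpy as np

a = np.arange(6)          # one block of six integers
v = a.reshape(2, 3)       # new shape metadata, same underlying buffer
t = v.T                   # transpose: only shape and strides change

# All three objects refer to the same memory.
shares_reshape = np.shares_memory(a, v)
shares_transpose = np.shares_memory(a, t)

# Writing through the view is visible in the original array.
v[0, 0] = 100
first_element = a[0]

print(shares_reshape, shares_transpose, first_element)
print(v.strides, t.strides)   # the strides are swapped, the data is not moved
```

Because no data is copied, these operations cost the same regardless of how large the array is.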

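NumPy arrays are generally homogeneous: all elements of an array have the same data type.

As in Python, NumPy array indices start at 0.

We create a one-dimensional array with the arange function and look at its data type (the sessions in this article assume NumPy has been imported with import numpy as np):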

In [1]: a = np.arange(5)

In [2]: a.dtype
Out[2]: dtype('int32')

In [16]: a
Out[16]: array([0, 1, 2, 3, 4])

In [17]: a.shape
Out[17]: (5,)

2.2 Multidimensional arrays

In [18]: m = np.array([np.arange(2), np.arange(2)])

In [19]: m
Out[19]:
array([[0, 1],
       [0, 1]])

In [20]: m.shape
Out[20]: (2, 2)

2.2.1 Selecting array elements

First, create a 2x2 array:

In [21]: a = np.array([[1, 2], [3, 4]])

In [22]: a
Out[22]:
array([[1, 2],
       [3, 4]])

Then select each element in turn:

In [23]: a[0, 0]
Out[23]: 1

In [24]: a[0, 1]
Out[24]: 2

In [25]: a[1, 0]
Out[25]: 3

In [26]: a[1, 1]
Out[26]: 4

2.2.2 NumPy data types

 bool, int_, int8, int16, int32, int64, uint8, uint16, uint32, uint64, float16, float32, float64 (or float), complex64, complex128 (or complex)

In [28]: np.float64(42)
Out[28]: 42.0

In [29]: np.int8(42.0)
Out[29]: 42

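In [30]: np.bool_(42)
Out[30]: True

In [31]: np.bool_(0)
Out[31]: False

In [32]: np.bool_(42.0)
Out[32]: True

In [33]: np.float64(True)
Out[33]: 1.0

In [34]: np.float64(False)
Out[34]: 0.0

Note: np.bool and np.float, which appear in older tutorials, were aliases of the Python built-ins bool and float and were deprecated in NumPy 1.20. The explicit np.bool_ and np.float64 are used above instead.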

Many NumPy functions accept an argument that specifies the data type:

In [35]: np.arange(7, dtype=np.uint16)
Out[35]: array([0, 1, 2, 3, 4, 5, 6], dtype=uint16)

In [36]: np.arange(7, dtype=float)
Out[36]: array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.])

In [37]: np.arange(7, dtype=np.float64)
Out[37]: array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.])

In [38]: np.arange(7, dtype=np.float32)
Out[38]: array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.], dtype=float32)
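
The practical effect of the dtype argument is the amount of memory used per element. The sketch below illustrates this with the itemsize attribute and with astype, which is not used in the sessions above and is brought in here only to show conversion of an existing array.

```python
import numpy as np

# The same values stored with different element types.
small = np.arange(7, dtype=np.uint16)
single = np.arange(7, dtype=np.float32)
double = np.arange(7, dtype=np.float64)

# itemsize reports the number of bytes used per element.
sizes = (small.itemsize, single.itemsize, double.itemsize)

# astype returns a converted copy; the original array is unchanged.
as_int8 = double.astype(np.int8)

print(sizes)
print(as_int8.dtype, double.dtype)
```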

Data types can also be specified with character codes (not recommended):

In [40]: np.arange(7, dtype='f')
Out[40]: array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.], dtype=float32)

In [41]: np.arange(7, dtype='d')
Out[41]: array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.])

In [42]: np.arange(7, dtype='D')
Out[42]: array([ 0.+0.j,  1.+0.j,  2.+0.j,  3.+0.j,  4.+0.j,  5.+0.j,  6.+0.j])

In [43]: np.arange(7, dtype='i')
Out[43]: array([0, 1, 2, 3, 4, 5, 6], dtype=int32)
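
One way to see why the character codes are discouraged is to compare them with the explicit types they stand for. This is a small sketch; the pairing of 'i' with np.intc (the C int type) is stated as an assumption about how NumPy defines that code.

```python
import numpy as np

# Each pair: (character code, explicit type it is expected to denote).
pairs = [
    ("f", np.float32),
    ("d", np.float64),
    ("D", np.complex128),
    ("i", np.intc),      # C int; 32 bits on common platforms
]

matches = [np.dtype(code) == np.dtype(explicit) for code, explicit in pairs]
print(matches)
```

The explicit names carry the same information and are easier to read.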

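The full list of NumPy data type names can be found in sctypeDict (the exact mapping depends on the platform and the NumPy version):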

In [45]: np.sctypeDict
Out[45]:
{'?': numpy.bool_,
 0: numpy.bool_,
 'byte': numpy.int8,
 'b': numpy.int8,
 1: numpy.int8,
 'ubyte': numpy.uint8,
 'B': numpy.uint8,
 2: numpy.uint8,
 'short': numpy.int16,
 'h': numpy.int16,
 3: numpy.int16,
 'ushort': numpy.uint16,
 'H': numpy.uint16,
 4: numpy.uint16,
 'i': numpy.int32,
 5: numpy.int32,
 'uint': numpy.uint32,
 'I': numpy.uint32,
 6: numpy.uint32,
 'intp': numpy.int64,
 'p': numpy.int64,
 9: numpy.int64,
 'uintp': numpy.uint64,
 'P': numpy.uint64,
 10: numpy.uint64,
 'long': numpy.int32,
 'l': numpy.int32,
 7: numpy.int32,
 'L': numpy.uint32,
 8: numpy.uint32,
 'longlong': numpy.int64,
 'q': numpy.int64,
 'ulonglong': numpy.uint64,
 'Q': numpy.uint64,
 'half': numpy.float16,
 'e': numpy.float16,
 23: numpy.float16,
 'f': numpy.float32,
 11: numpy.float32,
 'double': numpy.float64,
 'd': numpy.float64,
 12: numpy.float64,
 'longdouble': numpy.float64,
 'g': numpy.float64,
 13: numpy.float64,
 'cfloat': numpy.complex128,
 'F': numpy.complex64,
 14: numpy.complex64,
 'cdouble': numpy.complex128,
 'D': numpy.complex128,
 15: numpy.complex128,
 'clongdouble': numpy.complex128,
 'G': numpy.complex128,
 16: numpy.complex128,
 'O': numpy.object_,
 17: numpy.object_,
 'S': numpy.bytes_,
 18: numpy.bytes_,
 'unicode': numpy.str_,
 'U': numpy.str_,
 19: numpy.str_,
 'void': numpy.void,
 'V': numpy.void,
 20: numpy.void,
 'M': numpy.datetime64,
 21: numpy.datetime64,
 'm': numpy.timedelta64,
 22: numpy.timedelta64,
 'bool8': numpy.bool_,
 'Bool': numpy.bool_,
 'b1': numpy.bool_,
 'float16': numpy.float16,
 'Float16': numpy.float16,
 'f2': numpy.float16,
 'float32': numpy.float32,
 'Float32': numpy.float32,
 'f4': numpy.float32,
 'float64': numpy.float64,
 'Float64': numpy.float64,
 'f8': numpy.float64,
 'complex64': numpy.complex64,
 'Complex32': numpy.complex64,
 'c8': numpy.complex64,
 'complex128': numpy.complex128,
 'Complex64': numpy.complex128,
 'c16': numpy.complex128,
 'object0': numpy.object_,
 'Object0': numpy.object_,
 'bytes0': numpy.bytes_,
 'Bytes0': numpy.bytes_,
 'str0': numpy.str_,
 'Str0': numpy.str_,
 'void0': numpy.void,
 'Void0': numpy.void,
 'datetime64': numpy.datetime64,
 'Datetime64': numpy.datetime64,
 'M8': numpy.datetime64,
 'timedelta64': numpy.timedelta64,
 'Timedelta64': numpy.timedelta64,
 'm8': numpy.timedelta64,
 'int32': numpy.int32,
 'uint32': numpy.uint32,
 'Int32': numpy.int32,
 'UInt32': numpy.uint32,
 'i4': numpy.int32,
 'u4': numpy.uint32,
 'int64': numpy.int64,
 'uint64': numpy.uint64,
 'Int64': numpy.int64,
 'UInt64': numpy.uint64,
 'i8': numpy.int64,
 'u8': numpy.uint64,
 'int16': numpy.int16,
 'uint16': numpy.uint16,
 'Int16': numpy.int16,
 'UInt16': numpy.uint16,
 'i2': numpy.int16,
 'u2': numpy.uint16,
 'int8': numpy.int8,
 'uint8': numpy.uint8,
 'Int8': numpy.int8,
 'UInt8': numpy.uint8,
 'i1': numpy.int8,
 'u1': numpy.uint8,
 'complex_': numpy.complex128,
 'int0': numpy.int64,
 'uint0': numpy.uint64,
 'single': numpy.float32,
 'csingle': numpy.complex64,
 'singlecomplex': numpy.complex64,
 'float_': numpy.float64,
 'intc': numpy.int32,
 'uintc': numpy.uint32,
 'int_': numpy.int32,
 'longfloat': numpy.float64,
 'clongfloat': numpy.complex128,
 'longcomplex': numpy.complex128,
 'bool_': numpy.bool_,
 'unicode_': numpy.str_,
 'object_': numpy.object_,
 'bytes_': numpy.bytes_,
 'str_': numpy.str_,
 'string_': numpy.bytes_,
 'int': numpy.int32,
 'float': numpy.float64,
 'complex': numpy.complex128,
 'bool': numpy.bool_,
 'object': numpy.object_,
 'str': numpy.str_,
 'bytes': numpy.bytes_,
 'a': numpy.bytes_}

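2.3 Custom data types

A custom data type is a heterogeneous (structured) type. It can be thought of as describing one row of a spreadsheet or database table.

As an example, we create a data type that stores shop inventory records. A string of up to 40 characters holds the item name, a 32-bit integer holds the number of items in stock, and a 32-bit single-precision float holds the price. The steps are as follows.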

(1) Create the data type:

In [47]: t = np.dtype([('name', np.str_, 40), ('numitems', np.int32), ('price', np.float32)])

In [48]: t
Out[48]: dtype([('name', '<U40'), ('numitems', '<i4'), ('price', '<f4')])

(2) Inspect the data type (the type of a single field can be inspected as well):

In [49]: t['name']
Out[49]: dtype('<U40')

(3) Create an array of this type:

In [50]: itemz = np.array([('Meaning of life DVD', 42, 3.14), ('Butter', 13, 2.72)], dtype=t)

In [51]: itemz[1]
Out[51]: ('Butter', 13,  2.72000003)
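
Once the array exists, records and fields can be addressed separately. The following is a sketch of typical access patterns on the same inventory type; the variable names names, total_items, butter_price and record_bytes are illustrative and not part of the original sessions.

```python
import numpy as np

t = np.dtype([("name", np.str_, 40), ("numitems", np.int32), ("price", np.float32)])
itemz = np.array([("Meaning of life DVD", 42, 3.14), ("Butter", 13, 2.72)], dtype=t)

names = itemz["name"]                   # one field across all records
total_items = itemz["numitems"].sum()   # a field behaves like an ordinary array
butter_price = itemz[1]["price"]        # one field of one record
record_bytes = t.itemsize               # 40 four-byte characters + 4 + 4 bytes

# Field access returns a view, so assignment updates the records.
itemz["numitems"][1] = 10

print(names, total_items, butter_price, record_bytes)
print(itemz[1])
```

The price prints as 2.72000003 rather than 2.72 because it is stored as a 32-bit float.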

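2.4 Indexing and slicing one-dimensional arrays

Slicing a one-dimensional array works much like slicing a Python list.

Basic slicing, here on the nine-element array a = np.arange(9):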

In [53]: a[3:7]
Out[53]: array([3, 4, 5, 6]) 

We can also select the elements from index 0 up to (but not including) index 7 with a step of 2:

In [54]: a[:7:2]
Out[54]: array([0, 2, 4, 6])

As in Python, a negative step reverses the array:

In [55]: a[::-1]
Out[55]: array([8, 7, 6, 5, 4, 3, 2, 1, 0])
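
The slices above can be collected in one place. The sketch also shows one difference from Python lists that the sessions do not cover: an array slice is a view onto the original data, whereas a list slice is a copy.

```python
import numpy as np

a = np.arange(9)

middle = a[3:7]        # indices 3, 4, 5, 6
evens = a[:7:2]        # indices 0, 2, 4, 6
reverse = a[::-1]      # whole array, last element first
last_two = a[-2:]      # negative indices count from the end

# A list slice is an independent copy ...
lst = list(range(9))
lst_slice = lst[3:7]
lst_slice[0] = -1
list_unchanged = lst[3]

# ... but an array slice is a view, so writing to it changes the array.
middle[0] = -1
array_changed = a[3]

print(list_unchanged, array_changed)
```

When an independent copy is needed, call .copy() on the slice.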

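2.5 Indexing and slicing multidimensional arrays

ndarray supports slicing on multidimensional arrays. For convenience, an ellipsis (...) can be used to stand for all remaining dimensions.

For example:

(1) First create an array with arange and reshape it into a three-dimensional array: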

In [62]: b = np.arange(24).reshape(2,3,4)

In [63]: b.shape
Out[63]: (2, 3, 4)

In [64]: b
Out[64]:
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]],

       [[12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23]]])

 (2) Select single elements by index:

In [65]: b[0,0,0]
Out[65]: 0

In [66]: b[1,0,0]
Out[66]: 12

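(3) Think of the array as a building with two floors, each with three rows and four columns of rooms. If we do not care about the floor, that is, we want the room in the first row and first column on every floor, we replace the first index with a colon (:):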

In [68]: b[:,0,0]
Out[68]: array([ 0, 12])

Select the first floor:
In [69]: b[0]
Out[69]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

 

This can also be written as:

In [70]: b[0, :, :]
Out[70]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

 

Several colons can be replaced by a single ellipsis (...), so the code above is equivalent to:

In [71]: b[0, ...]
Out[71]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
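
The ellipsis can also appear before an index, and slices can be mixed with plain indices. A short sketch on the same 2x3x4 array; the names last_column, second_row and every_other are illustrative only.

```python
import numpy as np

b = np.arange(24).reshape(2, 3, 4)

# The ellipsis stands in for as many colons as are needed.
same = np.array_equal(b[0, ...], b[0, :, :])

last_column = b[..., -1]      # last column of every row on every floor
second_row = b[:, 1]          # second row of every floor
every_other = b[0, 1, ::2]    # floor 0, row 1, every second column

print(same)
print(last_column.shape, second_row.shape, every_other)
```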

 

2.6 Changing the shape of an array

(1) ravel: the ravel function flattens an array:

In [76]: b
Out[76]:
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]],

       [[12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23]]])

In [77]: b.ravel()
Out[77]:
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23])

 

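(2) flatten: as the name suggests, flatten also flattens the array and gives the same result as ravel. The difference is that flatten always allocates new memory for the result, whereas ravel returns a view of the original array whenever possible: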

In [78]: b.flatten()
Out[78]:
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23])
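
The view-versus-copy difference can be checked with np.shares_memory. This is a minimal sketch; the transposed case is included as an example of ravel having to copy, which happens when the elements are not laid out contiguously in the requested order.

```python
import numpy as np

b = np.arange(24).reshape(2, 3, 4)

r = b.ravel()      # a view here, because b is contiguous in memory
f = b.flatten()    # always a freshly allocated copy

ravel_shares = np.shares_memory(b, r)
flatten_shares = np.shares_memory(b, f)

# After a transpose the elements are no longer in row-major order,
# so ravel has to fall back to a copy.
ravel_of_transpose_shares = np.shares_memory(b, b.T.ravel())

print(ravel_shares, flatten_shares, ravel_of_transpose_shares)
```

Use flatten when the result will be modified independently of the original array.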

(3) Setting the shape with a tuple: besides calling the reshape function, we can assign a tuple of positive integers directly to the shape attribute:

In [79]: b.shape = (6, 4)

In [80]: b
Out[80]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23]])

 

(4) transpose: transposing a matrix is a common operation in linear algebra. Multidimensional arrays can be transposed as well:

In [81]: b.transpose()
Out[81]:
array([[ 0,  4,  8, 12, 16, 20],
       [ 1,  5,  9, 13, 17, 21],
       [ 2,  6, 10, 14, 18, 22],
       [ 3,  7, 11, 15, 19, 23]])

(5) resize: resize does the same job as reshape, but modifies the array it is called on:

In [82]: b.resize((2,12))

In [83]: b
Out[83]:
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]])
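
The difference between reshape and resize is easiest to see side by side. A sketch under the assumption that the total number of elements stays the same; the -1 placeholder is an extra feature of reshape that the sessions above do not use.

```python
import numpy as np

a = np.arange(24)

r = a.reshape(6, 4)            # returns a new array object; a keeps its shape
shape_after_reshape = a.shape

auto = a.reshape(2, -1)        # -1 asks NumPy to infer that dimension

c = np.arange(24)
result = c.resize((2, 12))     # works in place; the method itself returns None

print(shape_after_reshape, r.shape, auto.shape)
print(result, c.shape)
```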

 

2.7 Stacking arrays

(0) Create the arrays:

In [84]: a = np.arange(9).reshape(3,3)

In [85]: a
Out[85]:
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [86]: b = 2 * a

In [87]: b
Out[87]:
array([[ 0,  2,  4],
       [ 6,  8, 10],
       [12, 14, 16]])

 

(1) hstack: horizontal stacking

In [89]: np.hstack((a, b))
Out[89]:
array([[ 0,  1,  2,  0,  2,  4],
       [ 3,  4,  5,  6,  8, 10],
       [ 6,  7,  8, 12, 14, 16]])

The concatenate function achieves the same result:

In [90]: np.concatenate((a, b), axis=1)
Out[90]:
array([[ 0,  1,  2,  0,  2,  4],
       [ 3,  4,  5,  6,  8, 10],
       [ 6,  7,  8, 12, 14, 16]])

 

(2) vstack: vertical stacking

In [91]: np.vstack((a, b))
Out[91]:
array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 0,  2,  4],
       [ 6,  8, 10],
       [12, 14, 16]])

 

Likewise, setting the axis argument of concatenate to 0 gives the same result. This is also the default value of axis:

In [92]: np.concatenate((a, b), axis=0)
Out[92]:
array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 0,  2,  4],
       [ 6,  8, 10],
       [12, 14, 16]])

 

(3) dstack: depth-wise stacking

In [93]: np.dstack((a, b))
Out[93]:
array([[[ 0,  0],
        [ 1,  2],
        [ 2,  4]],

       [[ 3,  6],
        [ 4,  8],
        [ 5, 10]],

       [[ 6, 12],
        [ 7, 14],
        [ 8, 16]]])

 

(4) column_stack: stacking by columns

Given one-dimensional arrays, column_stack uses each of them as a column of a two-dimensional array:

In [96]: oned = np.arange(2)

In [97]: oned
Out[97]: array([0, 1])

In [98]: twice_oned = 2 * oned

In [99]: twice_oned
Out[99]: array([0, 2])

In [100]: np.column_stack((oned, twice_oned))
Out[100]:
array([[0, 0],
       [1, 2]])

 

For two-dimensional arrays, column_stack and hstack give the same result:

In [104]: np.column_stack((a, b)) == np.hstack((a, b))
Out[104]:
array([[ True,  True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True,  True],
       [ True,  True,  True,  True,  True,  True]], dtype=bool)

 

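(5) row_stack: stacking by rows

row_stack is the row-wise counterpart of column_stack. Two one-dimensional arrays are stacked on top of each other to form a two-dimensional array. (Since NumPy 2.0, row_stack is a deprecated alias of vstack.)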

In [106]: np.row_stack((oned, twice_oned))
Out[106]:
array([[0, 1],
       [0, 2]])

 

 For two-dimensional arrays, row_stack and vstack give the same result:

In [107]: np.row_stack((a, b))
Out[107]:
array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 0,  2,  4],
       [ 6,  8, 10],
       [12, 14, 16]])

In [108]: np.row_stack((a, b)) == np.vstack((a, b))
Out[108]:
array([[ True,  True,  True],
       [ True,  True,  True],
       [ True,  True,  True],
       [ True,  True,  True],
       [ True,  True,  True],
       [ True,  True,  True]], dtype=bool)  
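
The stacking functions differ mainly in the axis along which they join the inputs. The sketch below summarises the resulting shapes and uses np.array_equal, which is not used in the sessions above, to reduce each elementwise comparison to a single boolean.

```python
import numpy as np

a = np.arange(9).reshape(3, 3)
b = 2 * a

shapes = {
    "hstack": np.hstack((a, b)).shape,              # joined along axis 1
    "vstack": np.vstack((a, b)).shape,              # joined along axis 0
    "dstack": np.dstack((a, b)).shape,              # joined along a new third axis
    "column_stack": np.column_stack((a, b)).shape,  # same as hstack for 2-D input
}

h_same = np.array_equal(np.hstack((a, b)), np.concatenate((a, b), axis=1))
v_same = np.array_equal(np.vstack((a, b)), np.concatenate((a, b), axis=0))
d_same = np.array_equal(np.dstack((a, b)), np.stack((a, b), axis=2))

print(shapes)
print(h_same, v_same, d_same)
```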

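2.8 Splitting arrays

NumPy arrays can be split horizontally, vertically, or depth-wise with the functions hsplit, vsplit, dsplit, and split. We can either split an array into sub-arrays of equal size or specify the positions at which the original array should be cut.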
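
Both ways of splitting can be sketched as follows. The 3x4 array here is chosen for illustration and is not the array used in the sessions; the cut positions [1, 3] are likewise arbitrary.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

# An integer asks for that many equal parts along the chosen axis.
halves = np.hsplit(a, 2)          # two blocks of shape (3, 2)

# A list gives the column indices at which to cut: [:1], [1:3], [3:].
pieces = np.hsplit(a, [1, 3])

# For a 2-D array, hsplit is split along axis 1 and vsplit is split along axis 0.
same = all(np.array_equal(x, y) for x, y in zip(halves, np.split(a, 2, axis=1)))
rows = np.split(a, 3, axis=0)     # three blocks of shape (1, 4)

print([h.shape for h in halves])
print([p.shape for p in pieces])
print(same, len(rows))
```

Asking for a number of equal parts that does not divide the axis length raises an error, so the index-list form is the one to use for uneven cuts.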

(1) hsplit: horizontal splitting

In [110]: np.hsplit(a, 3)
Out[110]:
[array([[0],
        [3],
        [6]]), array([[1],
        [4],
        [7]]), array([[2],
        [5],
        [8]])]

 

(2) vsplit: vertical splitting

(3) dsplit: depth-wise splitting

A comparison of the three splits:

In [112]: c = np.arange(27).reshape(3, 3, 3)

In [113]: c
Out[113]:
array([[[ 0,  1,  2],
        [ 3,  4,  5],
        [ 6,  7,  8]],

       [[ 9, 10, 11],
        [12, 13, 14],
        [15, 16, 17]],

       [[18, 19, 20],
        [21, 22, 23],
        [24, 25, 26]]])

In [114]: np.dsplit(c, 3)
Out[114]:
[array([[[ 0],
         [ 3],
         [ 6]],

        [[ 9],
         [12],
         [15]],

        [[18],
         [21],
         [24]]]), array([[[ 1],
         [ 4],
         [ 7]],

        [[10],
         [13],
         [16]],

        [[19],
         [22],
         [25]]]), array([[[ 2],
         [ 5],
         [ 8]],

        [[11],
         [14],
         [17]],

        [[20],
         [23],
         [26]]])]

In [115]: np.hsplit(c, 3)
Out[115]:
[array([[[ 0,  1,  2]],

        [[ 9, 10, 11]],

        [[18, 19, 20]]]), array([[[ 3,  4,  5]],

        [[12, 13, 14]],

        [[21, 22, 23]]]), array([[[ 6,  7,  8]],

        [[15, 16, 17]],

        [[24, 25, 26]]])]

In [116]: np.vsplit(c, 3)
Out[116]:
[array([[[0, 1, 2],
         [3, 4, 5],
         [6, 7, 8]]]), array([[[ 9, 10, 11],
         [12, 13, 14],
         [15, 16, 17]]]), array([[[18, 19, 20],
         [21, 22, 23],
         [24, 25, 26]]])]

 

In [121]: c = np.arange(9).reshape(3, 3)

In [122]: c
Out[122]:
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [123]: np.hsplit(c, 3)
Out[123]:
[array([[0],
        [3],
        [6]]), array([[1],
        [4],
        [7]]), array([[2],
        [5],
        [8]])]

In [124]: np.vsplit(c, 3)
Out[124]: [array([[0, 1, 2]]), array([[3, 4, 5]]), array([[6, 7, 8]])]

In [125]: np.dsplit(c, 3)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-125-aa2ba1054587> in <module>()
----> 1 np.dsplit(c, 3)

c:\python\python362\lib\site-packages\numpy\lib\shape_base.py in dsplit(ary, indices_or_sections)
    665     """
    666     if len(_nx.shape(ary)) < 3:
--> 667         raise ValueError('dsplit only works on arrays of 3 or more dimensions')
    668     return split(ary, indices_or_sections, 2)
    669

ValueError: dsplit only works on arrays of 3 or more dimensions

 

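2.11 Array attributes

Besides shape and dtype, ndarray objects have many other attributes, listed below.

  • ndim: the number of dimensions (axes) of the array
  • size: the total number of elements in the array
  • itemsize: the number of bytes each element occupies in memory
  • nbytes: the total storage used by the array, equal to b.size * b.itemsize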

 

In [127]: b = np.arange(24).reshape(2,12)

In [128]: b
Out[128]:
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23]])

In [129]: b.ndim
Out[129]: 2

In [130]: b.size
Out[130]: 24

In [131]: b.itemsize
Out[131]: 4

In [132]: b.nbytes
Out[132]: 96
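
The relation between these attributes can be checked directly. In this sketch the dtype is set to int32 explicitly so that itemsize is 4 on every platform; without it, the default integer type (and therefore itemsize and nbytes) depends on the system.

```python
import numpy as np

b = np.arange(24, dtype=np.int32).reshape(2, 12)

# nbytes is the element count times the bytes per element.
consistent = b.nbytes == b.size * b.itemsize

# The same values stored as float64 need twice the space.
wide = b.astype(np.float64)

print(b.ndim, b.size, b.itemsize, b.nbytes)
print(wide.itemsize, wide.nbytes, consistent)
```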

 

  • The T attribute has the same effect as the transpose function:
In [133]: b.resize(6,4)

In [134]: b
Out[134]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23]])

In [135]: b.T
Out[135]:
array([[ 0,  4,  8, 12, 16, 20],
       [ 1,  5,  9, 13, 17, 21],
       [ 2,  6, 10, 14, 18, 22],
       [ 3,  7, 11, 15, 19, 23]])

For a one-dimensional array, the T attribute is simply the original array.
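
Since T leaves a one-dimensional array unchanged, a column vector has to be requested explicitly. A small sketch of two common ways to do that; reshape(-1, 1) and np.newaxis are standard NumPy features that the sessions above do not use.

```python
import numpy as np

v = np.arange(3)
unchanged = v.T.shape            # still (3,): there is only one axis to permute

column = v.reshape(-1, 1)        # explicit (3, 1) column
column_alt = v[:, np.newaxis]    # the same result using np.newaxis
row = v[np.newaxis, :]           # explicit (1, 3) row

print(unchanged, column.shape, column_alt.shape, row.shape)
print(row.T.shape)               # transposing a 2-D row does give a column
```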

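  • The flat attribute returns a numpy.flatiter object. This is the only way to obtain a flatiter, because its constructor is not accessible. This "flat iterator" lets us iterate over an array of any dimension as if it were one-dimensional: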
In [136]: b = np.arange(4).reshape(2,2)

In [137]: b
Out[137]:
array([[0, 1],
       [2, 3]])

In [138]: f = b.flat

In [139]: f
Out[139]: <numpy.flatiter at 0x2cc108e1280>

In [140]: for item in f: print(item)
0
1
2
3

A single array element can also be read directly through flat:

In [141]: b.flat[2]
Out[141]: 2

In [142]: b.flat[3]
Out[142]: 3

 

Or several elements at once:

In [143]: b.flat[[1, 3]]
Out[143]: array([1, 3])

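flat is an assignable attribute. Assigning a value to flat overwrites every element of the array, while assigning to indexed positions of flat changes only those elements: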

In [144]: b.flat = 7

In [145]: b
Out[145]:
array([[7, 7],
       [7, 7]])

In [146]: b.flat[[1, 3]] = 1

In [147]: b
Out[147]:
array([[7, 1],
       [7, 1]])
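
The order in which flat visits the elements follows the rows of whatever array it is taken from. A brief sketch on a 2x3 array (chosen here for illustration) that also shows slice assignment through flat:

```python
import numpy as np

b = np.arange(6).reshape(2, 3)

row_major = [int(x) for x in b.flat]        # visits the elements row by row
via_transpose = [int(x) for x in b.T.flat]  # follows the rows of the transposed view

b.flat[::2] = 0                             # every second element in flat order

print(row_major)
print(via_transpose)
print(b)
```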

 

  • tolist converts a NumPy array to a Python list:
In [148]: b.tolist()
Out[148]: [[7, 1], [7, 1]]

 

3 Common functions

3.1 Reading and writing text files

Create a matrix and save it with savetxt:

In [149]: i2 = np.eye(2)

In [150]: i2
Out[150]:
array([[ 1.,  0.],
       [ 0.,  1.]])

In [151]: np.savetxt('d:/cache/eye.txt', i2)
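
To confirm what savetxt wrote, the file can be read back with loadtxt. The sketch below is a round trip under one assumption: a temporary directory stands in for the d:/cache folder so that the example runs on any system.

```python
import os
import tempfile

import numpy as np

i2 = np.eye(2)

# A temporary directory replaces the d:/cache folder used in the session.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "eye.txt")
    np.savetxt(path, i2)
    with open(path) as fh:
        text = fh.read()          # plain text, one matrix row per line
    loaded = np.loadtxt(path)     # parsed back into a float array

round_trip_ok = np.array_equal(i2, loaded)

print(text)
print(round_trip_ok)
```

By default savetxt separates values with spaces; the delimiter argument of both functions can be set to ',' to produce and read CSV-style files.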

 
