Getting Started with Machine Learning in Python: The Scientific Computing Library (NumPy)

Posted 2021-09-01 零陵上将军_xdr

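Advantages of NumPy

NumPy efficiency

NumPy (Numerical Python) is an open-source scientific computing library for Python that provides fast operations on arrays of any dimension.

NumPy supports the common array and matrix operations. For the same numerical task, NumPy code is much more concise than plain Python.

NumPy stores multidimensional data in the ndarray object, a fast and flexible container for large datasets.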

Introducing ndarray

NumPy provides an N-dimensional array type, the ndarray, which describes a collection of “items” of the same type.

Storing data in an ndarray:

import numpy as np

# Create an ndarray
s = np.array(
[[12, 89, 46, 67, 79],
[56, 97, 19, 67, 81],
[90, 84, 78, 67, 74],
[91, 91, 80, 67, 69],
[76, 87, 75, 67, 86],
[70, 79, 84, 67, 84],
[94, 92, 93, 67, 64],
[86, 85, 83, 67, 76]])

print(s)

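Output:

[[12 89 46 67 79]
 [56 97 19 67 81]
 [90 84 78 67 74]
 [91 91 80 67 69]
 [76 87 75 67 86]
 [70 79 84 67 84]
 [94 92 93 67 64]
 [86 85 83 67 76]]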

Comparing the speed of ndarray and the native Python list

Running the following code gives a feel for the benefit of ndarray:

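import random
import time
import numpy as np

# Build a Python list of 10 million random floats
a = []
for i in range(10000000):
    a.append(random.random())

# Time the built-in sum over the list
t = time.time()
sum1 = sum(a)
print("list time:", time.time() - t)

# Convert the list to an ndarray (the conversion itself is not timed)
b = np.array(a)

# Time np.sum over the ndarray
t = time.time()
sum2 = np.sum(b)
print("ndarray time:", time.time() - t)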

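Running it shows that the ndarray sum is much faster. Machine learning involves a very large amount of numerical computation, so without a fast array library Python would probably not be as effective in this field as it is today.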

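N-dimensional arrays: ndarray

Attributes of an ndarray

Array attributes describe information intrinsic to the array itself.

Attribute	Description
ndarray.shape	Tuple of array dimensions
ndarray.dtype	Type of the array elements
ndarray.ndim	Number of dimensions
ndarray.size	Number of elements in the array
ndarray.itemsize	Size of one array element, in bytes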

import numpy as np

a=np.array([[1,2,3],[4,5,6]])
b=np.array([2,0,3,6,2,5,9])
c=np.array([[1,2,3],[4,5,6],[7,8,9]])

print(a.shape)
print(b.shape)
print(c.shape)
print('--------------------------')

print(a.ndim)
print(b.ndim)
print(c.ndim)
print('--------------------------')

print(a.size)
print(b.size)
print(c.size)
print('--------------------------')

print(a.itemsize)
print(b.itemsize)
print(c.itemsize)
print('--------------------------')

print(a.dtype)
print(b.dtype)
print(c.dtype)

Shape of an ndarray

First, create some arrays.

# Create arrays of different shapes and print each shape
import numpy as np

a = np.array([[1,2,3],[4,5,6]])
b = np.array([1,2,3,4])
c = np.array([[[1,2,3],[4,5,6]],[[1,2,3],[4,5,6]]])

print(a.shape)
print(b.shape)
print(c.shape)

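How should the shape of an array be understood?

2-D array: a has shape (2, 3), that is, 2 rows and 3 columns.

3-D array: c has shape (2, 2, 3), that is, 2 blocks, each with 2 rows and 3 columns.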
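One way to read a shape is as the length of each nesting level of the array, outermost first. The sketch below checks this for a 3-D array; the variable name nesting is only illustrative.

```python
import numpy as np

c = np.array([[[1, 2, 3], [4, 5, 6]],
              [[1, 2, 3], [4, 5, 6]]])

# Length at each nesting level, outermost first
nesting = (len(c), len(c[0]), len(c[0][0]))

print(nesting)   # (2, 2, 3)
print(c.shape)   # (2, 2, 3)
```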

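Types of an ndarray

dtype is an instance of numpy.dtype. The table below lists the types available for arrays.

Name	Description	Short code
np.bool_	Boolean (True or False), stored in one byte	'?'
np.int8	Integer, one byte, -128 to 127	'i1'
np.int16	Integer, -32768 to 32767	'i2'
np.int32	Integer, -2^31 to 2^31 - 1	'i4'
np.int64	Integer, -2^63 to 2^63 - 1	'i8'
np.uint8	Unsigned integer, 0 to 255	'u1'
np.uint16	Unsigned integer, 0 to 65535	'u2'
np.uint32	Unsigned integer, 0 to 2^32 - 1	'u4'
np.uint64	Unsigned integer, 0 to 2^64 - 1	'u8'
np.float16	Half-precision float: 16 bits (1 sign bit, 5 exponent bits, 10 mantissa bits)	'f2'
np.float32	Single-precision float: 32 bits (1 sign bit, 8 exponent bits, 23 mantissa bits)	'f4'
np.float64	Double-precision float: 64 bits (1 sign bit, 11 exponent bits, 52 mantissa bits)	'f8'
np.complex64	Complex number, real and imaginary parts each stored as a 32-bit float	'c8'
np.complex128	Complex number, real and imaginary parts each stored as a 64-bit float	'c16'
np.object_	Python object	'O'
np.bytes_	Byte string (named np.string_ before NumPy 2.0)	'S'
np.str_	Unicode string (named np.unicode_ before NumPy 2.0)	'U'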
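A dtype can be given when an array is created, either as a NumPy type object or as a short code string, and an existing array can be converted with astype. The following is a minimal sketch; the array names are illustrative.

```python
import numpy as np

# dtype given as a NumPy type object
a = np.array([1, 2, 3], dtype=np.float32)
# dtype given as a short code string
b = np.array([1, 2, 3], dtype='i2')
# astype returns a converted copy and leaves a unchanged
c = a.astype(np.int64)

print(a.dtype, a.itemsize)  # float32 4
print(b.dtype, b.itemsize)  # int16 2
print(c.dtype, c.itemsize)  # int64 8

# np.iinfo reports the value range of an integer type
info = np.iinfo(np.int32)
print(info.min, info.max)   # -2147483648 2147483647
```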

Basic operations

Ways to create arrays

Arrays of zeros and ones:

np.ones(shape, dtype)
np.ones_like(a, dtype)
np.zeros(shape, dtype)
np.zeros_like(a, dtype)

Example:

import numpy as np

one=np.ones([2,2,7])
zero=np.zeros_like(one)
print(one)
print('-------------------')
print(zero)

Creating arrays from existing arrays:

np.array(object, dtype)
np.asarray(a, dtype)

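import numpy as np

a = np.array([[1,2,3],[4,5,6]])
# Create a new array from an existing one (np.array copies by default)
a1 = np.array(a)
# Behaves like a reference: a is already an ndarray, so no new array is created
a2 = np.asarray(a)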
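The difference between the two calls becomes visible once the original array is modified. The sketch below assumes the input is already an ndarray and no dtype conversion is requested, which is the case where np.asarray returns the input itself.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])
a1 = np.array(a)    # independent copy
a2 = np.asarray(a)  # same object as a, since a is already an ndarray

a[0, 0] = 100       # modify the original in place

print(a1[0, 0])     # 1
print(a2[0, 0])     # 100
print(a2 is a)      # True
```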

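Creating arrays over a fixed range

1. Arithmetic sequence with a given number of points

np.linspace(start, stop, num, endpoint)

Parameters:

start: first value of the sequence
stop: last value of the sequence
num: number of evenly spaced samples to generate, default 50
endpoint: whether stop is included in the sequence, default True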
import numpy as np

a=np.linspace(0, 100, 11)
print(a)

2. Arithmetic sequence with a given step

np.arange(start, stop, step, dtype)

Parameters:

step: spacing between values, default 1 (stop itself is not included)

import numpy as np

a=np.arange(-10,10,1,np.int32)
print(a)

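3. Geometric sequence

np.logspace(start, stop, num)

The sequence runs from 10^start to 10^stop (the default base is 10).

Parameters:

num: number of samples to generate, default 50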

import numpy as np

a=np.logspace(0,3,4)
print(a)
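The three range functions differ mainly in how they treat the end value and the spacing. The following sketch puts them side by side; the endpoint and base keyword arguments are standard NumPy parameters, and the variable names are illustrative.

```python
import numpy as np

# linspace includes stop by default: 11 points from 0 to 100
with_stop = np.linspace(0, 100, 11)
# endpoint=False leaves stop out: 10 points from 0 to 90
without_stop = np.linspace(0, 100, 10, endpoint=False)

# arange never includes stop
stepped = np.arange(-10, 10, 5)

# logspace spaces the exponents evenly: 10**0, 10**1, 10**2, 10**3
powers_of_10 = np.logspace(0, 3, 4)
# base changes the base of the power: 2**0, 2**1, 2**2, 2**3
powers_of_2 = np.logspace(0, 3, 4, base=2)

print(with_stop[-1], without_stop[-1])  # 100.0 90.0
print(stepped)                          # [-10  -5   0   5]
print(powers_of_10)
print(powers_of_2)
```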

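Creating random arrays

The module used

The np.random module

Normal distribution

What is the normal distribution?

The normal distribution is a probability distribution of a continuous random variable with two parameters, μ and σ. The first parameter, μ, is the mean of the random variable; the second, σ, is its standard deviation (σ² is the variance). The distribution is therefore written N(μ, σ²).

Properties of the normal distribution:

μ determines the location of the distribution, and the standard deviation σ determines its spread. The normal distribution with μ = 0 and σ = 1 is the standard normal distribution.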

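Ways to draw from a normal distribution:

1. np.random.randn(d0, d1, …, dn)

Returns one or more samples from the standard normal distribution.

2. np.random.normal(loc=0.0, scale=1.0, size=None)

loc: float

Mean of the distribution (the centre of the distribution).

scale: float

Standard deviation of the distribution (its width: a larger scale gives a flatter, wider curve; a smaller scale gives a taller, narrower one).

size: int or tuple of ints

Shape of the output. The default, None, returns a single value.

3. np.random.standard_normal(size=None)

Returns an array of the given shape drawn from the standard normal distribution.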

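import numpy as np
import matplotlib.pyplot as plt

# Draw 10 million samples from the standard normal distribution (mean 0, standard deviation 1)
x=np.random.normal(0,1,10000000)

plt.figure(figsize=(20,8),dpi=100)

# Histogram of the samples; the bell shape is the normal distribution
plt.hist(x,bins=10000)

plt.show()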
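Without plotting, the roles of loc and scale can be checked by comparing them with the sample mean and sample standard deviation. This is a small sketch; the values 1.75 and 0.1 and the fixed seed are arbitrary choices made for the illustration.

```python
import numpy as np

np.random.seed(0)  # fixed seed so the run is repeatable

# 100,000 samples from a normal distribution with mean 1.75 and standard deviation 0.1
samples = np.random.normal(loc=1.75, scale=0.1, size=100000)
sample_mean = samples.mean()
sample_std = samples.std()

print(samples.shape)            # (100000,)
print(sample_mean, sample_std)  # close to 1.75 and 0.1

# randn and standard_normal both draw from the standard normal distribution;
# randn takes the dimensions as separate arguments, standard_normal takes a tuple
r1 = np.random.randn(2, 3)
r2 = np.random.standard_normal(size=(2, 3))
print(r1.shape, r2.shape)       # (2, 3) (2, 3)
```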

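Uniform distribution

np.random.rand(d0, d1, …, dn)
Returns uniformly distributed numbers in [0.0, 1.0).

np.random.uniform(low=0.0, high=1.0, size=None)
Draws samples from the uniform distribution over [low, high). The interval is closed on the left and open on the right: low is included, high is not.
Returns: an ndarray whose shape matches the size argument (a single float when size is None).
Parameters:
low: lower bound of the samples, float, default 0
high: upper bound of the samples, float, default 1
size: output shape, an int or a tuple of ints; for example, size=(m, n, k) draws m*n*k samples. If omitted, a single value is returned.

np.random.randint(low, high=None, size=None, dtype='l')

Draws random integers from a discrete uniform distribution, returning a single integer or an N-dimensional integer array.
Range: if high is not None, integers are drawn from [low, high); otherwise they are drawn from [0, low).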

import numpy as np
import matplotlib.pyplot as plt

x=np.random.uniform(0,10,10000000)

plt.figure(figsize=(20,8),dpi=100)

plt.hist(x,1000)

plt.show()
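The half-open intervals and the size argument described for rand, uniform, and randint can be checked directly. A minimal sketch, with a fixed seed and illustrative variable names:

```python
import numpy as np

np.random.seed(0)  # fixed seed so the run is repeatable

u = np.random.uniform(low=-1, high=1, size=(2, 3, 4))  # 2*3*4 = 24 samples
r = np.random.rand(1000)                               # values in [0.0, 1.0)
k = np.random.randint(5, size=1000)                    # high omitted: values in [0, 5)
m = np.random.randint(10, 20, size=(3, 3))             # values in [10, 20)

print(u.shape, u.size)                # (2, 3, 4) 24
print(r.min() >= 0.0, r.max() < 1.0)  # True True
print(k.min() >= 0, k.max() < 5)      # True True
print(m.min() >= 10, m.max() < 20)    # True True
```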

Indexing and slicing arrays

How are 1-D, 2-D, and 3-D arrays indexed?

Index or slice the array directly.
obj[:, :]: rows first, then columns

import numpy as np
a1 = np.array([ [[1,2,3],[4,5,6]], [[12,3,34],[5,6,7]]])

print(a1)
print('---------------')
print(a1[0, 0, 1])
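Slices can be mixed with integer indices in any dimension. The sketch below shows a few common patterns on small 1-D, 2-D, and 3-D arrays; the array names v, m, and t are illustrative.

```python
import numpy as np

v = np.array([10, 20, 30, 40, 50])
m = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
t = np.array([[[1, 2, 3], [4, 5, 6]], [[12, 3, 34], [5, 6, 7]]])

print(v[1:4])       # [20 30 40]
print(m[0, 1:])     # row 0, columns 1 onward: [2 3]
print(m[:, 0])      # every row, column 0: [1 4 7]
print(m[:2, :2])    # top-left 2x2 block
# [[1 2]
#  [4 5]]
print(t[1, 0, 2])   # block 1, row 0, column 2: 34
print(t[0, :, 1:])  # block 0, every row, columns 1 onward
# [[2 3]
#  [5 6]]
```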

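Changing the shape

1. ndarray.reshape(shape, order)

Returns an array containing the same data with the new shape. The original array is not modified.

2. ndarray.tostring([order]) or ndarray.tobytes([order])

Constructs a Python bytes object containing the raw data bytes of the array.
Note: tostring is deprecated in recent NumPy versions (since NumPy 1.19); use tobytes instead.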
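A short sketch of both methods follows. Passing -1 as one dimension of reshape and reading the bytes back with np.frombuffer go slightly beyond what the text covers and are included only as illustration.

```python
import numpy as np

a = np.arange(6)        # [0 1 2 3 4 5]
b = a.reshape((2, 3))   # 2 rows, 3 columns
c = a.reshape((3, -1))  # -1 asks NumPy to infer that dimension (here 2)

print(a.shape, b.shape, c.shape)  # (6,) (2, 3) (3, 2)

# tobytes returns the raw bytes of the array data
raw = np.array([1, 2, 3], dtype=np.uint8).tobytes()
print(raw)              # b'\x01\x02\x03'

# frombuffer rebuilds an array from raw bytes when the dtype is known
restored = np.frombuffer(raw, dtype=np.uint8)
print(restored)         # [1 2 3]
```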

Removing duplicates from an array

np.unique()

import numpy as np

a=np.array([[1,2,3,4,3,21,0],[9,2,1,4,5,7,3]])

a=np.unique(a)
print(a)

Note: the result is flattened to one dimension and sorted in ascending order.
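If the number of occurrences of each value is also needed, np.unique accepts the return_counts keyword. The sketch below reuses the array from the example; the names values and counts are illustrative.

```python
import numpy as np

a = np.array([[1, 2, 3, 4, 3, 21, 0], [9, 2, 1, 4, 5, 7, 3]])

# values holds the sorted unique elements, counts how often each one occurs
values, counts = np.unique(a, return_counts=True)

print(values)  # [ 0  1  2  3  4  5  7  9 21]
print(counts)  # [1 2 2 3 2 1 1 1 1]
```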

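Operations on ndarray

Logical operations

>>> import numpy as np
# Generate scores for 10 students in 5 subjects
>>> score = np.random.randint(40, 100, (10, 5))

# Take the scores of the last 4 students for the logical tests
>>> test_score = score[6:, 0:5]

# Logical test: mark a score as True if it is greater than 60, otherwise False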
>>> test_score > 60
array([[ True,  True,  True, False,  True],
       [ True,  True,  True, False,  True],
       [ True,  True, False, False,  True],
       [False,  True,  True,  True,  True]])

# Boolean assignment: set the elements that satisfy the condition to a given value (boolean indexing)
>>> test_score[test_score > 60] = 1
>>> test_score
array([[ 1,  1,  1, 52,  1],
       [ 1,  1,  1, 59,  1],
       [ 1,  1, 44, 44,  1],
       [59,  1,  1,  1,  1]])
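One detail worth knowing here: a basic slice such as score[6:, 0:5] is a view, so assigning through it also changes score. The sketch below uses a small fixed array (made up for the illustration) and takes a copy first so the original stays intact.

```python
import numpy as np

score = np.array([[80, 52, 91],
                  [59, 77, 44],
                  [63, 60, 95]])

passed = score > 60         # boolean mask with the same shape as score
print(passed.sum())         # 5 scores are greater than 60

tail = score[1:, :].copy()  # copy, so the assignment below leaves score untouched
tail[tail > 60] = 1
print(tail)
# [[59  1 44]
#  [ 1 60  1]]
print(score[1, 1])          # 77, unchanged because tail is a copy
```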

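General test functions

np.all()

# Check whether the first two students, score[0:2, :], passed every subject
np.all(score[0:2, :] > 60)

np.any()

# Check whether the first two students, score[0:2, :], have any score above 80
np.any(score[0:2, :] > 80)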
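Because the scores above are random, the results vary from run to run. The following sketch uses a small fixed array (made up for the illustration) and also shows the axis argument, which applies the test per row or per column.

```python
import numpy as np

score = np.array([[65, 92, 71],
                  [58, 88, 79]])

all_passed = np.all(score > 60)           # False: one score is 58
any_above_90 = np.any(score > 90)         # True: one score is 92
per_student = np.all(score > 60, axis=1)  # one result per row

print(all_passed)    # False
print(any_above_90)  # True
print(per_student)   # [ True False]
```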

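np.where (ternary operator)

np.where makes more complex element-wise operations possible.

np.where()

# For the first four students and first four subjects, set scores greater than 60 to 1 and all others to 0
temp = score[:4, :4]
np.where(temp > 60, 1, 0)

Compound conditions are built with np.logical_and and np.logical_or:

# For the first four students and first four subjects, set scores greater than 60 and less than 90 to 1, all others to 0
np.where(np.logical_and(temp > 60, temp < 90), 1, 0)

# For the first four students and first four subjects, set scores greater than 90 or less than 60 to 1, all others to 0
np.where(np.logical_or(temp > 90, temp < 60), 1, 0)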
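A runnable version of these calls on a small fixed array (made up for the illustration) is sketched below. It also shows the & operator, which for boolean arrays gives the same result as np.logical_and; each comparison must be wrapped in parentheses.

```python
import numpy as np

temp = np.array([[45, 75, 95],
                 [60, 88, 30]])

simple = np.where(temp > 60, 1, 0)
both = np.where(np.logical_and(temp > 60, temp < 90), 1, 0)
either = np.where(np.logical_or(temp > 90, temp < 60), 1, 0)
# Operator form of the compound condition
both_op = np.where((temp > 60) & (temp < 90), 1, 0)

print(simple)
# [[0 1 1]
#  [0 1 0]]
print(both)
# [[0 1 0]
#  [0 1 0]]
print(either)
# [[1 0 1]
#  [0 0 1]]
```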

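Statistical operations

In data mining and machine learning, statistical measures are another way to analyse a problem. Commonly used functions:

np.min(a, axis): minimum
np.max(a, axis): maximum
np.median(a, axis): median
np.mean(a, axis, dtype): arithmetic mean
np.std(a, axis, dtype): standard deviation
np.var(a, axis, dtype): variance
np.argmax(a, axis): index of the largest element
np.argmin(a, axis): index of the smallest element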
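The axis argument decides the direction of the reduction: axis=0 works down the columns, axis=1 across the rows, and omitting it uses every element. A minimal sketch on a made-up 2x3 score table (rows are students, columns are subjects):

```python
import numpy as np

score = np.array([[80, 90, 70],
                  [60, 50, 100]])

overall_max = np.max(score)                # largest of all elements: 100
col_max = np.max(score, axis=0)            # best score per subject: 80, 90, 100
row_mean = np.mean(score, axis=1)          # average per student: 80.0, 70.0
overall_median = np.median(score)          # 75.0
best_subject = np.argmax(score, axis=1)    # column index of each student's best score: 1, 2
weakest_student = np.argmin(score, axis=0) # row index of the lowest score per subject: 1, 1, 0
row_std = np.std(score, axis=1)            # spread of each student's scores

print(overall_max, overall_median)
print(col_max, row_mean)
print(best_subject, weakest_student)
print(row_std)
```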

Operations between arrays

Array and scalar operations

import numpy as np

arr = np.array([[1, 2, 3, 2, 1, 4], [5, 6, 1, 2, 3, 1]])
print(arr + 1)
print('----------------------------')
print(arr / 2)
print('----------------------------')
# Compare with a Python list: * repeats the list instead of multiplying each element
a = [1, 2, 3, 4, 5]
print(a * 3)

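Array and array operations

Broadcasting: vectorized operations between arrays normally require the arrays to have the same shape. When arrays of different shapes are combined in an arithmetic operation, the broadcasting mechanism applies: it expands the arrays so that their shapes match, after which the element-wise operation can proceed. An example: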

import numpy as np

arr1 = np.array([[0],[1],[2],[3]])
print(arr1.shape)
# (4, 1)

arr2 = np.array([1,2,3])
arr2.shape
# (3,)

print(arr1+arr2)

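In the code above, arr1 has 4 rows and 1 column, and arr2 is treated as 1 row and 3 columns. To add them, broadcasting expands both arr1 and arr2 to 4 rows and 3 columns.

Broadcasting makes operations between two or more arrays possible even when their shapes are not identical. The shapes are compared dimension by dimension, starting from the last one, and each pair of dimensions must satisfy one of the following conditions:

1. The two dimensions have the same length.
2. One of the two dimensions has length 1.

Broadcasting expands the array with fewer dimensions so that its shape matches that of the array with the most dimensions, allowing element-wise functions and operators to be applied.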
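The rules can be checked with one compatible and one incompatible pair of shapes. In the sketch below, the (2, 3) and (4,) arrays are made up for the illustration; NumPy raises ValueError when shapes cannot be broadcast.

```python
import numpy as np

arr1 = np.array([[0], [1], [2], [3]])  # shape (4, 1)
arr2 = np.array([1, 2, 3])             # shape (3,)

# Last dimensions are 1 and 3 (one of them is 1), so the result has shape (4, 3)
total = arr1 + arr2
print(total.shape)  # (4, 3)
print(total)
# [[1 2 3]
#  [2 3 4]
#  [3 4 5]
#  [4 5 6]]

# Last dimensions are 3 and 4: not equal and neither is 1, so this fails
try:
    np.ones((2, 3)) + np.ones((4,))
    compatible = True
except ValueError:
    compatible = False
print(compatible)   # False
```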
