python学习之深入

Posted 2020-07-26

tags:

篇首语：本文由小常识网(cha138.com)小编为大家整理，主要介绍了python学习之深入相关的知识，希望对你有一定的参考价值。

一、迭代器和生成器

1、迭代器

迭代器是访问集合元素的一种方式。迭代器对象从集合的第一个元素开始访问，直到所有的元素被访问完结束。迭代器只能往前不会后退，不过这也没什么，因为人们很少在迭代途中往后退。另外，迭代器的一大优点是不要求事先准备好整个迭代过程中所有的元素。迭代器仅仅在迭代到某个元素时才计算该元素，而在这之前或之后，元素可以不存在或者被销毁。这个特点使得它特别适合用于遍历一些巨大的或是无限的集合，比如几个G的文件

特点：

访问者不需要关心迭代器内部的结构，仅需通过next()方法不断去取下一个内容
不能随机访问集合中的某个值，只能从头到尾依次访问
访问到一半时不能往回退
便于循环比较大的数据集合，节省内存

>>> a = iter([1,2,3,4,5])
>>> a
<list_iterator object at 0x101402630>
>>> a.__next__()
1
>>> a.__next__()
2
>>> a.__next__()
3
>>> a.__next__()
4
>>> a.__next__()
5
>>> a.__next__()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
2、生成器

一个函数调用时返回一个迭代器，那这个函数就叫做生成器（generator）；如果函数中包含yield语法，那这个函数就会变成生成器；

def func():
yield 1
yield 2
yield 3
yield 4
上述代码中：func是函数称为生成器，当执行此函数func()时会得到一个迭代器。

>>> temp = func()
>>> temp.__next__()
1
>>> temp.__next__()

2
>>> temp.__next__()
3
>>> temp.__next__()
4
>>> temp.__next__()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
3、实例

a、利用生成器自定义range

def nrange(num):
temp = -1
while True:
temp = temp + 1
if temp >= num:
return
else:
yield temp
b、利用迭代器访问range

二、模块

模块，用一砣代码实现了某个功能的代码集合。

类似于函数式编程和面向过程编程，函数式编程则完成一个功能，其他代码用来调用即可，提供了代码的重用性和代码间的耦合。而对于一个复杂的功能来，可能需要多个函数才能完成（函数又可以在不同的.py文件中），n个 .py 文件组成的代码集合就称为模块。

如：os 是系统相关的模块；file是文件操作相关的模块

模块分为三种：

自定义模块
内置标准模块（又称标准库）
开源模块

２.１　time & datetime模块

print(time.clock()) #返回处理器时间,3.3开始已废弃
print(time.process_time()) #返回处理器时间,3.3开始已废弃
print(time.time()) #返回当前系统时间戳
print(time.ctime()) #输出Tue Jan 26 18:23:48 2016 ,当前系统时间
print(time.ctime(time.time()-86640)) #将时间戳转为字符串格式
print(time.gmtime(time.time()-86640)) #将时间戳转换成struct_time格式
print(time.localtime(time.time()-86640)) #将时间戳转换成struct_time格式,但返回的本地时间
print(time.mktime(time.localtime())) #与time.localtime()功能相反,将struct_time格式转回成时间戳格式
#time.sleep(4) #sleep
print(time.strftime("%Y-%m-%d %H:%M:%S",time.gmtime()) ) #将struct_time格式转成指定的字符串格式
print(time.strptime("2016-01-28","%Y-%m-%d") ) #将字符串格式转换成struct_time格式

#datetime module

print(datetime.date.today()) #输出格式 2016-01-26
print(datetime.date.fromtimestamp(time.time()-864400) ) #2016-01-16 将时间戳转成日期格式
current_time = datetime.datetime.now() #
print(current_time) #输出2016-01-26 19:04:30.335935
print(current_time.timetuple()) #返回struct_time格式

#datetime.replace([year[, month[, day[, hour[, minute[, second[, microsecond[, tzinfo]]]]]]]])
print(current_time.replace(2014,9,12)) #输出2014-09-12 19:06:24.074900,返回当前时间,但指定的值将被替换

str_to_date = datetime.datetime.strptime("21/11/06 16:30", "%d/%m/%y %H:%M") #将字符串转换成日期格式
new_date = datetime.datetime.now() + datetime.timedelta(days=10) #比现在加10天
new_date = datetime.datetime.now() + datetime.timedelta(days=-10) #比现在减10天
new_date = datetime.datetime.now() + datetime.timedelta(hours=-10) #比现在减10小时
new_date = datetime.datetime.now() + datetime.timedelta(seconds=120) #比现在+120s
print(new_date)

2.2 随机数

mport random
print random.random()
print random.randint(1,2)
print random.randrange(1,10)
生成随机验证码

import random
checkcode = ‘‘
for i in range(4):
current = random.randrange(0,4)
if current != i:
temp = chr(random.randint(65,90))
else:
temp = random.randint(0,9)
checkcode += str(temp)
print checkcode

2.3 OS模块　　
提供对操作系统进行调用的接口

os.getcwd() 获取当前工作目录，即当前python脚本工作的目录路径
os.chdir("dirname") 改变当前脚本工作目录；相当于shell下cd
os.curdir 返回当前目录: (‘.‘)
os.pardir 获取当前目录的父目录字符串名：(‘..‘)
os.makedirs(‘dirname1/dirname2‘) 可生成多层递归目录
os.removedirs(‘dirname1‘) 若目录为空，则删除，并递归到上一级目录，如若也为空，则删除，依此类推
os.mkdir(‘dirname‘) 生成单级目录；相当于shell中mkdir dirname
os.rmdir(‘dirname‘) 删除单级空目录，若目录不为空则无法删除，报错；相当于shell中rmdir dirname
os.listdir(‘dirname‘) 列出指定目录下的所有文件和子目录，包括隐藏文件，并以列表方式打印
os.remove() 删除一个文件
os.rename("oldname","newname") 重命名文件/目录
os.stat(‘path/filename‘) 获取文件/目录信息
os.sep 输出操作系统特定的路径分隔符，win下为"\\",Linux下为"/"
os.linesep 输出当前平台使用的行终止符，win下为"\t\n",Linux下为"\n"
os.pathsep 输出用于分割文件路径的字符串
os.name 输出字符串指示当前使用平台。win->‘nt‘; Linux->‘posix‘
os.system("bash command") 运行shell命令，直接显示
os.environ 获取系统环境变量
os.path.abspath(path) 返回path规范化的绝对路径
os.path.split(path) 将path分割成目录和文件名二元组返回
os.path.dirname(path) 返回path的目录。其实就是os.path.split(path)的第一个元素
os.path.basename(path) 返回path最后的文件名。如何path以／或\结尾，那么就会返回空值。即os.path.split(path)的第二个元素
os.path.exists(path) 如果path存在，返回True；如果path不存在，返回False
os.path.isabs(path) 如果path是绝对路径，返回True
os.path.isfile(path) 如果path是一个存在的文件，返回True。否则返回False
os.path.isdir(path) 如果path是一个存在的目录，则返回True。否则返回False
os.path.join(path1[, path2[, ...]]) 将多个路径组合后返回，第一个绝对路径之前的参数将被忽略
os.path.getatime(path) 返回path所指向的文件或者目录的最后存取时间
os.path.getmtime(path) 返回path所指向的文件或者目录的最后修改时间
更多猛击这里

2.4 sys模块

sys.argv 命令行参数List，第一个元素是程序本身路径
sys.exit(n) 退出程序，正常退出时exit(0)
sys.version 获取Python解释程序的版本信息
sys.maxint 最大的Int值
sys.path 返回模块的搜索路径，初始化时使用PYTHONPATH环境变量的值
sys.platform 返回操作系统平台名称
sys.stdout.write(‘please:‘)
val = sys.stdin.readline()[:-1]

2.5 json & pickle 模块
用于序列化的两个模块

json，用于字符串和 python数据类型间进行转换
pickle，用于python特有的类型和 python的数据类型间进行转换
Json模块提供了四个功能：dumps、dump、loads、load

pickle模块提供了四个功能：dumps、dump、loads、load

2.6 logging模块　　

很多程序都有记录日志的需求，并且日志中包含的信息即有正常的程序访问日志，还可能有错误、警告等信息输出，python的logging模块提供了标准的日志接口，你可以通过它存储各种格式的日志，logging的日志可以分为 debug(), info(), warning(), error() and critical() 5个级别，下面我们看一下怎么用。

最简单用法

import logging

logging.warning("user [alex] attempted wrong password more than 3 times")
logging.critical("server is down")

#输出
WARNING:root:user [alex] attempted wrong password more than 3 times
CRITICAL:root:server is down
看一下这几个日志级别分别代表什么意思

Level When it’s used
DEBUG Detailed information, typically of interest only when diagnosing problems.
INFO Confirmation that things are working as expected.
WARNING An indication that something unexpected happened, or indicative of some problem in the near future (e.g. ‘disk space low’). The software is still working as expected.
ERROR Due to a more serious problem, the software has not been able to perform some function.
CRITICAL A serious error, indicating that the program itself may be unable to continue running.
　　

如果想把日志写到文件里，也很简单

import logging

logging.basicConfig(filename=‘example.log‘,level=logging.INFO)
logging.debug(‘This message should go to the log file‘)
logging.info(‘So should this‘)
logging.warning(‘And this, too‘)
其中下面这句中的level=loggin.INFO意思是，把日志纪录级别设置为INFO，也就是说，只有比日志是INFO或比INFO级别更高的日志才会被纪录到文件里，在这个例子，第一条日志是不会被纪录的，如果希望纪录debug的日志，那把日志级别改成DEBUG就行了。

logging.basicConfig(filename=‘example.log‘,level=logging.INFO)
感觉上面的日志格式忘记加上时间啦，日志不知道时间怎么行呢，下面就来加上!

import logging
logging.basicConfig(format=‘%(asctime)s %(message)s‘, datefmt=‘%m/%d/%Y %I:%M:%S %p‘)
logging.warning(‘is when this event was logged.‘)

#输出
12/12/2010 11:46:36 AM is when this event was logged.
　　

如果想同时把log打印在屏幕和文件日志里，就需要了解一点复杂的知识了

The logging library takes a modular approach and offers several categories of components: loggers, handlers, filters, and formatters.

Loggers expose the interface that application code directly uses.
Handlers send the log records (created by loggers) to the appropriate destination.
Filters provide a finer grained facility for determining which log records to output.
Formatters specify the layout of log records in the final output.

import logging

#create logger
logger = logging.getLogger(‘TEST-LOG‘)
logger.setLevel(logging.DEBUG)

# create console handler and set level to debug
ch = logging.StreamHandler()
ch.setLevel(logging.DEBUG)

# create file handler and set level to warning
fh = logging.FileHandler("access.log")
fh.setLevel(logging.WARNING)
# create formatter
formatter = logging.Formatter(‘%(asctime)s - %(name)s - %(levelname)s - %(message)s‘)

# add formatter to ch and fh
ch.setFormatter(formatter)
fh.setFormatter(formatter)

# add ch and fh to logger
logger.addHandler(ch)
logger.addHandler(fh)

# ‘application‘ code
logger.debug(‘debug message‘)
logger.info(‘info message‘)
logger.warn(‘warn message‘)
logger.error(‘error message‘)
logger.critical(‘critical message‘)

2.7 反射

反射

对于初学python可能较难理解，但反射是非常有用。

试想一下，当别的程序传入给你写的这段代码一个变量(var=“math”)，这个变量是一个字符串，这个字符串是一个模块或者一个模块下的某个方法，你需要通过变量来导入此模块或者方法，如何导入此模块或方法呢，如果直接执行 import var是会出错的，因为var在你的这段代码中是一个变量, 这时就需要反射，如何使用反射呢。

如果这个变量值是一个模块，可以使用MathModule=__import__(var), 导入后，你可以在你的代码中用MathModule.***来调用math模块下的任意方法

当需要获得一个模块下的某个方法时，可以使用getattr，下面有具体的例子。

例子，如何导入通过变量导入math模块

module="math"
MathModule=__import__(module)
print MathModule.pow(2,4)

例子，如何通过变量导入方法，接上边的代码

func="pow"
pow=getattr(MathModule,func)
print pow(2,4)

一个使用反射的具体场景：

假如有服务器A和B，A运行的是集中化任务分发，B接收A给出的任务

A通过queue把任务发送给B，任务内容是让B执行math.pow方法，B去queue中获取任务，此时就必须要使用到反射

在实际应用中，使用的queue应该是消息队列服务器，例如Redis,zeromq等服务器，这里使用python的Queue模块来模拟

技术分享

定义一个队列：

import Queue
queue=Queue.Queue()

定义ServerA

def ServerA():
Dict={‘server‘:‘B‘,‘module‘:‘math‘,‘func‘:‘pow‘}
queue.put(Dict)

运行ServerA函数，将需要执行的任务放入队列中.

ServerA()

定义ServerB，B去任务队列中获取任务：

def ServerB():
task=queue.get()
#首先需要导入module
if task[‘server‘]==‘B‘:
MathModule=__import__(task[‘module‘])
#其次在module中找到task[‘func‘]
func=getattr(MathModule,task[‘func‘])
print func(2,4)

运行ServerB函数, 执行相应的任务.

ServerB()

2.7 re
python中re模块提供了正则表达式相关操作

字符：

　　. 匹配除换行符以外的任意字符
　　\w 匹配字母或数字或下划线或汉字
　　\s 匹配任意的空白符
　　\d 匹配数字
　　\b 匹配单词的开始或结束
　　^ 匹配字符串的开始
　　$ 匹配字符串的结束

次数：

　　* 重复零次或更多次
　　+ 重复一次或更多次
　　? 重复零次或一次
　　{n} 重复n次
　　{n,} 重复n次或更多次
　　{n,m} 重复n到m次

match

# match，从起始位置开始匹配，匹配成功返回一个对象，未匹配成功返回None

match(pattern, string, flags=0)
# pattern：正则模型
# string ：要匹配的字符串
# falgs ：匹配模式
X VERBOSE Ignore whitespace and comments for nicer looking RE‘s.
I IGNORECASE Perform case-insensitive matching.
M MULTILINE "^" matches the beginning of lines (after a newline)
as well as the string.
"$" matches the end of lines (before a newline) as well
as the end of the string.
S DOTALL "." matches any character at all, including the newline.

A ASCII For string patterns, make \w, \W, \b, \B, \d, \D
match the corresponding ASCII character categories
(rather than the whole Unicode categories, which is the
default).
For bytes patterns, this flag is the only available
behaviour and needn‘t be specified.

L LOCALE Make \w, \W, \b, \B, dependent on the current locale.
U UNICODE For compatibility only. Ignored for string patterns (it
is the default), and forbidden for bytes patterns.

复制代码
# 无分组
r = re.match("h\w+", origin)
print(r.group()) # 获取匹配到的所有结果
print(r.groups()) # 获取模型中匹配到的分组结果
print(r.groupdict()) # 获取模型中匹配到的分组结果

# 有分组

# 为何要有分组？提取匹配成功的指定内容（先匹配成功全部正则，再匹配成功的局部内容提取出来）

r = re.match("h(\w+).*(?P<name>\d)$", origin)
print(r.group()) # 获取匹配到的所有结果
print(r.groups()) # 获取模型中匹配到的分组结果
print(r.groupdict()) # 获取模型中匹配到的分组中所有执行了key的组
复制代码
search

1
2
# search,浏览整个字符串去匹配第一个，未匹配成功返回None
# search(pattern, string, flags=0)

复制代码
# 无分组

r = re.search("a\w+", origin)
print(r.group()) # 获取匹配到的所有结果
print(r.groups()) # 获取模型中匹配到的分组结果
print(r.groupdict()) # 获取模型中匹配到的分组结果

# 有分组

r = re.search("a(\w+).*(?P<name>\d)$", origin)
print(r.group()) # 获取匹配到的所有结果
print(r.groups()) # 获取模型中匹配到的分组结果
print(r.groupdict()) # 获取模型中匹配到的分组中所有执行了key的组
复制代码
findall

# findall，获取非重复的匹配列表；如果有一个组则以列表形式返回，且每一个匹配均是字符串；如果模型中有多个组，则以列表形式返回，且每一个匹配均是元祖；
# 空的匹配也会包含在结果中
#findall(pattern, string, flags=0)

复制代码
# 无分组
r = re.findall("a\w+",origin)
print(r)

# 有分组
origin = "hello alex bcd abcd lge acd 19"
r = re.findall("a((\w*)c)(d)", origin)
print(r)
复制代码
sub

# sub，替换匹配成功的指定位置字符串

sub(pattern, repl, string, count=0, flags=0)
# pattern：正则模型
# repl ：要替换的字符串或可执行对象
# string ：要匹配的字符串
# count ：指定匹配个数
# flags ：匹配模式
Demo
split
split，根据正则匹配分割字符串

split(pattern, string, maxsplit=0, flags=0)
# pattern：正则模型
# string ：要匹配的字符串
# maxsplit：指定分割个数
# flags ：匹配模式

复制代码
# 无分组
origin = "hello alex bcd alex lge alex acd 19"
r = re.split("alex", origin, 1)
print(r)

# 有分组

origin = "hello alex bcd alex lge alex acd 19"
r1 = re.split("(alex)", origin, 1)
print(r1)
r2 = re.split("(al(ex))", origin, 1)
print(r2)
复制代码

IP：
^(25[0-5]|2[0-4]\d|[0-1]?\d?\d)(\.(25[0-5]|2[0-4]\d|[0-1]?\d?\d)){3}$
手机号：
^1[3|4|5|8][0-9]\d{8}$
邮箱：
[a-zA-Z0-9_-][email protected][a-zA-Z0-9_-]+(\.[a-zA-Z0-9_-]+)+

以上是关于python学习之深入的主要内容，如果未能解决你的问题，请参考以下文章

python学习之深入

python学习之机器学习

Python 学习之《Learn Python3 The Hard Way 》第五部分学习笔记

Python 学习之《Learn Python3 The Hard Way 》第二部分学习笔记

Python 学习之《Learn Python3 The Hard Way 》第八部分学习笔记

Python 学习之《Learn Python3 The Hard Way 》第七部分学习笔记