Python multithreading
Posted by -柚子皮-
http://blog.csdn.net/pipisorry/article/details/45306973
CPU-bound (compute-intensive) vs. I/O-bound (I/O-intensive)
I/O-bound means the system's CPU performance is much better than that of its disk/memory. When such a system runs, most of the time the CPU is waiting for I/O (disk/memory) reads and writes to complete, so CPU load stays low.
CPU-bound means the disk/memory performance is much better relative to the CPU. When such a system runs, CPU load is mostly at 100%: the CPU's I/O (disk/memory) reads and writes finish in very little time, while the CPU still has plenty of computation to work through, so CPU load stays very high.
Compute-intensive (CPU-bound)
In a multiprogramming system, a program that spends most of its time on computation, logic, and other CPU work is called CPU-bound. For example, a program that computes pi to a thousand decimal places spends almost all of its execution time on trigonometric and square-root calculations, and is therefore CPU-bound.
It is because the performance characteristic of most protocol codec implementations is CPU-bound, the same as for I/O processor threads.
Based on the analysis above, for a given performance metric most programs can usually be classified as either CPU-bound or I/O-bound.
A CPU-bound program generally shows very high CPU utilization. That may be because the task itself rarely needs to touch I/O devices, or because a multithreaded implementation hides the time spent waiting for I/O.
An I/O-bound program, by contrast, typically shows low CPU utilization even at its performance limit. The task may inherently require heavy I/O while the pipeline is not good enough to keep the processor busy, or poor data locality may cause frequent page faults and hence heavy disk I/O. There are many possibilities.
How to tell whether a program is CPU-bound or I/O-bound
A common approach is to use top to check CPU utilization at the performance limit, then use sar, iostat, and similar tools to get statistics on the actual I/O operations or page faults. If you need more precise information, such as which code produces the overhead, turn to oprofile or vtune.
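A quick first check from inside Python itself is to compare CPU time against wall-clock time; a minimal sketch (the profile helper below is illustrative, not one of the tools above):
import time

def profile(fn, *args):
    # Compare CPU time to wall-clock time for a single call.
    t0, c0 = time.perf_counter(), time.process_time()
    fn(*args)
    wall = time.perf_counter() - t0
    cpu = time.process_time() - c0
    # A ratio near 1.0 suggests CPU-bound; near 0 suggests mostly waiting on I/O.
    print(f"wall={wall:.2f}s cpu={cpu:.2f}s ratio={cpu / max(wall, 1e-9):.2f}")

profile(sum, range(10**7))  # CPU-bound: ratio close to 1
profile(time.sleep, 1)      # I/O-like wait: ratio close to 0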
Typical I/O-bound workloads include serving a web server's static pages or database-backed applications, while heavily computational applications are generally CPU-bound.
What happens when a CPU-bound program and an I/O-bound program run together on one system?
Most likely the CPU-bound program's share of the CPU will be unfairly close to 100%. The I/O-bound program tends to block and give up the CPU before even one time slice is used up, so the CPU-bound program gets many more scheduling opportunities and runs each slice to completion. On such a system the I/O-bound program should therefore be given a higher priority so it gets scheduled more often.
As a rule of thumb, compute (CPU)-intensive tasks suit multiprocessing, while I/O-intensive tasks suit multithreading; it depends on the specific case. Workloads with long waits, such as HTTP requests, are I/O-intensive, so let cheaper threads do the waiting.
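A hedged sketch of this rule of thumb (io_task and cpu_task are illustrative stand-ins): threads overlap the waiting of I/O-style tasks, while processes give real parallelism for computation:
import time
from multiprocessing import Pool
from multiprocessing.dummy import Pool as ThreadPool

def io_task(_):
    time.sleep(1)  # stands in for a blocking call such as an HTTP request

def cpu_task(n):
    return sum(i * i for i in range(n))  # pure computation; holds the GIL

if __name__ == '__main__':
    t0 = time.perf_counter()
    with ThreadPool(4) as tp:   # threads: the four 1-second waits overlap
        tp.map(io_task, range(4))
    print('threads, I/O tasks:', time.perf_counter() - t0)

    t0 = time.perf_counter()
    with Pool(4) as pp:         # processes: side-step the GIL for computation
        pp.map(cpu_task, [10**6] * 4)
    print('processes, CPU tasks:', time.perf_counter() - t0)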
The Python multiprocessing module
multiprocessing is a package that supports spawning processes using an API similar to the threading module. The multiprocessing package offers both local and remote concurrency, effectively side-stepping the Global Interpreter Lock by using subprocesses instead of threads. Due to this, the multiprocessing module allows the programmer to fully leverage multiple processors on a given machine. It runs on both Unix and Windows.
The multiprocessing module also introduces APIs which do not have analogs in the threading module. A prime example of this is the Pool object which offers a convenient means of parallelizing the execution of a function across multiple input values, distributing the input data across processes (data parallelism).
The Process class
class multiprocessing.Process(group=None, target=None, name=None, args=(), kwargs={}, *, daemon=None)
# target: the function the new process will run; args: the function's arguments, passed as a tuple
The constructor returns a Process object representing the new child process.
The Process class has equivalents of all the methods of threading.Thread.
In multiprocessing, processes are spawned by creating a Process object and then calling its start() method. Process follows the API of threading.Thread. A trivial example of a multiprocess program is
from multiprocessing import Process

def f(name):
    print('hello', name)

if __name__ == '__main__':
    p = Process(target=f, args=('bob',))
    p.start()
    p.join()
The run() method
Method representing the process's activity. If no target is specified when the Process object is created, the process executes Process's default run() method.
You may override this method in a subclass. The standard run() method invokes the callable object passed to the object's constructor as the target argument, if any, with sequential and keyword arguments taken from the args and kwargs arguments, respectively.
from multiprocessing import Process

def r():
    print('run method')

# No target is specified for this Process
p1 = Process()
# If no target is given when creating the Process, running it has no effect:
# the default run() does nothing without a target, so here we swap in our own run method
p1.run = r
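The documented alternative to patching the instance is the subclass approach mentioned above; a minimal sketch:
from multiprocessing import Process

class Worker(Process):
    def run(self):
        # Overrides the default run(); start() will invoke this in the child process
        print('run method in subclass')

if __name__ == '__main__':
    w = Worker()
    w.start()
    w.join()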
The start() method
Call start() to launch the process, much as with threads.
Start the process's activity. This must be called at most once per process object. It arranges for the object's run() method to be invoked in a separate process.
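For example, a second start() on the same object fails; a small illustrative snippet:
from multiprocessing import Process

def f(name):
    print('hello', name)

if __name__ == '__main__':
    p = Process(target=f, args=('bob',))
    p.start()
    try:
        p.start()  # a Process object may be started at most once
    except Exception as e:
        print(e)   # "cannot start a process twice"
    p.join()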
The join([timeout]) method
Blocks the calling process until the process whose join() method was called has finished, then resumes; in other words, join() is what blocks the current process or thread.
A typical use is to keep the main process from finishing while a child is still running. Note that non-daemonic children are joined automatically at interpreter exit, whereas daemonic children are terminated when the parent exits.
If the optional argument timeout is None (the default), the method blocks until the process whose join() method is called terminates. If timeout is a positive number, it blocks at most timeout seconds. Note that the method returns None if its process terminates or if the method times out. Check the process's exitcode to determine if it terminated.
A process can be joined many times. A process cannot join itself because this would cause a deadlock. It is an error to attempt to join a process before it has been started.
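A small sketch tying the timeout, is_alive(), and exitcode pieces together:
from multiprocessing import Process
import time

def slow():
    time.sleep(3)

if __name__ == '__main__':
    p = Process(target=slow)
    p.start()
    p.join(timeout=1)     # returns None after at most ~1 second
    print(p.is_alive())   # True: the child is still sleeping
    print(p.exitcode)     # None: it has not terminated yet
    p.join()              # joined a second time, now without a timeout
    print(p.exitcode)     # 0 on a clean exit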
[python多进程的理解 multiprocessing Process join run]
multiprocessing.dummy.Pool: process pools and thread pools
One can create a pool of processes which will carry out tasks submitted to it with the Pool class.
map(func, iterable[, chunksize])
A parallel equivalent of the map() built-in function (it supports only one iterable argument though): it applies func to every item of iterable and returns the results as a list. It blocks until the result is ready.
This method chops the iterable into a number of chunks which it submits to the process pool as separate tasks. The (approximate) size of these chunks can be specified by setting chunksize to a positive integer.
Usage example
from multiprocessing.dummy import Pool as ThreadPool  # thread pool

pool = ThreadPool(3)  # number of threads; defaults to the CPU count, or 1 if it cannot be determined

def run(num):
    print(num ** 2)
    return num ** 2

num_list = [1, 2, 3]
result = pool.map(run, num_list)  # run the tasks; returns a list of each worker's return value
pool.close()  # close the thread/process pool so no more tasks can be submitted
# close() waits for the pool's workers to finish before shutting the pool down, while terminate() shuts it down immediately
pool.join()  # wait for the pool's workers to finish, so the main process does not exit before they do
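Pool also works as a context manager, and the chunksize parameter from the map() signature above can be passed explicitly; a small sketch (the numbers are arbitrary):
from multiprocessing.dummy import Pool as ThreadPool

with ThreadPool(3) as pool:
    # a larger chunksize means fewer task dispatches but coarser load balancing
    squares = pool.map(lambda n: n * n, range(100), chunksize=10)
print(squares[:5])  # [0, 1, 4, 9, 16]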
with Pool(num_parallel_calls) as pool:
    result_list = pool.map(bert_encode,
                           list(zip_longest(*[iter(train_data_words)] * batch_size, fillvalue=None)))
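The zip_longest(*[iter(...)] * batch_size) call above is the standard grouper idiom for batching an iterable; it requires from itertools import zip_longest, and bert_encode, train_data_words, num_parallel_calls, and batch_size come from the author's surrounding code. In isolation the idiom looks like this:
from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    # n references to the same iterator advance in lockstep,
    # so each output tuple consumes the next n items
    return zip_longest(*[iter(iterable)] * n, fillvalue=fillvalue)

print(list(grouper(range(7), 3)))  # [(0, 1, 2), (3, 4, 5), (6, None)]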
from multiprocessing import Pool  # process pool
import time

def f(x):
    return x*x

if __name__ == '__main__':
    with Pool(processes=4) as pool:  # start 4 worker processes
        result = pool.apply_async(f, (10,))  # evaluate "f(10)" asynchronously in a single process
        print(result.get(timeout=1))  # prints "100" unless your computer is *very* slow

        print(pool.map(f, range(10)))  # prints "[0, 1, 4,..., 81]"

        it = pool.imap(f, range(10))
        print(next(it))  # prints "0"
        print(next(it))  # prints "1"
        print(it.next(timeout=1))  # prints "4" unless your computer is *very* slow

        result = pool.apply_async(time.sleep, (10,))
        print(result.get(timeout=1))  # raises multiprocessing.TimeoutError
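apply_async also accepts a callback that is invoked in the parent process as each result arrives; a minimal sketch:
from multiprocessing import Pool

def f(x):
    return x * x

if __name__ == '__main__':
    results = []
    with Pool(2) as pool:
        for i in range(5):
            # callback runs in the parent as soon as each result is ready
            pool.apply_async(f, (i,), callback=results.append)
        pool.close()   # no more tasks will be submitted
        pool.join()    # wait for the workers to drain the queue
    print(sorted(results))  # [0, 1, 4, 9, 16]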
[17.2.2.9. Process Pools]
from: http://blog.csdn.net/pipisorry/article/details/45306973
[Parallelism and Serialization]
PicklingError: Python process pool (3)
Pickle issue with LdaMulticore
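These errors usually come from the fact that Pool sends tasks to its workers by pickling them, so the target function and its arguments must be picklable; lambdas and locally defined functions are not. A minimal sketch reproducing the error (illustrative, not taken from the linked posts):
from multiprocessing import Pool

if __name__ == '__main__':
    with Pool(2) as pool:
        try:
            pool.map(lambda x: x * x, [1, 2, 3])  # lambdas cannot be pickled
        except Exception as e:
            print(type(e).__name__, e)  # PicklingError
    # Workarounds: use a module-level function, functools.partial on one,
    # or multiprocessing.dummy (threads), which does not pickle tasks.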