Python: Spyder raises an error when a pandas DataFrame is modified with multithreading
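I have a large DataFrame with an "image" column. The values in "image" are the file names of a large number of files (with the extension "jpg" or "jpeg"). Some of the files exist with the listed extension, but others do not. I therefore have to check whether the "image" data is correct. Doing this in a single thread takes 30 seconds, so I decided to do it with multithreading.
I wrote code in Python (3.6.5) to perform the check. It runs fine when I execute it from the command line, but an error occurs when I execute it in Spyder (3.2.8). What can I do to avoid this?
Here is my code: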
# -*- coding: utf-8 -*-
import multiprocessing
import os
from multiprocessing import Pool

import numpy as np
import pandas as pd

# Sample data; the real DataFrame is larger, about (600, 15)
waferDf = pd.DataFrame({"image": ["aaa.jpg", "bbb.jpeg", "ccc.jpg", "ddd.jpeg", "eee.jpg", "fff.jpg", "ggg.jpeg", "hhh.jpg"]})
waferDf["imagePath"] = None  # object column, so string paths can be stored in it


# Split the DataFrame, process each part in a worker process, then recombine
def parallelize(func, df, uploadedDirPath):
    partitionCount = multiprocessing.cpu_count()
    partitions = np.array_split(df, partitionCount)
    paras = [(part, uploadedDirPath) for part in partitions]
    pool = Pool(partitionCount)
    df = pd.concat(pool.starmap(func, paras))
    pool.close()
    pool.join()
    return df


# Check whether the files exist and record the path that was found
def checkImagePath(partialDf, uploadedDirPath):
    partialDf = partialDf.copy()  # work on a copy so the .loc assignment below sticks
    for index in partialDf.index.values:
        print(index)
        imageName = partialDf.loc[index, "image"]
        jpegPath = os.path.join(uploadedDirPath, imageName.replace(".jpeg\n", ".jpeg"))
        jpgPath = os.path.join(uploadedDirPath, imageName.replace(".jpeg\n", ".jpg"))
        if os.path.exists(jpegPath):
            partialDf.loc[index, "imagePath"] = jpegPath
        elif os.path.exists(jpgPath):
            partialDf.loc[index, "imagePath"] = jpgPath
    print(partialDf)
    return partialDf


if __name__ == '__main__':
    waferDf = parallelize(checkImagePath, waferDf, "/eap/uploadedFiles/")
    print(waferDf)
Here is the error:
runfile('C:/Users/00048564/Desktop/Multi-Threading.py', wdir='C:/Users/00048564/Desktop')
Traceback (most recent call last):
  File "<ipython-input-24-732edc0ea3ea>", line 1, in <module>
    runfile('C:/Users/00048564/Desktop/Multi-Threading.py', wdir='C:/Users/00048564/Desktop')
  File "C:\ProgramData\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 705, in runfile
    execfile(filename, namespace)
  File "C:\ProgramData\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)
  File "C:/Users/00048564/Desktop/Multi-Threading.py", line 35, in <module>
    waferDf = parallelize(checkImagePath, waferDf, "/eap/uploadedFiles/")
  File "C:/Users/00048564/Desktop/Multi-Threading.py", line 17, in parallelize
    pool = Pool(partitionCount)
  File "C:\ProgramData\Anaconda3\lib\multiprocessing\context.py", line 119, in Pool
    context=self.get_context())
  File "C:\ProgramData\Anaconda3\lib\multiprocessing\pool.py", line 174, in __init__
    self._repopulate_pool()
  File "C:\ProgramData\Anaconda3\lib\multiprocessing\pool.py", line 239, in _repopulate_pool
    w.start()
  File "C:\ProgramData\Anaconda3\lib\multiprocessing\process.py", line 105, in start
    self._popen = self._Popen(self)
  File "C:\ProgramData\Anaconda3\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "C:\ProgramData\Anaconda3\lib\multiprocessing\popen_spawn_win32.py", line 33, in __init__
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "C:\ProgramData\Anaconda3\lib\multiprocessing\spawn.py", line 172, in get_preparation_data
    main_mod_name = getattr(main_module.__spec__, "name", None)
AttributeError: module '__main__' has no attribute '__spec__'
Answer
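In most cases, when you run a Python script from the command line with python YourFile.py, the script is executed as the main program. It can therefore load the modules it needs, such as multiprocessing and the others shown in the traceback.
Your Spyder setup behaves differently, so the script is probably not being run as the main program in the same way. The traceback shows Spyder starting the script through runfile() in sitecustomize.py, which executes the file inside an already running console session. In that session the __main__ module has no __spec__ attribute, and multiprocessing reads that attribute when it starts worker processes on Windows (popen_spawn_win32.py in the traceback), hence the AttributeError.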
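To check this difference yourself, a small diagnostic can report the attributes of `__main__` that the traceback touches. This is only a minimal sketch: the helper name `main_module_info` is made up for illustration, and it reads attributes without changing anything. Running it once from the command line and once from Spyder should show whether `__spec__` is present in each environment.

```python
import sys


def main_module_info():
    """Report the __main__ attributes that multiprocessing inspects when spawning."""
    main_module = sys.modules["__main__"]
    spec = getattr(main_module, "__spec__", None)
    return {
        # False here corresponds to the AttributeError in the traceback
        "has_spec": hasattr(main_module, "__spec__"),
        # Name recorded in the spec, if there is a spec at all
        "spec_name": getattr(spec, "name", None),
        # Path of the script that was started, if the environment recorded one
        "file": getattr(main_module, "__file__", None),
    }


if __name__ == "__main__":
    print(main_module_info())
```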
Are you able to run any script successfully from Spyder when it contains this guard?
if __name__ == '__main__':
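A workaround often reported for this exact AttributeError is to make sure `__main__` has a `__spec__` attribute before the pool is created. The sketch below assumes that the missing attribute is the only problem in your Spyder session, which may not hold for every Spyder version. `ensure_main_spec` is a hypothetical helper, not part of Spyder or multiprocessing.

```python
import sys


def ensure_main_spec():
    """Give __main__ a __spec__ attribute if the host environment left it out.

    Returns True when the attribute had to be added, False when it already existed.
    """
    main_module = sys.modules["__main__"]
    if hasattr(main_module, "__spec__"):
        return False
    # None is the value a script gets when it is started with `python YourFile.py`
    main_module.__spec__ = None
    return True


if __name__ == "__main__":
    added = ensure_main_spec()
    print("added __spec__:", added)
    # Create the multiprocessing Pool only after this point.
```

In the script from the question, the call would go at the top of the `if __name__ == '__main__':` block, before `parallelize(...)`. If the error persists, running the file from the command line, which already works, remains the fallback.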
See the accepted answer in this thread: https://stackoverflow.com/a/419185/9968677
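Since the work is checking whether files exist, which is mostly waiting on the file system, threads may be enough. Threads do not go through the process-spawning step that fails in the traceback. The sketch below swaps `multiprocessing.Pool` for `concurrent.futures.ThreadPoolExecutor`. It is not the code from the question: `resolve_image_path`, `resolve_image_paths`, and the `max_workers` default are names and values chosen for illustration, and it assumes the goal is to accept a file under either the ".jpg" or the ".jpeg" extension.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def resolve_image_path(image_name, directory):
    """Return the existing path for image_name, trying .jpg then .jpeg, else None."""
    stem, _ = os.path.splitext(image_name)
    for extension in (".jpg", ".jpeg"):
        candidate = os.path.join(directory, stem + extension)
        if os.path.exists(candidate):
            return candidate
    return None


def resolve_image_paths(image_names, directory, max_workers=8):
    """Resolve every name with a pool of threads, keeping the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map yields results in the same order as image_names
        return list(executor.map(lambda name: resolve_image_path(name, directory), image_names))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as directory:
        # Only aaa.jpeg exists on disk, although the list names aaa.jpg
        open(os.path.join(directory, "aaa.jpeg"), "w").close()
        print(resolve_image_paths(["aaa.jpg", "bbb.jpeg"], directory))
```

With the DataFrame from the question, the result could be assigned in one step, for example `waferDf["imagePath"] = resolve_image_paths(waferDf["image"], "/eap/uploadedFiles/")`. Whether this is faster than the single-threaded 30 seconds depends on the storage behind the directory, so it is worth timing on the real data.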