Python, memory error, csv file too large [duplicate]
Posted: 2014-03-02 17:11:48
Question: I have a problem with a Python module that cannot handle importing a large data file (the file targets.csv is close to 1 GB).
The error occurs when this line is executed:
targets = [(name, float(X), float(Y), float(Z), float(BG))
           for name, X, Y, Z, BG in csv.reader(open('targets.csv'))]
Traceback:
Traceback (most recent call last):
  File "C:\Users\gary\Documents\EPSON STUDIES\colors_text_D65.py", line 41, in <module>
    for name, X, Y, Z, BG in csv.reader(open('targets.csv'))]
MemoryError
Is there a way to read the file targets.csv line by line? And would doing so slow the process down?
This module is already slow...
Thanks! The full module:
import geometry
import csv
import numpy as np
import random
import cv2
S = 0
img = cv2.imread("MAP.tif", -1)
height, width = img.shape
pixx = height * width
iterr = float(pixx / 1000)
accomplished = 0
temp = 0
ppm = file("epson gamut.ppm", 'w')
ppm.write("P3" + "\n" + str(width) + " " + str(height) + "\n" + "255" + "\n")
# PPM file header
all_colors = [(name, float(X), float(Y), float(Z))
              for name, X, Y, Z in csv.reader(open('XYZcolorlist_D65.csv'))]
# background is marked SUPPORT
support_i = [i for i, color in enumerate(all_colors) if color[0] == '255 255 255']
if len(support_i)>0:
    support = np.array(all_colors[support_i[0]][1:])
    del all_colors[support_i[0]]
else:
    support = None
tg, hull_i = geometry.tetgen_of_hull([(X,Y,Z) for name, X, Y, Z in all_colors])
colors = [all_colors[i] for i in hull_i]
print ("thrown out: "
       + ", ".join(set(zip(*all_colors)[0]).difference(zip(*colors)[0])))
targets = [(name, float(X), float(Y), float(Z), float(BG))
           for name, X, Y, Z, BG in csv.reader(open('targets.csv'))]
for target in targets:
    name, X, Y, Z, BG = target
    target_point = support + (np.array([X,Y,Z]) - support)/(1-BG)
    tet_i, bcoords = geometry.containing_tet(tg, target_point)
    if tet_i == None:
        #print str("out")
        ppm.write(str("255 255 255") + "\n")
        print "out"
        temp += 1
        if temp >= iterr:
            accomplished += temp
            print str(100 * accomplished / (float(pixx))) + str(" %")
            temp = 0
        continue
        # not in gamut
    else:
        A = bcoords[0]
        B = bcoords[1]
        C = bcoords[2]
        D = bcoords[3]
        R = random.uniform(0,1)
        names = [colors[i][0] for i in tg.tets[tet_i]]
        if R <= A:
            S = names[0]
        elif R <= A+B:
            S = names[1]
        elif R <= A+B+C:
            S = names[2]
        else:
            S = names[3]
        ppm.write(str(S) + "\n")
        temp += 1
        if temp >= iterr:
            accomplished += temp
            print str(100 * accomplished / (float(pixx))) + str(" %")
            temp = 0
print "done"
ppm.close()
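Answer 1: csv.reader() already reads one row at a time. However, you are first collecting all the rows into a list. You should process one row at a time instead. One way is to switch to a generator, for example: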
targets = ((name, float(X), float(Y), float(Z), float(BG))
           for name, X, Y, Z, BG in csv.reader(open('targets.csv')))
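(Switching from square brackets to parentheses changes targets from a list comprehension to a generator expression, so each row is converted only when the for loop asks for it.)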
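The same idea can also be written as a generator function, which makes it easy to close the file when the loop finishes. This is only a minimal sketch: the helper name iter_targets is hypothetical (it is not part of the question's code), it assumes every row has exactly five fields as in the question, and it is written for Python 3, whereas the question's module uses Python 2 syntax.

```python
import csv


def iter_targets(path):
    """Yield one (name, X, Y, Z, BG) tuple per CSV row.

    Hypothetical helper: rows are produced lazily, so only the current
    row is held in memory instead of the whole file.
    """
    # newline='' is what the csv module documentation recommends
    # for files handed to csv.reader.
    with open(path, newline='') as f:
        for name, X, Y, Z, BG in csv.reader(f):
            yield name, float(X), float(Y), float(Z), float(BG)
```

It would be used in place of the list, for example `for name, X, Y, Z, BG in iter_targets('targets.csv'):`. Note that a generator can be iterated only once and has no len(), which is fine here because the loop in the question makes a single pass over the targets. Since csv.reader already reads the file row by row, streaming this way is not expected to be slower than building the list first.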