Streaming a CSV file in Django
Question (asked 2011-07-06 00:22:37): I am trying to stream a CSV file as an attachment download. The CSV files will be 4 MB or larger, and I need a way for users to start downloading the file without waiting for all of the data to be created and committed to memory first.
I first used my own file wrapper based on Django's FileWrapper class. That failed. Then I saw a method here for streaming a response with a generator:
How to stream an HttpResponse with Django
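When I raise an error inside the generator, I can see that the get_row_data() function is creating the correct data, but when I try to return the response, it comes back empty. I have also disabled Django's GZipMiddleware. Does anyone know what I am doing wrong?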
Edit: The problem I was having was with ConditionalGetMiddleware. I had to replace it; the code is in an answer below.
Here is the view:
    from django.db import models
    from django.http import HttpResponse
    from django.views.decorators.http import condition

    @condition(etag_func=None)
    def csv_view(request, app_label, model_name):
        """ Based on the filters in the query, return a csv file for the given model """
        #Get the model
        model = models.get_model(app_label, model_name)
        #if there are filters in the query
        if request.method == 'GET':
            #if the query is not empty
            if request.META['QUERY_STRING'] is not None:
                keyword_arg_dict = {}
                for key, value in request.GET.items():
                    #get the query filters
                    keyword_arg_dict[str(key)] = str(value)
                #generate a list of row objects, based on the filters
                objects_list = model.objects.filter(**keyword_arg_dict)
            else:
                #get all the model's objects
                objects_list = model.objects.all()
        else:
            #get all the model's objects
            objects_list = model.objects.all()
        #create the response object from the row generator
        response = HttpResponse(
            stream_response_generator(model, objects_list),
            mimetype='text/plain',
        )
        response['Content-Disposition'] = "attachment; filename=foo.csv"
        return response
Here is the generator I use to stream the response:
    import time

    def stream_response_generator(model, objects_list):
        """Streaming function to return data iteratively """
        for row_item in objects_list:
            yield get_row_data(model, row_item)
            time.sleep(1)
Here is how I create the CSV row data:
    def get_row_data(model, row):
        """Get a row of csv data from an object"""
        #Create a temporary csv handle
        csv_handle = cStringIO.StringIO()
        #create the csv output object
        csv_output = csv.writer(csv_handle)
        value_list = []
        for field in model._meta.fields:
            #if the field is a related field (ForeignKey, ManyToMany, OneToOne)
            if isinstance(field, RelatedField):
                #get the related model from the field object
                related_model = field.rel.to
                for key in row.__dict__.keys():
                    #find the field in the row that matches the related field
                    if key.startswith(field.name):
                        #Get the unicode version of the row in the related model, based on the id
                        try:
                            entry = related_model.objects.get(
                                id__exact=int(row.__dict__[key]),
                            )
                        except:
                            pass
                        else:
                            value = entry.__unicode__().encode("utf-8")
                            break
            #if it isn't a related field
            else:
                #get the value of the field
                if isinstance(row.__dict__[field.name], basestring):
                    value = row.__dict__[field.name].encode("utf-8")
                else:
                    value = row.__dict__[field.name]
            value_list.append(value)
        #add the row of csv values to the csv file
        csv_output.writerow(value_list)
        #Return the string value of the csv output
        return csv_handle.getvalue()
Answer 1: Here is some simple code that streams a CSV; you can probably go from here to whatever you need to do:
    import cStringIO as StringIO
    import csv

    from django.http import HttpResponse

    # Named csv_view so that the view does not shadow the csv module
    def csv_view(request):
        def data():
            for i in xrange(10):
                # Write one row to a fresh in-memory file and yield its text
                csvfile = StringIO.StringIO()
                csvwriter = csv.writer(csvfile)
                csvwriter.writerow([i,"a","b","c"])
                yield csvfile.getvalue()
        response = HttpResponse(data(), mimetype="text/csv")
        response["Content-Disposition"] = "attachment; filename=test.csv"
        return response
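This simply writes each row to an in-memory file, reads the row back, and yields it.
This version is more efficient for generating bulk data, but be sure you understand the above before using it: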
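The same per-row idea can be sketched in Python 3 with only the standard library (the code in this thread targets Python 2, where `cStringIO` and `xrange` exist). The function name `iter_csv_rows` and the sample rows are assumptions made for illustration; the generator is what a view would hand to the response object.

```python
import csv
import io


def iter_csv_rows(rows):
    """Yield one CSV-encoded line per input row (hypothetical helper)."""
    for row in rows:
        buffer = io.StringIO()            # fresh in-memory file for this row
        csv.writer(buffer).writerow(row)  # encode the row into the buffer
        yield buffer.getvalue()           # hand the encoded line to the caller


# Sample data mirroring the rows used in the answer
lines = list(iter_csv_rows([[i, "a", "b", "c"] for i in range(3)]))
```

Each yielded string ends with the csv module's default line terminator, so the pieces can be sent to the client one after another without further processing.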
    import cStringIO as StringIO
    import csv

    from django.http import HttpResponse

    # Named csv_view so that the view does not shadow the csv module
    def csv_view(request):
        # One buffer and one writer are reused for every row
        csvfile = StringIO.StringIO()
        csvwriter = csv.writer(csvfile)
        def read_and_flush():
            # Return everything written so far, then empty the buffer
            csvfile.seek(0)
            data = csvfile.read()
            csvfile.seek(0)
            csvfile.truncate()
            return data
        def data():
            for i in xrange(10):
                csvwriter.writerow([i,"a","b","c"])
                chunk = read_and_flush()
                yield chunk
        response = HttpResponse(data(), mimetype="text/csv")
        response["Content-Disposition"] = "attachment; filename=test.csv"
        return response
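A possible extension of the buffer-reuse idea, sketched in Python 3 with the standard library only: several rows are written before the buffer is flushed, so fewer and larger chunks are yielded. The name `iter_csv_chunks` and the `batch_size` parameter are assumptions for illustration and do not appear in the answer.

```python
import csv
import io


def iter_csv_chunks(rows, batch_size=2):
    """Yield CSV text in chunks of up to `batch_size` rows, reusing one buffer."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    pending = 0
    for row in rows:
        writer.writerow(row)
        pending += 1
        if pending >= batch_size:
            yield buffer.getvalue()
            buffer.seek(0)       # rewind, then drop the text just yielded
            buffer.truncate()
            pending = 0
    if pending:
        yield buffer.getvalue()  # flush the final partial batch


chunks = list(iter_csv_chunks([[i, "a"] for i in range(5)], batch_size=2))
```

A larger batch size trades a little latency before the first bytes arrive for fewer yields; the right value depends on row size and is left open here.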
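Comments:
I haven't needed to stream data yet, but it's good to know how quickly you can get something simple and elegant.
While I really like this answer, it turns out this isn't my problem. I used the exact code you wrote, just to see whether it would produce a response, but the response comes back as 0 bytes. So I'm still stuck with the same result.
This code works fine, so there is something wrong in your environment that you need to troubleshoot.
It looks like disabling ConditionalGetMiddleware actually allows the response to be sent back. I would really prefer to keep that middleware enabled, though. Is there a way to use a generator and keep that middleware enabled?
An update to this solution is to use the new StreamingHttpResponse in Django 1.5. :)
Answer 2: As of Django 1.5, the middleware issue has been fixed and StreamingHttpResponse has been introduced. The following should do: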
    import cStringIO as StringIO
    import csv

    from django.http import StreamingHttpResponse

    def csv_view(request):
        ...
        # Assume `rows` is an iterator of lists
        def stream():
            buffer_ = StringIO.StringIO()
            writer = csv.writer(buffer_)
            for row in rows:
                writer.writerow(row)
                # Read back what was just written, then empty the buffer
                buffer_.seek(0)
                data = buffer_.read()
                buffer_.seek(0)
                buffer_.truncate()
                yield data
        response = StreamingHttpResponse(
            stream(), content_type='text/csv'
        )
        disposition = "attachment; filename=file.csv"
        response['Content-Disposition'] = disposition
        return response
There is some documentation on how to output csv from Django, but it doesn't take advantage of StreamingHttpResponse, so I went ahead and opened a ticket in order to track it.
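A different technique from the one in this answer avoids the seek/read/truncate cycle altogether: a pseudo-buffer whose `write()` simply returns its argument, so that `writerow()` in Python 3 returns the encoded line directly. The sketch below uses only the standard library; the names `Echo` and `iter_csv` and the sample rows are illustrative assumptions.

```python
import csv


class Echo:
    """Pseudo-buffer: write() returns the value instead of storing it."""

    def write(self, value):
        return value


def iter_csv(rows):
    """Yield each row as CSV text without keeping any buffer around."""
    writer = csv.writer(Echo())
    for row in rows:
        # In Python 3, writerow() returns whatever the file's write() returned
        yield writer.writerow(row)


body = "".join(iter_csv([["id", "name"], [1, "x,y"]]))
```

In a view on Django 1.5 or later, the generator would presumably be passed as `StreamingHttpResponse(iter_csv(rows), content_type='text/csv')`, with the Content-Disposition header set as in the answer.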
Answer 3: The problem I was having was with ConditionalGetMiddleware. I saw that django-piston provides a replacement middleware for ConditionalGetMiddleware that allows streaming:
    from django.middleware.http import ConditionalGetMiddleware

    def compat_middleware_factory(klass):
        """
        Class wrapper that only executes `process_response`
        if `streaming` is not set on the `HttpResponse` object.
        Django has a bad habit of looking at the content,
        which will prematurely exhaust the data source if we're
        using generators or buffers.
        """
        class compatwrapper(klass):
            def process_response(self, req, resp):
                if not hasattr(resp, 'streaming'):
                    return klass.process_response(self, req, resp)
                return resp
        return compatwrapper

    ConditionalMiddlewareCompatProxy = compat_middleware_factory(ConditionalGetMiddleware)
You would then replace ConditionalGetMiddleware with your ConditionalMiddlewareCompatProxy middleware, and in your view (borrowing code from a clever answer to this question):
    import cStringIO as StringIO
    import csv

    from django.http import HttpResponse

    def csv_view(request):
        def data():
            for i in xrange(10):
                csvfile = StringIO.StringIO()
                csvwriter = csv.writer(csvfile)
                csvwriter.writerow([i,"a","b","c"])
                yield csvfile.getvalue()
        #create the response object with a csv mimetype
        response = HttpResponse(
            data(),
            mimetype='text/csv',
        )
        #Set the response as an attachment with a filename
        response['Content-Disposition'] = "attachment; filename=test.csv"
        #Flag the response so the compat middleware leaves it alone
        response.streaming = True
        return response
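Why a middleware that looks at the content produces an empty download can be illustrated without Django. In this sketch, `FakeResponse`, `length_inspecting_middleware`, and `streaming_aware` are hypothetical stand-ins, not Django or django-piston APIs; they only mimic the behavior the docstring above describes, under the assumption that the middleware reads the whole body.

```python
def data():
    """Generator of CSV lines, like the one in the view above."""
    for i in range(3):
        yield "%d,a,b,c\r\n" % i


class FakeResponse:
    """Hypothetical stand-in for a response object wrapping an iterator."""

    def __init__(self, iterator, streaming=False):
        self.iterator = iterator
        self.streaming = streaming


def length_inspecting_middleware(resp):
    """Hypothetical middleware that reads the whole body to measure it."""
    resp.content_length = len("".join(resp.iterator))  # consumes the generator
    return resp


def streaming_aware(middleware):
    """Skip `middleware` for responses flagged as streaming."""
    def wrapper(resp):
        if getattr(resp, "streaming", False):
            return resp
        return middleware(resp)
    return wrapper


# Without the guard: the middleware drains the generator, nothing is left to send
broken = length_inspecting_middleware(FakeResponse(data()))
sent_broken = "".join(broken.iterator)

# With the guard: the flagged response passes through untouched
guarded = streaming_aware(length_inspecting_middleware)
ok = guarded(FakeResponse(data(), streaming=True))
sent_ok = "".join(ok.iterator)
```

A generator can be iterated only once, which matches the 0-byte response reported in the comments: by the time the server iterates the body, it has already been consumed.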