Django prefetch_related a large dataset
Posted: 2017-05-12 18:28:52
Question: I am running into a problem with Django's prefetch_related. As an example, imagine these models:
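from django.db import models

class Client(models.Model):
    name = models.CharField(max_length=255)

class Purchase(models.Model):
    # on_delete is required from Django 2.0 onwards; CASCADE was the implicit default before.
    client = models.ForeignKey('Client', on_delete=models.CASCADE)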
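Suppose we have only a few clients, say 200, but they buy a lot, so we have millions of purchases.
If I have to build a web page that shows every client along with the number of purchases for each client, I would have to write something like this: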
from django.db.models import Prefetch
from .models import Purchase, Client

purchases = Purchase.objects.all()
# prefetch_related is a manager/queryset method, so it is reached through Client.objects
clients = Client.objects.prefetch_related(Prefetch('purchase_set', queryset=purchases))
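The problem here is that I end up querying the entire purchases table, and that query can take more than a minute or, worse, raise a MemoryError on the server.
So I tried to select only a batch of that table: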
purchases = Purchase.objects.all()[:9]
But, as one might expect, Django does not like that and raises this exception:
Traceback (most recent call last):
  File "project/venv/lib/python3.6/site-packages/django/core/handlers/base.py", line 149, in get_response
    response = self.process_exception_by_middleware(e, request)
  File "project/venv/lib/python3.6/site-packages/django/core/handlers/base.py", line 147, in get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "project/venv/lib/python3.6/site-packages/django/views/generic/base.py", line 68, in view
    return self.dispatch(request, *args, **kwargs)
  File "project/venv/lib/python3.6/site-packages/django/utils/decorators.py", line 67, in _wrapper
    return bound_func(*args, **kwargs)
  File "project/venv/lib/python3.6/site-packages/django/views/decorators/cache.py", line 57, in _wrapped_view_func
    response = view_func(request, *args, **kwargs)
  File "project/venv/lib/python3.6/site-packages/django/utils/decorators.py", line 63, in bound_func
    return func.__get__(self, type(self))(*args2, **kwargs2)
****************** login decorators, views, ...
  File "project/***.py", line ***, in ***
    for client in clients:
  File "project/venv/lib/python3.6/site-packages/django/db/models/query.py", line 258, in __iter__
    self._fetch_all()
  File "project/venv/lib/python3.6/site-packages/django/db/models/query.py", line 1076, in _fetch_all
    self._prefetch_related_objects()
  File "project/venv/lib/python3.6/site-packages/django/db/models/query.py", line 656, in _prefetch_related_objects
    prefetch_related_objects(self._result_cache, self._prefetch_related_lookups)
  File "project/venv/lib/python3.6/site-packages/django/db/models/query.py", line 1457, in prefetch_related_objects
    obj_list, additional_lookups = prefetch_one_level(obj_list, prefetcher, lookup, level)
  File "project/venv/lib/python3.6/site-packages/django/db/models/query.py", line 1556, in prefetch_one_level
    prefetcher.get_prefetch_queryset(instances, lookup.get_current_queryset(level)))
  File "project/venv/lib/python3.6/site-packages/django/db/models/fields/related_descriptors.py", line 539, in get_prefetch_queryset
    queryset = queryset.filter(**query)
  File "project/venv/lib/python3.6/site-packages/django/db/models/query.py", line 790, in filter
    return self._filter_or_exclude(False, *args, **kwargs)
  File "project/venv/lib/python3.6/site-packages/django/db/models/query.py", line 802, in _filter_or_exclude
    "Cannot filter a query once a slice has been taken."
AssertionError: Cannot filter a query once a slice has been taken.
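The last frames show where this goes wrong: during the prefetch, get_prefetch_queryset calls filter() on the queryset handed to Prefetch, and a queryset that has already been sliced refuses any further filtering. The following is only a toy model of that interaction in plain Python. ToyQuerySet and toy_prefetch are names invented for this illustration; they are not Django's code and skip almost everything a real QuerySet does.

```python
class ToyQuerySet:
    """Toy stand-in for a QuerySet; NOT Django's implementation."""

    def __init__(self, rows, sliced=False):
        self.rows = list(rows)
        self.sliced = sliced

    def __getitem__(self, key):
        # Only slicing is modelled; the result remembers that it was sliced.
        if not isinstance(key, slice):
            raise TypeError("this toy only supports slices")
        return ToyQuerySet(self.rows[key], sliced=True)

    def filter(self, client_id__in):
        # Same refusal as the assertion at the bottom of the traceback.
        if self.sliced:
            raise AssertionError(
                "Cannot filter a query once a slice has been taken."
            )
        return ToyQuerySet(r for r in self.rows if r["client_id"] in client_id__in)


def toy_prefetch(client_ids, purchases):
    # Hypothetical helper: narrows the related rows to the parents that
    # were just loaded, which is the step that needs filter().
    return purchases.filter(client_id__in=set(client_ids))


purchases = ToyQuerySet({"id": i, "client_id": i % 3} for i in range(12))

ok = toy_prefetch([0, 1], purchases)          # unsliced: filtering works

try:
    toy_prefetch([0, 1], purchases[:9])       # sliced first: filtering refused
    error = None
except AssertionError as exc:
    error = str(exc)
```

In this model the slice is taken before the per-client restriction can be applied, which is why passing Purchase.objects.all()[:9] to Prefetch cannot work as a way of batching.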
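So for now I have no real solution. I am looking at how the __iter__ method at django/db/models/query.py:258 is built, to try to write a function with the same behaviour that takes a limited set for the prefetch, so that I can paginate it and do the work in a more parallel way.
Is there a "good way" to do this kind of query?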
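Answer 1:
"Suppose we have only a few clients, say 200, but they buy a lot, so we have millions of purchases.
If I have to build a web page that shows every client and the number of purchases for each client, ..."
I read your question as asking for exactly that: the number of purchases per client. Have you tried: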
from django.db.models import Count
clients = Client.objects.annotate(num_purchases=Count('purchase'))
clients[0].num_purchases
If you want to sort and get the clients with the most purchases, you can do that as well:
clients = Client.objects.annotate(num_purchases=Count('purchase')).order_by('-num_purchases')[:5]
See https://docs.djangoproject.com/en/1.11/topics/db/aggregation/ for more.
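The reason the annotation scales is that the counting happens inside the database, so no Purchase rows are ever loaded into Python. As a rough sketch, the query behind annotate(num_purchases=Count('purchase')) has the shape of a LEFT OUTER JOIN with a GROUP BY. The example below runs that shape against an in-memory SQLite database using only the standard library. The table names client and purchase and the sample rows are assumptions made for the illustration; Django's real table names and generated SQL will differ in detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE client (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE purchase (
        id INTEGER PRIMARY KEY,
        client_id INTEGER REFERENCES client(id)
    );
""")

# Made-up sample data: alice has 5 purchases, bob 2, carol none.
conn.executemany(
    "INSERT INTO client (id, name) VALUES (?, ?)",
    [(1, "alice"), (2, "bob"), (3, "carol")],
)
conn.executemany(
    "INSERT INTO purchase (client_id) VALUES (?)",
    [(1,)] * 5 + [(2,)] * 2,
)

# One row per client comes back, however many purchases exist.
rows = conn.execute("""
    SELECT client.id, client.name, COUNT(purchase.id) AS num_purchases
    FROM client
    LEFT OUTER JOIN purchase ON purchase.client_id = client.id
    GROUP BY client.id, client.name
    ORDER BY num_purchases DESC
""").fetchall()

for client_id, name, num_purchases in rows:
    print(client_id, name, num_purchases)
```

The LEFT OUTER JOIN keeps clients that have no purchases, with a count of 0, and the result set has one row per client, so with about 200 clients the page only ever handles about 200 rows.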