How to optimize a DB call existing under a FOR loop (platform: Django, Python, Postgres) [duplicate]

Posted 2023-04-15

【Title】How to optimize a DB call existing under a FOR loop (platform: Django, Python, Postgres) [duplicate] 【Posted】2015-10-24 00:59:52 【Question】:

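I'm optimizing a Django website I maintain to improve its performance. Among other things, I made the mistake of putting a non-trivial DB call inside a FOR loop. The better practice is to make a single DB call and then loop over the data as much as needed. How can I achieve that in the code below? I need a small hint here!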

link_ids = [link.id for link in context["object_list"]]
seen_replies = Publicreply.objects.filter(answer_to_id__in=link_ids, publicreply_seen_related__seen_user=user)
for link in context["object_list"]:
    try:
        # One query per link: this is the line the profiler flags
        latest_reply = link.publicreply_set.latest('submitted_on')
        if latest_reply in seen_replies:
            pass  # do something
    except Publicreply.DoesNotExist:
        pass

Essentially, profiling tells me that the line latest_reply = link.publicreply_set.latest('submitted_on') adds a lot of overhead, because it runs one database query per iteration of the FOR loop.

I can't seem to find a nice, clean way to move the call outside the loop and then work with its results inside it. Does anyone have any ideas?

Note: link.publicreply_set.latest('submitted_on') may raise DoesNotExist. I use Postgres in production but SQLite locally.

The models are:

class Link(models.Model):
    description = models.TextField(validators=[MaxLengthValidator(500)])
    submitter = models.ForeignKey(User)
    submitted_on = models.DateTimeField(auto_now_add=True)

class Publicreply(models.Model):
    submitted_by = models.ForeignKey(User)
    answer_to = models.ForeignKey(Link)
    submitted_on = models.DateTimeField(auto_now_add=True)
    description = models.TextField(validators=[MaxLengthValidator(250)])

class Seen(models.Model):
    seen_status = models.BooleanField(default=False)
    seen_user = models.ForeignKey(User)
    seen_at = models.DateTimeField(auto_now_add=True)
    which_reply = models.ForeignKey(Publicreply, related_name="publicreply_seen_related")

【Comments】:

@e4c5: Added the models. What do you think?

【Answer 1】:

You can take advantage of the PostgreSQL window function first_value. Here is one way to improve performance:

link_ids = [link.id for link in context["object_list"]]
seen_replies = (   Publicreply
                   .objects
                   .filter(answer_to_id__in=link_ids,
                           publicreply_seen_related__seen_user = user)
                   )

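# Note: the window runs over seen_replies only, so this maps each link id
# to the id of its latest *seen* reply.
# If the join to Seen makes `id` ambiguous, qualify it with the Publicreply table name.
latest_replies = dict ( seen_replies
                       .extra( 
                         select={
                            "id_latest_reply": 
                            """first_value(id) over 
                                  (partition by answer_to_id 
                                   order by submitted_on desc)
                            """  })
                        .order_by()
                        .values_list( 'answer_to_id', 'id_latest_reply' )
                        .distinct()
                       )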

for link in context["object_list"]:
    try:
        if link.id in latest_replies:
            latest_reply = (Publicreply
                            .objects
                            .get(id=latest_replies[link.id])
                            )
            # do something
    except Publicreply.DoesNotExist:
        pass
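
A database-agnostic variation on the same idea, as a rough sketch only: fetch every reply for the listed links in one query, then pick the latest per link in plain Python. The helper name latest_reply_per_link, the sample rows, and seen_reply_ids are all hypothetical; in the real view the rows would presumably come from something like Publicreply.objects.filter(answer_to_id__in=link_ids).values_list('id', 'answer_to_id', 'submitted_on').

```python
from datetime import datetime


def latest_reply_per_link(rows):
    """Map each link id to the id of its most recent reply.

    rows: iterable of (reply_id, answer_to_id, submitted_on) tuples,
    e.g. the result of a single values_list() query (assumption).
    """
    latest = {}
    for reply_id, link_id, submitted_on in rows:
        current = latest.get(link_id)
        # Keep the reply with the greatest submitted_on seen so far
        if current is None or submitted_on > current[1]:
            latest[link_id] = (reply_id, submitted_on)
    return {link_id: reply_id for link_id, (reply_id, _) in latest.items()}


# Hypothetical data standing in for the rows of one database query
rows = [
    (1, 10, datetime(2015, 10, 1)),
    (2, 10, datetime(2015, 10, 3)),
    (3, 11, datetime(2015, 10, 2)),
]
# Hypothetical: ids of replies the user has seen (a second single query)
seen_reply_ids = {2}

latest = latest_reply_per_link(rows)
# Links whose latest reply has already been seen by the user
links_with_seen_latest = [
    link_id for link_id, reply_id in latest.items() if reply_id in seen_reply_ids
]
```

Links without any reply are simply absent from the resulting dict, so no DoesNotExist handling is needed inside the loop. The trade-off is that all replies for the listed links are pulled into memory, which may or may not be acceptable depending on how many replies a link typically has.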

【Discussion】:

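Just to clarify: this isn't database-agnostic, right? So, for example, since I have SQLite installed locally, I won't be able to test it in my local environment?

Of course. You tagged the question postgresql and did not mention a database-agnostic approach. Also, most importantly, you tagged the question query-optimization; be aware of what that tag means. Post a new question with your new requirements.

My production environment is indeed Postgres, but I at least need to be able to test the solution locally before deploying it, don't you think? I guess it's time to install Postgres in my local environment.

Your Q is about SQL query optimization, and my A is about that. I have no more comments, sorry. If your Q is wrong, post a new Q with the correct tags and requirements. Happy weekend ;)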
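
For readers who hit the same SQLite-versus-Postgres concern, here is a minimal sketch of a "latest reply per link" query that avoids window functions by joining against a GROUP BY / MAX subquery, which both databases accept. It uses Python's built-in sqlite3 module with an in-memory database. The table name publicreply and the sample rows are assumptions for illustration; the real table name depends on the Django app label.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table mirroring the relevant Publicreply columns
conn.execute(
    "CREATE TABLE publicreply ("
    "id INTEGER PRIMARY KEY, answer_to_id INTEGER, submitted_on TEXT)"
)
conn.executemany(
    "INSERT INTO publicreply VALUES (?, ?, ?)",
    [
        (1, 10, "2015-10-01 09:00:00"),
        (2, 10, "2015-10-03 09:00:00"),
        (3, 11, "2015-10-02 09:00:00"),
    ],
)

link_ids = [10, 11, 12]  # link 12 has no replies
placeholders = ", ".join("?" for _ in link_ids)

# For each link, find the newest submitted_on, then join back to get the reply id
query = """
    SELECT r.answer_to_id, r.id
    FROM publicreply r
    JOIN (
        SELECT answer_to_id, MAX(submitted_on) AS max_on
        FROM publicreply
        WHERE answer_to_id IN ({placeholders})
        GROUP BY answer_to_id
    ) m ON m.answer_to_id = r.answer_to_id AND m.max_on = r.submitted_on
""".format(placeholders=placeholders)

latest_by_link = dict(conn.execute(query, link_ids).fetchall())
```

If two replies to the same link share an identical submitted_on, the join returns both rows and the dict keeps whichever comes last, so a tie-breaker on id would be needed where that matters.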
