How to aggregate documents using $gt and aggregate in PyMongo when the data is very nested

Posted 2023-03-16

【Title】: How to aggregate documents using $gt and aggregate in PyMongo when the data is very nested 【Posted】: 2021-04-11 02:03:48 【Question】:

My data is in the collection collec, and it is very deeply nested. I want to fetch every document in my db that has a confidence > 0.33.

I wrote a query like this:

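# Maps each document's id1 to its detection confidence
returned_data = {}
for q in collec.find({"question": {"$in": all_question_ids}}):
    # The threshold is checked client-side, after each document has been fetched
    if q['response']['detection']['confidence'] >= 0.3:
        returned_data[q['id1']] = q['response']['detection']['confidence']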

This takes far too long; I think it is looking up every entry individually.

How can I use aggregate and $gt to get the result?

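【Answer 1】:

You can use an aggregation query to extract the relevant data through a pipeline, i.e. a sequence of operations. First you match the documents whose confidence is greater than 0.3; then you project the matched documents so that only the required fields are returned.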

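pipeline = [
    # Stage 1: keep only documents whose nested confidence exceeds 0.3
    {'$match': {'response.detection.confidence': {'$gt': 0.3}}},
    # Stage 2: return only the id and the confidence value
    {'$project': {'question_id': '$id1',
                  'confidence': '$response.detection.confidence'}},
]

# useCursor=True is omitted: it was deprecated in PyMongo 3.6 and removed in 4.0
cursor = collec.aggregate(pipeline, batchSize=50000)
result = list(cursor)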
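
To get the same dictionary that the loop in the question builds, the question-ID filter and the confidence threshold can both go into a single $match stage. The following is only a sketch under assumptions: the helper names build_pipeline and confidence_by_id are made up for illustration, and the field names question, id1 and response.detection.confidence are taken from the question without knowing the real schema.

```python
from typing import Any, Dict, Iterable, List


def build_pipeline(question_ids: Iterable[Any], threshold: float = 0.3) -> List[Dict[str, Any]]:
    """Hypothetical helper: filter server-side, then keep only two fields."""
    return [
        # Several keys in one $match document are combined with an implicit AND.
        {"$match": {
            "question": {"$in": list(question_ids)},
            "response.detection.confidence": {"$gt": threshold},
        }},
        # Drop _id, keep id1, and lift the nested confidence to the top level.
        {"$project": {
            "_id": 0,
            "id1": 1,
            "confidence": "$response.detection.confidence",
        }},
    ]


def confidence_by_id(collection: Any, question_ids: Iterable[Any],
                     threshold: float = 0.3) -> Dict[Any, float]:
    """Hypothetical helper: run the pipeline and map id1 -> confidence.

    `collection` is assumed to expose aggregate(pipeline), as a PyMongo
    Collection does.
    """
    cursor = collection.aggregate(build_pipeline(question_ids, threshold))
    return {doc["id1"]: doc["confidence"] for doc in cursor}
```

It would be called as returned_data = confidence_by_id(collec, all_question_ids). The loop in the question compares with >=, so use $gte instead of $gt if documents exactly at the threshold should be included.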

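【Comments】:

What is $id1 here? Also, you have an indentation error, which is not a big deal, but even though I use .limit(10), my computer has been processing this query for the past 2 hours.

all_question_ids is the list of all the question IDs I want to look up: [ObjectId('28_char_alphanum_id')],......]

Based on your query, I assumed $id1 was the key for question_id. Is that wrong? Re: all_question_ids, you can use $and to match multiple conditions. For example, replace the $match statement in my code with {'$match': {'$and': [{'response.detection.confidence': {'$gt': 0.3}}, {"question": {"$in": all_question_ids}}]}}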
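
The $and suggestion from the last comment can be written out as a full pipeline. The sketch below also adds an optional $limit stage, since, as far as I know, the cursor returned by aggregate does not offer the .limit() method that find() cursors have. The function name build_and_pipeline and its limit parameter are assumptions made for illustration; the field names come from the thread.

```python
from typing import Any, Dict, List, Optional, Sequence


def build_and_pipeline(question_ids: Sequence[Any], threshold: float = 0.3,
                       limit: Optional[int] = None) -> List[Dict[str, Any]]:
    """Hypothetical helper: explicit $and match, optional $limit, then $project."""
    pipeline: List[Dict[str, Any]] = [
        # Both conditions must hold for a document to pass the $match stage.
        {"$match": {"$and": [
            {"response.detection.confidence": {"$gt": threshold}},
            {"question": {"$in": list(question_ids)}},
        ]}},
    ]
    if limit is not None:
        # Cap the number of documents inside the pipeline itself.
        pipeline.append({"$limit": limit})
    # Return only the id and the nested confidence value.
    pipeline.append({"$project": {
        "question_id": "$id1",
        "confidence": "$response.detection.confidence",
    }})
    return pipeline
```

For a quick check on a small sample, something like list(collec.aggregate(build_and_pipeline(all_question_ids, limit=10))) should do. Placing $limit right after $match keeps the projection from being applied to more documents than needed.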