How to search a collection and return a list of sub-documents with Mongo (Spring-data-mongo)

Posted 2023-02-16

Asked: 2022-01-14 17:11:00

Question:

Given this collection of documents (workflows):

[
 {
  id: 1,
  name: 'workflow',
  status: 'started',
  createdDate: '2021-02-10',
  tasks: [
   {taskId: 'task1', value:'new'},
   {taskId: 'task2', value:'started'},
   {taskId: 'task3', value:'completed'}
  ]
 },
 {
  id: 2,
  name: 'workflow',
  status: 'started',
  createdDate: '2021-02-10',
  tasks: [
   {taskId: 'task1', value:'new'},
   {taskId: 'task2', value:'started'},
   {taskId: 'task3', value:'completed'}
  ]
 },
 {
  id: 3,
  name: 'workflow',
  status: 'started',
  createdDate: '2021-02-10',
  tasks: [
   {taskId: 'task1', value:'new'},
   {taskId: 'task2', value:'started'},
   {taskId: 'task3', value:'completed'}
  ]
 }
]

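I already have a search function that uses Query and mongoTemplate.find() to return a list (page) of workflows matching a bunch of criteria.

What I need to do is turn that result into something like this (assuming the query returns all elements):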

[
 {
  id: 1,
  name: 'workflow',
  status: 'started',
  createdDate: '2021-02-10',
  tasks: [
   {taskId: 'task1', value:'new'}
  ]
 },
 {
  id: 1,
  name: 'workflow',
  status: 'started',
  createdDate: '2021-02-10',
  tasks: [
   {taskId: 'task2', value:'started'}
  ]
 },
 {
  id: 1,
  name: 'workflow',
  status: 'started',
  createdDate: '2021-02-10',
  tasks: [
   {taskId: 'task3', value:'completed'}
  ]
 },
 {
  id: 2,
  name: 'workflow',
  status: 'started',
  createdDate: '2021-02-10',
  tasks: [
   {taskId: 'task1', value:'new'}
  ]
 },
 {
  id: 2,
  name: 'workflow',
  status: 'started',
  createdDate: '2021-02-10',
  tasks: [
   {taskId: 'task2', value:'started'}
  ]
 },
.... etc
]

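In other words, I want to return a flattened version of the workflows, with only 1 task per workflow. Pageable, if possible!

Another version I could work with would return the list of tasks, with the parent workflow object aggregated into an added field, for example: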

[
 {taskId: 'task1', value:'new', workflow: {the workflow object}},
 {taskId: 'task2', value:'started', workflow: {the workflow object}},
]

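I have played around a bit with Aggregation, unwind and so on, but I am new to MongoDB and could not find an example that helped me.

Thanks in advance!

Update:

Based on the answers here and from others, I came up with this query, which works and does exactly what I want: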

db.Workflow.aggregate([
  {
    $match: {}
  },
  {
    $unwind: "$tasks"
  },
  {
    $facet: {
      data: [
        {
          $skip: 0
        },
        {
          $limit: 30
        },
      ],
      count: [
        {
          $group: {
            _id: null,
            count: {
              $sum: 1
            }
          }
        },
      ],
    }
  }
])

So, if anyone could help me translate this into a Spring Data aggregation request... I am having a hard time with the group part. Thanks.


Answer 1:

MongoDB aggregation is what you need:

db.Workflow.aggregate([
  {
    $match: {} // put here your search criteria
  },
  {
    $unwind: "$tasks"
  },
  {
    $addFields: {
      tasks: [
        "$tasks"
      ]
    }
  },
  //pageable
  {
    $skip: 0
  },
  {
    $limit: 100
  }
])

MongoPlayground

The Spring Boot way:

@Autowired
private MongoTemplate mongoTemplate;

...

List<AggregationOperation> pipeline = new ArrayList<>();

//$match (put here your filter)
pipeline.add(Aggregation.match(Criteria.where("status").is("started")));

//$unwind
pipeline.add(Aggregation.unwind("tasks"));

//$addFields
pipeline.add(Aggregation.addFields().addFieldWithValue("tasks", Arrays.asList("$tasks")).build());

//$skip
pipeline.add(Aggregation.skip(0L));
    
//$limit
pipeline.add(Aggregation.limit(100L));

Aggregation agg = Aggregation.newAggregation(pipeline)
    .withOptions(Aggregation
        .newAggregationOptions().allowDiskUse(Boolean.TRUE).build());

return mongoTemplate.aggregate(agg, Workflow.class, Workflow.class).getMappedResults();

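Comments:

Very nice. MongoPlayground is now bookmarked!! ;-) Very helpful. I will add a sub-question to my question, in case you know how to translate my query into Spring Data! Thanks in advance.

Answer 2: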

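So I will try to answer with sample code. I am using Spring templates rather than Spring repositories. Although repositories can perform aggregations, they are too basic for most enterprise applications, where templates give more control. In my opinion I would use only templates and never repositories, but that is just my view.

Keep in mind that Spring Data wants to map POJOs to the data in a MongoDB collection. Responses to plain queries are easy because the two are in sync: the POJO matches the structure expected in the database. When performing an aggregation, the results are often reshaped for one reason or another.

In your use case it looks like you want to unwind the tasks field so that each parent object carries only one task. This means the parent fields are repeated, as in the expected output in your original post. After an unwind the array no longer exists; a single sub-document sits in its place. The output therefore has a slightly different shape, which for Spring means a different class (inheritance can help here). For this reason my sample code has two POJOs: Workflow, which represents the shape of the originally saved document, including the array in the tasks field, and Workflow2, which represents the reshaped aggregation result. The only difference is the tasks field: one has a List<Task>, while the other has a single Task sub-object.

So, in fact, I have 3 POJOs:

Workflow, Workflow2, Task

Task is a class that defines the sub-document in the tasks field. Whether or not it sits in an array, it still needs a class to hold the two sub-document fields taskId and value.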
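To make the reshaping concrete, here is a minimal plain-Java sketch of what an unwind does to the data. It does not touch MongoDB or Spring Data at all; the class UnwindSketch, its nested types and the sample() helper are hypothetical names used only for illustration, and the nested classes are trimmed to the id and tasks fields.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: an in-memory imitation of $unwind, not Spring Data code.
final class UnwindSketch {

    // Sub-document with the two fields from the question.
    static final class Task {
        final String taskId;
        final String value;

        Task(String taskId, String value) {
            this.taskId = taskId;
            this.value = value;
        }
    }

    // Stored shape: one workflow holding a list of tasks.
    static final class Workflow {
        final int id;
        final List<Task> tasks;

        Workflow(int id, List<Task> tasks) {
            this.id = id;
            this.tasks = tasks;
        }
    }

    // Unwound shape: the parent field repeated, with a single task in place of the list.
    static final class FlatWorkflow {
        final int id;
        final Task tasks;

        FlatWorkflow(int id, Task tasks) {
            this.id = id;
            this.tasks = tasks;
        }
    }

    // Emits one FlatWorkflow per element of each workflow's tasks list.
    // A workflow with an empty list contributes no rows.
    static List<FlatWorkflow> unwind(List<Workflow> workflows) {
        List<FlatWorkflow> out = new ArrayList<>();
        for (Workflow w : workflows) {
            for (Task t : w.tasks) {
                out.add(new FlatWorkflow(w.id, t));
            }
        }
        return out;
    }

    // Three workflows with three tasks each, mirroring the question's data.
    static List<Workflow> sample() {
        List<Workflow> workflows = new ArrayList<>();
        for (int id = 1; id <= 3; id++) {
            workflows.add(new Workflow(id, Arrays.asList(
                    new Task("task1", "new"),
                    new Task("task2", "started"),
                    new Task("task3", "completed"))));
        }
        return workflows;
    }

    public static void main(String[] args) {
        List<FlatWorkflow> flat = unwind(sample());
        System.out.println(flat.size()); // prints 9 (3 workflows x 3 tasks)
    }
}
```

The two result shapes here play the same roles as Workflow and Workflow2: the only field whose type changes is tasks.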

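I am using Maven for dependency management. For extra clarity, I fully qualify every type instead of using import statements.

So, without further ado, here is the code.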

File pom.xml

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>2.3.3.RELEASE</version>
        <relativePath/>
    </parent>
    <groupId>test.barry</groupId>
    <artifactId>test</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <name>test</name>
    <description>Demo project for Spring Boot</description>
    <properties>
        <java.version>1.8</java.version>
        <start-class>test.barry.Main</start-class>
        <mongodb.version>4.3.4</mongodb.version> <!-- BARRY NOTE: FORCE SPRING-BOOT TO USE THE MONGODB DRIVER VERSION 4.3.4 INSTEAD OF 4.0.5 -->
    </properties>
    <build>
        <plugins>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
            </plugin>
        </plugins>
    </build>
    <dependencies>
        <dependency>
            <groupId>org.mongodb</groupId>
            <artifactId>mongodb-driver-sync</artifactId>
            <version>4.3.4</version>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-data-mongodb</artifactId>
        </dependency>
    </dependencies>
</project>

File src/main/resources/application.properties

spring.data.mongodb.uri=mongodb://testuser:mysecret@localhost:50011,localhost:50012,localhost:50013/?replicaSet=replSet&w=majority&readConcernLevel=majority&readPreference=primary&authSource=admin&retryWrites=true&maxPoolSize=10&waitQueueTimeoutMS=1000
spring.data.mongodb.database=javaspringtestX
spring.data.mongodb.socketconnecttimeout=60

File src/main/java/test.barry/Main.java

package test.barry;

@org.springframework.boot.autoconfigure.SpringBootApplication
public class Main {
    public static void main(String[] args) {
        org.springframework.boot.SpringApplication.run(Main.class, args);
    }
}

File src/main/java/test.barry/MySpringBootApplication.java

package test.barry;

@org.springframework.boot.autoconfigure.SpringBootApplication
public class MySpringBootApplication implements org.springframework.boot.CommandLineRunner {

  @org.springframework.beans.factory.annotation.Autowired
  org.springframework.data.mongodb.core.MongoTemplate mongoTemplate;

  public static void main(String[] args) {
    // Pass the application class itself, not the @SpringBootApplication annotation type.
    org.springframework.boot.SpringApplication.run(MySpringBootApplication.class, args);
  }

  @Override
  public void run(String... args) throws Exception {

    System.out.println("Drop collections for automatic cleanup during test:");
    System.out.println("-------------------------------");
    this.mongoTemplate.dropCollection(test.barry.models.Workflow.class);

    // Calendar months are zero-based, so month 2 is March.
    java.util.Calendar calendar = java.util.Calendar.getInstance();
    calendar.set(2021, 2, 10);

    test.barry.models.Workflow workflow1 = new test.barry.models.Workflow();
    workflow1.id = 1;
    workflow1.name  = "workflow";
    workflow1.status = "started";
    workflow1.createdDate = calendar.getTime();
    workflow1.tasks.add(new test.barry.models.Task ("task1", "new"));
    workflow1.tasks.add(new test.barry.models.Task ("task2", "started"));
    workflow1.tasks.add(new test.barry.models.Task ("task3", "completed"));

    this.mongoTemplate.save(workflow1);

    test.barry.models.Workflow workflow2 = new test.barry.models.Workflow();
    workflow2.id = 2;
    workflow2.name  = "workflow";
    workflow2.status = "started";
    workflow2.createdDate = calendar.getTime();
    workflow2.tasks.add(new test.barry.models.Task ("task1", "new"));
    workflow2.tasks.add(new test.barry.models.Task ("task2", "started"));
    workflow2.tasks.add(new test.barry.models.Task ("task3", "completed"));

    this.mongoTemplate.save(workflow2);

    test.barry.models.Workflow workflow3 = new test.barry.models.Workflow();
    workflow3.id = 3;
    workflow3.name  = "workflow";
    workflow3.status = "started";
    workflow3.createdDate = calendar.getTime();
    workflow3.tasks.add(new test.barry.models.Task ("task1", "new"));
    workflow3.tasks.add(new test.barry.models.Task ("task2", "started"));
    workflow3.tasks.add(new test.barry.models.Task ("task3", "completed"));

    this.mongoTemplate.save(workflow3);

    // Unwind the tasks array: one output document per task.
    org.springframework.data.mongodb.core.aggregation.Aggregation pipeline = org.springframework.data.mongodb.core.aggregation.Aggregation.newAggregation (
            org.springframework.data.mongodb.core.aggregation.Aggregation.unwind("tasks")
    );

    // Read from the Workflow collection and map each unwound document to Workflow2.
    org.springframework.data.mongodb.core.aggregation.AggregationResults<test.barry.models.Workflow2> aggregationResults = this.mongoTemplate.aggregate(pipeline, test.barry.models.Workflow.class, test.barry.models.Workflow2.class);
    java.util.List<test.barry.models.Workflow2> listResults = aggregationResults.getMappedResults();
    System.out.println(listResults.size());
  }
}

File src/main/java/test.barry/SpringConfiguration.java

package test.barry;

@org.springframework.context.annotation.Configuration
@org.springframework.context.annotation.PropertySource("classpath:/application.properties")
public class SpringConfiguration {

    @org.springframework.beans.factory.annotation.Autowired
    org.springframework.core.env.Environment env;

    @org.springframework.context.annotation.Bean
    public com.mongodb.client.MongoClient mongoClient() {
        String uri = env.getProperty("spring.data.mongodb.uri");
        return com.mongodb.client.MongoClients.create(uri);
    }

    @org.springframework.context.annotation.Bean
    public org.springframework.data.mongodb.MongoDatabaseFactory mongoDatabaseFactory() {
        String uri = env.getProperty("spring.data.mongodb.uri");
        String database = env.getProperty("spring.data.mongodb.database");
        return new org.springframework.data.mongodb.core.SimpleMongoClientDatabaseFactory(com.mongodb.client.MongoClients.create(uri), database);
    }

    @org.springframework.context.annotation.Bean
    public org.springframework.data.mongodb.core.MongoTemplate mongoTemplate() throws Exception {
        return new org.springframework.data.mongodb.core.MongoTemplate(mongoClient(), env.getProperty("spring.data.mongodb.database"));
    }
}

File src/main/java/test.barry/models/Workflow.java

package test.barry.models;

@org.springframework.data.mongodb.core.mapping.Document(collection = "Workflow")
public class Workflow {

    @org.springframework.data.annotation.Id
    public int id;

    public String name;
    public String status;
    public java.util.Date createdDate;
    public java.util.List<Task> tasks;

    public Workflow() {
        this.tasks = new java.util.ArrayList<Task>();
    }

    public Workflow(String name, String status, java.util.Date createdDate) {
        this();
        this.name = name;
        this.status = status;
        this.createdDate = createdDate;
    }

    @Override
    public String toString() {
        return String.format("Workflow[id=%s, name='%s', status='%s', createdDate='%s']", id, name, status, createdDate);
    }
}

File src/main/java/test.barry/models/Workflow2.java

package test.barry.models;

@org.springframework.data.mongodb.core.mapping.Document(collection = "Workflow")
public class Workflow2 {

    @org.springframework.data.annotation.Id
    public int id;

    public String name;
    public String status;
    public java.util.Date createdDate;
    public Task tasks;

    public Workflow2() {
        this.tasks = new Task();
    }

    public Workflow2(String name, String status, java.util.Date createdDate) {
        this();
        this.name = name;
        this.status = status;
        this.createdDate = createdDate;
    }

    @Override
    public String toString() {
        return String.format("Workflow[id=%s, name='%s', status='%s', createdDate='%s']", id, name, status, createdDate);
    }
}

File src/main/java/test.barry/models/Task.java

package test.barry.models;

public class Task {

    public Task() {}

    public Task(String taskId, String value) {
        this.taskId = taskId;
        this.value = value;
    }

    public String taskId;
    public String value;
}

Conclusion

Using the Mongo shell, we see that the following records were created:

Enterprise replSet [primary] javaspringtestX> db.Workflow.find()
[
  {
    _id: 1,
    name: 'workflow',
    status: 'started',
    createdDate: ISODate("2021-03-10T23:49:46.704Z"),
    tasks: [
      { taskId: 'task1', value: 'new' },
      { taskId: 'task2', value: 'started' },
      { taskId: 'task3', value: 'completed' }
    ],
    _class: 'test.barry.models.Workflow'
  },
  {
    _id: 2,
    name: 'workflow',
    status: 'started',
    createdDate: ISODate("2021-03-10T23:49:46.704Z"),
    tasks: [
      { taskId: 'task1', value: 'new' },
      { taskId: 'task2', value: 'started' },
      { taskId: 'task3', value: 'completed' }
    ],
    _class: 'test.barry.models.Workflow'
  },
  {
    _id: 3,
    name: 'workflow',
    status: 'started',
    createdDate: ISODate("2021-03-10T23:49:46.704Z"),
    tasks: [
      { taskId: 'task1', value: 'new' },
      { taskId: 'task2', value: 'started' },
      { taskId: 'task3', value: 'completed' }
    ],
    _class: 'test.barry.models.Workflow'
  }
]

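To see the aggregation results we have to use a debugger. I am debugging with IntelliJ IDEA, and the results show up in a list of type Workflow2. I am not sure how to show them here. My testing suggests this works the way I understand it should. Please evaluate it and let me know if it needs adjusting...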

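By the way, the concept of pagination is best managed by your application rather than the database. In practice you may see skip() and limit() used, but for large data sets with many pages, requests for later pages can cause performance problems, because each time the server must identify all the documents and then determine which ones to skip. It is better to track the range shown on the previous page and then re-query only the records for the next page, i.e. restrict the result set for better performance.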
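The range-tracking idea can be sketched in plain Java, with a TreeSet standing in for an indexed, sorted key such as _id. This is only an in-memory analogy under that assumption, not MongoDB or Spring Data code; RangePagingSketch and nextPage are hypothetical names. Against a real collection, the corresponding query would filter on the sort key being greater than the last value seen and apply a limit, instead of using skip.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Illustrative only: range-based ("keyset") paging over an in-memory sorted set.
final class RangePagingSketch {

    // Returns up to pageSize ids that come strictly after lastSeenId.
    // Pass null as lastSeenId to fetch the first page.
    static List<Integer> nextPage(NavigableSet<Integer> index, Integer lastSeenId, int pageSize) {
        // tailSet jumps straight to the position after lastSeenId,
        // so earlier entries are never walked or counted.
        NavigableSet<Integer> range =
                (lastSeenId == null) ? index : index.tailSet(lastSeenId, false);
        List<Integer> page = new ArrayList<>();
        for (Integer id : range) {
            if (page.size() == pageSize) {
                break;
            }
            page.add(id);
        }
        return page;
    }

    public static void main(String[] args) {
        NavigableSet<Integer> index =
                new TreeSet<>(Arrays.asList(1, 2, 3, 4, 5, 6, 7, 8, 9, 10));

        List<Integer> first = nextPage(index, null, 3);
        System.out.println(first); // prints [1, 2, 3]

        // The caller remembers the last id shown and asks for what follows it.
        Integer last = first.get(first.size() - 1);
        System.out.println(nextPage(index, last, 3)); // prints [4, 5, 6]
    }
}
```

The trade-off is that the caller can only move to the page after a known key; jumping to an arbitrary page number still needs an offset.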

Edit - 2021-12-09: After reviewing the saved data, it showed odd dates. Apparently the deprecated java.util.Date myDate = new java.util.Date(2021, 2, 10); creates an invalid date. For that reason I added java.util.Calendar calendar = java.util.Calendar.getInstance();
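A small standard-library sketch of why the dates look odd, offered as a likely explanation rather than a confirmed diagnosis of the original run: the deprecated Date(int, int, int) constructor adds 1900 to the year argument, and both it and Calendar count months from zero, so a month argument of 2 means March. DateFieldsSketch and its helper methods are hypothetical names.

```java
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;

// Illustrative only: how java.util.Date and Calendar interpret (2021, 2, 10).
final class DateFieldsSketch {

    // Calendar year of a Date, read in the default time zone.
    static int yearOf(Date d) {
        Calendar c = new GregorianCalendar();
        c.setTime(d);
        return c.get(Calendar.YEAR);
    }

    // One-based month (1 = January) of a Date, read in the default time zone.
    static int monthOf(Date d) {
        Calendar c = new GregorianCalendar();
        c.setTime(d);
        return c.get(Calendar.MONTH) + 1;
    }

    // The deprecated constructor: the year is offset by 1900 and the month is zero-based.
    @SuppressWarnings("deprecation")
    static Date viaDeprecatedConstructor() {
        return new Date(2021, 2, 10);
    }

    // Calendar takes the real year, but the month is still zero-based.
    static Date viaCalendar(int zeroBasedMonth) {
        return new GregorianCalendar(2021, zeroBasedMonth, 10).getTime();
    }

    public static void main(String[] args) {
        System.out.println(yearOf(viaDeprecatedConstructor()));        // prints 3921
        System.out.println(monthOf(viaCalendar(2)));                   // prints 3 (March)
        System.out.println(monthOf(viaCalendar(Calendar.FEBRUARY)));   // prints 2 (February)
    }
}
```

Using the Calendar.FEBRUARY constant instead of a bare number avoids the off-by-one when February is what is intended.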

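Comments:

Wow, thank you for such a thorough reply. I will try it tomorrow. I think I am close, because your aggregation code is almost exactly what I had tried. Thanks again!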
