Neo4j: label vs. indexed property?

Posted: 2015-03-13 10:18:00

Question:

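Let's say you're Twitter, and:

- You have (:User) and (:Tweet) nodes;
- Tweets can get flagged; and
- You want to query the list of flagged tweets currently awaiting moderation.

You can either add a label to those tweets, e.g. :AwaitingModeration, or add a property and index it, e.g. isAwaitingModeration = true|false.

Is one option inherently better than the other?

I know the best answer is probably to try both and load test :), but is there anything from Neo4j's implementation POV that makes one option more robust or better suited to this kind of query?

Does it depend on the volume of tweets in this state at any given moment? If it's in the 10s vs. the 1000s, does that make a difference?

My impression is that labels are better suited to a large volume of nodes, whereas indexed properties are better for smaller volumes (ideally, unique nodes), but I'm not sure whether that's actually true.

Thanks!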

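Comments:

I really do not know, but I would think that a label would be more efficient. If you use a label, you can exclude all of the (:Tweet) nodes without even matching them. If you use the property approach on the (:Tweet) nodes, your match would still include the Tweet label. In a relational or directory world, I do not think you would index the property value, as it would have low selectivity. I am interested to see the answers, though.

Answer 1: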

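UPDATE: A follow-up blog post has been published.

This is a common question when we model datasets for customers, and a typical use case for Active/NonActive entities.

Here is some feedback from my experience, valid for Neo4j 2.1.6:

Point 1. There is no difference in db accesses between matching on a label and matching on an indexed property when you simply return the nodes.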
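
For the global lookup in the original question, the two variants could look like the sketch below. It assumes the label and property names from the question and the Neo4j 2.x index syntax; neither query was profiled here.

```cypher
// Variant A: dedicated label; only the flagged tweets carry it.
MATCH (t:AwaitingModeration)
RETURN t;

// Variant B: boolean property backed by a schema index (Neo4j 2.x syntax).
CREATE INDEX ON :Tweet(isAwaitingModeration);

MATCH (t:Tweet)
WHERE t.isAwaitingModeration = true
RETURN t;
```

In Neo4j 4.x and later the index statement is written `CREATE INDEX FOR (t:Tweet) ON (t.isAwaitingModeration)`.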

Point 2. The difference shows up when such nodes are at the end of a pattern, for example:

MATCH (n:User {id:1})
WITH n
MATCH (n)-[:WRITTEN]->(post:Post)
WHERE post.published = true
RETURN n, collect(post) as posts;

-

neo4j-sh (?)$ PROFILE MATCH (n:User) WHERE n._id = 'c084e0ca-22b6-35f8-a786-c07891f108fc'
> WITH n
> MATCH (n)-[:WRITTEN]->(post:BlogPost)
> WHERE post.active = true
> RETURN n, size(collect(post)) as posts;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| n                                                                                                                                                         | posts |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Node[118]_id:"c084e0ca-22b6-35f8-a786-c07891f108fc",login:"joy.wiza",password:"7425b990a544ae26ea764a4473c1863253240128",email:"hayes.shaina@yahoo.com" | 1     |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row

ColumnFilter(0)
  |
  +Extract
    |
    +ColumnFilter(1)
      |
      +EagerAggregation
        |
        +Filter
          |
          +SimplePatternMatcher
            |
            +SchemaIndex

+----------------------+------+--------+----------------------+----------------------------------------------------------------------------+
|             Operator | Rows | DbHits |          Identifiers |                                                                      Other |
+----------------------+------+--------+----------------------+----------------------------------------------------------------------------+
|      ColumnFilter(0) |    1 |      0 |                      |                                                      keep columns n, posts |
|              Extract |    1 |      0 |                      |                                                                      posts |
|      ColumnFilter(1) |    1 |      0 |                      |                                           keep columns n,   AGGREGATION153 |
|     EagerAggregation |    1 |      0 |                      |                                                                          n |
|               Filter |    1 |      3 |                      | (hasLabel(post:BlogPost(1)) AND Property(post,active(8)) ==   AUTOBOOL1) |
| SimplePatternMatcher |    1 |     12 | n, post,   UNNAMED84 |                                                                            |
|          SchemaIndex |    1 |      2 |                 n, n |                                                  AUTOSTRING0; :User(_id) |
+----------------------+------+--------+----------------------+----------------------------------------------------------------------------+

Total database accesses: 17

In this case, Cypher will not make use of the index :Post(published).

Using a label is therefore more performant when you have, for example, an ActivePost label:

neo4j-sh (?)$ PROFILE MATCH (n:User) WHERE n._id = 'c084e0ca-22b6-35f8-a786-c07891f108fc'
> WITH n
> MATCH (n)-[:WRITTEN]->(post:ActivePost)
> RETURN n, size(collect(post)) as posts;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| n                                                                                                                                                         | posts |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Node[118]_id:"c084e0ca-22b6-35f8-a786-c07891f108fc",login:"joy.wiza",password:"7425b990a544ae26ea764a4473c1863253240128",email:"hayes.shaina@yahoo.com" | 1     |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row

ColumnFilter(0)
  |
  +Extract
    |
    +ColumnFilter(1)
      |
      +EagerAggregation
        |
        +Filter
          |
          +SimplePatternMatcher
            |
            +SchemaIndex

+----------------------+------+--------+----------------------+----------------------------------+
|             Operator | Rows | DbHits |          Identifiers |                            Other |
+----------------------+------+--------+----------------------+----------------------------------+
|      ColumnFilter(0) |    1 |      0 |                      |            keep columns n, posts |
|              Extract |    1 |      0 |                      |                            posts |
|      ColumnFilter(1) |    1 |      0 |                      | keep columns n,   AGGREGATION130 |
|     EagerAggregation |    1 |      0 |                      |                                n |
|               Filter |    1 |      1 |                      |     hasLabel(post:ActivePost(2)) |
| SimplePatternMatcher |    1 |      4 | n, post,   UNNAMED84 |                                  |
|          SchemaIndex |    1 |      2 |                 n, n |        AUTOSTRING0; :User(_id) |
+----------------------+------+--------+----------------------+----------------------------------+

Total database accesses: 7

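Point 3. Always use labels for positives. In the case above, having a Draft label would force you to execute the following query:

MATCH (n:User {id:1})
WITH n
MATCH (n)-[:POST]->(post:Post)
WHERE NOT post:Draft
RETURN n, collect(post) as posts;

This means Cypher has to open each node's label header and filter on it.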
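
A positive counterpart could be sketched as follows. It assumes a hypothetical Published label and a hypothetical post id of 42, neither of which appears in the example above:

```cypher
// When a post is published, swap the negative marker for a positive one.
MATCH (post:Post {id: 42})
REMOVE post:Draft
SET post:Published;

// The read query can then match on the label directly instead of using NOT.
MATCH (n:User {id: 1})
WITH n
MATCH (n)-[:POST]->(post:Published)
RETURN n, collect(post) AS posts;
```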

Point 4. Avoid the need to match on multiple labels:

MATCH (n:User {id:1})
WITH n
MATCH (n)-[:POST]->(post:Post:ActivePost)
RETURN n, collect(post) as posts;

neo4j-sh (?)$ PROFILE MATCH (n:User) WHERE n._id = 'c084e0ca-22b6-35f8-a786-c07891f108fc'
> WITH n
> MATCH (n)-[:WRITTEN]->(post:BlogPost:ActivePost)
> RETURN n, size(collect(post)) as posts;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| n                                                                                                                                                         | posts |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Node[118]_id:"c084e0ca-22b6-35f8-a786-c07891f108fc",login:"joy.wiza",password:"7425b990a544ae26ea764a4473c1863253240128",email:"hayes.shaina@yahoo.com" | 1     |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row

ColumnFilter(0)
  |
  +Extract
    |
    +ColumnFilter(1)
      |
      +EagerAggregation
        |
        +Filter
          |
          +SimplePatternMatcher
            |
            +SchemaIndex

+----------------------+------+--------+----------------------+---------------------------------------------------------------+
|             Operator | Rows | DbHits |          Identifiers |                                                         Other |
+----------------------+------+--------+----------------------+---------------------------------------------------------------+
|      ColumnFilter(0) |    1 |      0 |                      |                                         keep columns n, posts |
|              Extract |    1 |      0 |                      |                                                         posts |
|      ColumnFilter(1) |    1 |      0 |                      |                              keep columns n,   AGGREGATION139 |
|     EagerAggregation |    1 |      0 |                      |                                                             n |
|               Filter |    1 |      2 |                      | (hasLabel(post:BlogPost(1)) AND hasLabel(post:ActivePost(2))) |
| SimplePatternMatcher |    1 |      8 | n, post,   UNNAMED84 |                                                               |
|          SchemaIndex |    1 |      2 |                 n, n |                                     AUTOSTRING0; :User(_id) |
+----------------------+------+--------+----------------------+---------------------------------------------------------------+

Total database accesses: 12

This results in the same process for Cypher as in point 3.

Point 5. If possible, avoid the need to match on labels by using well-named, typed relationships:

MATCH (n:User {id:1})
WITH n
MATCH (n)-[:PUBLISHED]->(p)
RETURN n, collect(p) as posts

-

MATCH (n:User {id:1})
WITH n
MATCH (n)-[:DRAFTED]->(post)
RETURN n, collect(post) as posts;

neo4j-sh (?)$ PROFILE MATCH (n:User) WHERE n._id = 'c084e0ca-22b6-35f8-a786-c07891f108fc'
> WITH n
> MATCH (n)-[:DRAFTED]->(post)
> RETURN n, size(collect(post)) as posts;
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| n                                                                                                                                                         | posts |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Node[118]_id:"c084e0ca-22b6-35f8-a786-c07891f108fc",login:"joy.wiza",password:"7425b990a544ae26ea764a4473c1863253240128",email:"hayes.shaina@yahoo.com" | 3     |
+-------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row

ColumnFilter(0)
  |
  +Extract
    |
    +ColumnFilter(1)
      |
      +EagerAggregation
        |
        +SimplePatternMatcher
          |
          +SchemaIndex

+----------------------+------+--------+----------------------+----------------------------------+
|             Operator | Rows | DbHits |          Identifiers |                            Other |
+----------------------+------+--------+----------------------+----------------------------------+
|      ColumnFilter(0) |    1 |      0 |                      |            keep columns n, posts |
|              Extract |    1 |      0 |                      |                            posts |
|      ColumnFilter(1) |    1 |      0 |                      | keep columns n,   AGGREGATION119 |
|     EagerAggregation |    1 |      0 |                      |                                n |
| SimplePatternMatcher |    3 |      0 | n, post,   UNNAMED84 |                                  |
|          SchemaIndex |    1 |      2 |                 n, n |        AUTOSTRING0; :User(_id) |
+----------------------+------+--------+----------------------+----------------------------------+

Total database accesses: 2

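This is more performant because it uses the full power of the graph: Cypher just follows the relationships from the user node, so there are no db accesses beyond matching that node and no filtering on labels.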
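
With this model the state lives in the relationship type, so changing the state means rewriting the relationship. A minimal sketch, assuming a hypothetical post id of 42:

```cypher
// Publish a draft: replace the DRAFTED relationship with a PUBLISHED one.
MATCH (n:User {id: 1})-[r:DRAFTED]->(post {id: 42})
CREATE (n)-[:PUBLISHED]->(post)
DELETE r;
```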

This was my 0,02€

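Comments:

Excellent answer, and comprehensive. I learned a lot, and I like to learn things. It seems to me that some principles of good Neo4j modeling strategy are still evolving. It would be great if the community could collect more of these modeling principles in the documentation, since many new users are new to graphs.

I'm honored to receive such a comment from you. Thank you ;-)

Agreed, thanks for the thorough answer. I have some follow-up questions; too bad this tiny comment box is the only place for them. Point 2: I don't believe labels make traversals any faster either. Only relationship types matter then, right? Point 4: Why would specifying more labels be slower? Isn't Cypher smart enough to use the one with the lower cardinality first? In general, it might be good to stick to the example in the original question: just a global lookup, not a traversal from e.g. a user node. So I think my take for that scenario is that both options are equivalent?

For point 2: the problem is that the indexed property will not be used, so if you use only one label in your case, Cypher will filter over all tweets. If you use a dedicated label, you get the built-in filter done by the label. For point 4: Cypher will match on one label and then apply another filter for the other label, called hasLabel(). I'll edit the answer with the results of the execution plans ;-)

I've added PROFILE results with a mini dataset, but they show you the reality in terms of performance.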
