Neo4j Cypher query structure and performance optimization
【Posted】: 2017-05-06 20:03:22
【Question】: I have created a dynamic Cypher query builder. For complex cases, this builder produces fairly large queries, for example:
MATCH (parentD)-[:CONTAINS]->(childD:Decision)-[ru:CREATED_BY]->(u:User)
WHERE id(parentD) = {decisionId}
MATCH (childD)<-[:SET_FOR]-(filterValue415431:Value)-[:SET_ON]->(filterCharacteristic415431:Characteristic)
WHERE id(filterCharacteristic415431) = 415431
WITH filterValue415431, childD, ru, u
WHERE ({filterValue4154311} IN filterValue415431.value )
OR ({filterValue4154312} IN filterValue415431.value )
OR ({filterValue4154313} IN filterValue415431.value )
OR ({filterValue4154314} IN filterValue415431.value )
OR ({filterValue4154315} IN filterValue415431.value )
MATCH (childD)<-[:SET_FOR]-(filterValue415441:Value)-[:SET_ON]->(filterCharacteristic415441:Characteristic)
WHERE id(filterCharacteristic415441) = 415441
WITH filterValue415441, childD, ru, u
WHERE ({filterValue4154416} IN filterValue415441.value )
OR ({filterValue4154417} IN filterValue415441.value )
OR ({filterValue4154418} IN filterValue415441.value )
OR ({filterValue4154419} IN filterValue415441.value )
OR ({filterValue41544110} IN filterValue415441.value )
OR ({filterValue41544111} IN filterValue415441.value )
OR ({filterValue41544112} IN filterValue415441.value )
OR ({filterValue41544113} IN filterValue415441.value )
OR ({filterValue41544114} IN filterValue415441.value )
OR ({filterValue41544115} IN filterValue415441.value )
OR ({filterValue41544116} IN filterValue415441.value )
OR ({filterValue41544117} IN filterValue415441.value )
MATCH (childD)<-[:SET_FOR]-(filterValue416273:Value)-[:SET_ON]->(filterCharacteristic416273:Characteristic)
WHERE id(filterCharacteristic416273) = 416273
WITH filterValue416273, childD, ru, u
WHERE (filterValue416273.value >= {filterValue41627318})
AND (filterValue416273.value <= {filterValue41627319})
MATCH (childD)<-[:SET_FOR]-(filterValue417410:Value)-[:SET_ON]->(filterCharacteristic417410:Characteristic)
WHERE id(filterCharacteristic417410) = 417410
WITH filterValue417410, childD, ru, u
MATCH (childD)<-[:SET_FOR]-(filterValue416423:Value)-[:SET_ON]->(filterCharacteristic416423:Characteristic)
WHERE id(filterCharacteristic416423) = 416423
WITH filterValue416423, childD, ru, u
WHERE ({filterValue41642320} IN filterValue416423.value )
OR ({filterValue41642321} IN filterValue416423.value )
OR ({filterValue41642322} IN filterValue416423.value )
OR ({filterValue41642323} IN filterValue416423.value )
MATCH (childD)<-[:SET_FOR]-(filterValue415673:Value)-[:SET_ON]->(filterCharacteristic415673:Characteristic)
WHERE id(filterCharacteristic415673) = 415673
WITH filterValue415673, childD, ru, u
WHERE ({filterValue41567324} IN filterValue415673.value )
OR ({filterValue41567325} IN filterValue415673.value )
OR ({filterValue41567326} IN filterValue415673.value )
OR ({filterValue41567327} IN filterValue415673.value )
OR ({filterValue41567328} IN filterValue415673.value )
OR ({filterValue41567329} IN filterValue415673.value )
OR ({filterValue41567330} IN filterValue415673.value )
OR ({filterValue41567331} IN filterValue415673.value )
OR ({filterValue41567332} IN filterValue415673.value )
OR ({filterValue41567333} IN filterValue415673.value )
OR ({filterValue41567334} IN filterValue415673.value )
OR ({filterValue41567335} IN filterValue415673.value )
OR ({filterValue41567336} IN filterValue415673.value )
OR ({filterValue41567337} IN filterValue415673.value )
OR ({filterValue41567338} IN filterValue415673.value )
OR ({filterValue41567339} IN filterValue415673.value )
OPTIONAL MATCH (childD)<-[:VOTED_FOR]-(vg:VoteGroup)-[:VOTED_ON]->(c:Criterion)
WHERE id(c) IN {criteriaIds}
WITH childD, ru, u, vg.avgVotesWeight as weight, vg.totalVotes as totalVotes
WITH ru, u, childD , toFloat(sum(weight)) as weight, toInt(sum(totalVotes)) as totalVotes
ORDER BY weight DESC
SKIP 0 LIMIT 10
RETURN ru, u, childD AS decision, weight, totalVotes,
[ (parentD)<-[:DEFINED_BY]-(entity)<-[:COMMENTED_ON]-(comg:CommentGroup)-[:COMMENTED_FOR]->(childD) |
{ entityId: id(entity), types: labels(entity), totalComments: toInt(comg.totalComments) } ] AS commentGroups,
[ (parentD)<-[:DEFINED_BY]-(c1:Criterion)<-[:VOTED_ON]-(vg1:VoteGroup)-[:VOTED_FOR]->(childD) |
{ criterionId: id(c1), weight: vg1.avgVotesWeight, totalVotes: toInt(vg1.totalVotes) } ] AS weightedCriteria,
[ (parentD)<-[:DEFINED_BY]-(ch1:Characteristic)<-[:SET_ON]-(v1:Value)-[:SET_FOR]->(childD) |
{ characteristicId: id(ch1), value: v1.value, valueType: ch1.valueType, visualMode: ch1.visualMode } ] AS valuedCharacteristics
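Right now I'm not very satisfied with the performance. For example, this query takes ~500 ms to run.
Could you please take a look and see whether there is any chance to improve this query?
UPDATED
This is almost the same query, but with different parameters: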
MATCH (parentD)-[:CONTAINS]->(childD:Decision)-[ru:CREATED_BY]->(u:User)
WHERE id(parentD) = 415406
MATCH (childD)<-[:SET_FOR]-(filterValue416423:Value)-[:SET_ON]->(filterCharacteristic416423:Characteristic)
WHERE id(filterCharacteristic416423) = 416423
WITH filterValue416423, childD, ru, u
WHERE ('Adobe RGB' IN filterValue416423.value ) OR ('ECI RGB' IN filterValue416423.value )
MATCH (childD)<-[:SET_FOR]-(filterValue416273:Value)-[:SET_ON]->(filterCharacteristic416273:Characteristic)
WHERE id(filterCharacteristic416273) = 416273 WITH filterValue416273, childD, ru, u
WHERE (filterValue416273.value >= 4) AND (filterValue416273.value <= 53)
MATCH (childD)<-[:SET_FOR]-(filterValue415431:Value)-[:SET_ON]->(filterCharacteristic415431:Characteristic)
WHERE id(filterCharacteristic415431) = 415431 WITH filterValue415431, childD, ru, u
WHERE ('Compact' IN filterValue415431.value )
OR ('Compact SLR' IN filterValue415431.value )
OR ('Large SLR' IN filterValue415431.value )
OR ('Rangefinder-style mirrorless' IN filterValue415431.value )
OR ('SLR-like (bridge)' IN filterValue415431.value )
MATCH (childD)<-[:SET_FOR]-(filterValue415441:Value)-[:SET_ON]->(filterCharacteristic415441:Characteristic)
WHERE id(filterCharacteristic415441) = 415441 WITH filterValue415441, childD, ru, u
WHERE ('Brass' IN filterValue415441.value )
OR ('Carbon fiber' IN filterValue415441.value )
OPTIONAL MATCH (childD)<-[:VOTED_FOR]-(vg:VoteGroup)-[:VOTED_ON]->(c:Criterion)
WHERE id(c) IN [415414, 415415, 415412, 415426, 415411]
WITH childD, ru, u, vg.avgVotesWeight as weight, vg.totalVotes as totalVotes
WITH ru, u, childD , toFloat(sum(weight)) as weight, toInt(sum(totalVotes)) as totalVotes
ORDER BY weight DESC
SKIP 0 LIMIT 10
RETURN ru, u, childD AS decision, weight, totalVotes,
[ (parentD)<-[:DEFINED_BY]-(entity)<-[:COMMENTED_ON]-(comg:CommentGroup)-[:COMMENTED_FOR]->(childD) |
{ entityId: id(entity), types: labels(entity), totalComments: toInt(comg.totalComments) } ] AS commentGroups,
[ (parentD)<-[:DEFINED_BY]-(c1:Criterion)<-[:VOTED_ON]-(vg1:VoteGroup)-[:VOTED_FOR]->(childD) |
{ criterionId: id(c1), weight: vg1.avgVotesWeight, totalVotes: toInt(vg1.totalVotes) } ] AS weightedCriteria,
[ (parentD)<-[:DEFINED_BY]-(ch1:Characteristic)<-[:SET_ON]-(v1:Value)-[:SET_FOR]->(childD) |
{ characteristicId: id(ch1), value: v1.value, valueType: ch1.valueType, visualMode: ch1.visualMode } ] AS valuedCharacteristics
Cypher version: CYPHER 3.1, planner: COST, runtime: INTERPRETED. 646192 total db hits in 390 ms.
UPDATED
This is the output of :schema
Indexes
ON :Characteristic(lowerName) ONLINE
ON :CharacteristicGroup(lowerName) ONLINE
ON :Criterion(lowerName) ONLINE
ON :CriterionGroup(lowerName) ONLINE
ON :Decision(lowerName) ONLINE
ON :FlagType(name) ONLINE (for uniqueness constraint)
ON :HistoryValue(originalValue) ONLINE
ON :Permission(code) ONLINE (for uniqueness constraint)
ON :Role(name) ONLINE (for uniqueness constraint)
ON :User(email) ONLINE (for uniqueness constraint)
ON :User(username) ONLINE (for uniqueness constraint)
ON :Value(value) ONLINE
Constraints
ON ( flagtype:FlagType ) ASSERT flagtype.name IS UNIQUE
ON ( permission:Permission ) ASSERT permission.code IS UNIQUE
ON ( role:Role ) ASSERT role.name IS UNIQUE
ON ( user:User ) ASSERT user.email IS UNIQUE
ON ( user:User ) ASSERT user.username IS UNIQUE
UPDATED
I have optimized the query as suggested in the answer below:
MATCH (parentD)-[:CONTAINS]->(childD:Decision)
WHERE id(parentD) = 415406
MATCH (childD)<-[:SET_FOR]-(filterValue416423)-[:SET_ON]->(filterCharacteristic416423)
WHERE id(filterCharacteristic416423) = 416423
WITH DISTINCT filterValue416423, childD
WHERE ('Adobe RGB' IN filterValue416423.value ) OR ('ECI RGB' IN filterValue416423.value )
MATCH (childD)<-[:SET_FOR]-(filterValue416273)-[:SET_ON]->(filterCharacteristic416273)
WHERE id(filterCharacteristic416273) = 416273
WITH DISTINCT childD, filterValue416273
WHERE (filterValue416273.value >= 4) AND (filterValue416273.value <= 53)
MATCH (childD)<-[:SET_FOR]-(filterValue415431)-[:SET_ON]->(filterCharacteristic415431)
WHERE id(filterCharacteristic415431) = 415431
WITH DISTINCT childD, filterValue415431
WHERE ('Compact' IN filterValue415431.value )
OR ('Compact SLR' IN filterValue415431.value )
OR ('Large SLR' IN filterValue415431.value )
OR ('Rangefinder-style mirrorless' IN filterValue415431.value )
OR ('SLR-like (bridge)' IN filterValue415431.value )
MATCH (childD)<-[:SET_FOR]-(filterValue415441)-[:SET_ON]->(filterCharacteristic415441)
WHERE id(filterCharacteristic415441) = 415441
WITH DISTINCT childD, filterValue415441
WHERE ('Brass' IN filterValue415441.value )
OR ('Carbon fiber' IN filterValue415441.value )
OPTIONAL MATCH (childD)<-[:VOTED_FOR]-(vg:VoteGroup)-[:VOTED_ON]->(c:Criterion)
WHERE id(c) IN [415414, 415415, 415412, 415426, 415411]
WITH DISTINCT * MATCH (childD)-[ru:CREATED_BY]->(u:User)
WITH DISTINCT childD, ru, u, vg.avgVotesWeight as weight, vg.totalVotes as totalVotes
WITH DISTINCT ru, u, childD , toFloat(sum(weight)) as weight, toInt(sum(totalVotes)) as totalVotes
ORDER BY weight DESC
SKIP 0 LIMIT 10
RETURN ru, u, childD AS decision, weight, totalVotes,
[ (parentD)<-[:DEFINED_BY]-(entity)<-[:COMMENTED_ON]-(comg:CommentGroup)-[:COMMENTED_FOR]->(childD) |
{ entityId: id(entity), types: labels(entity), totalComments: toInt(comg.totalComments) } ] AS commentGroups,
[ (parentD)<-[:DEFINED_BY]-(c1:Criterion)<-[:VOTED_ON]-(vg1:VoteGroup)-[:VOTED_FOR]->(childD) |
{ criterionId: id(c1), weight: vg1.avgVotesWeight, totalVotes: toInt(vg1.totalVotes) } ] AS weightedCriteria,
[ (parentD)<-[:DEFINED_BY]-(ch1:Characteristic)<-[:SET_ON]-(v1)-[:SET_FOR]->(childD) |
{ characteristicId: id(ch1), value: v1.value, valueType: ch1.valueType, visualMode: ch1.visualMode } ] AS valuedCharacteristics
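PROFILE output: (image not included)
With DISTINCT childD the query runs very slowly; without it, it is much better but still far from perfect.
One more attempt: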
PROFILE MATCH (parentD)-[:CONTAINS]->(childD:Decision)
WHERE id(parentD) = 415406
MATCH (childD)<-[:SET_FOR]-(filterValue416423)-[:SET_ON]->(filterCharacteristic416423)
USING JOIN ON childD
WHERE id(filterCharacteristic416423) = 416423
AND ('Adobe RGB' IN filterValue416423.value ) OR ('ECI RGB' IN filterValue416423.value )
WITH DISTINCT childD
MATCH (childD)<-[:SET_FOR]-(filterValue416273)-[:SET_ON]->(filterCharacteristic416273)
USING JOIN ON childD
WHERE id(filterCharacteristic416273) = 416273 AND (filterValue416273.value >= 4) AND (filterValue416273.value <= 53)
WITH DISTINCT childD
MATCH (childD)<-[:SET_FOR]-(filterValue415431)-[:SET_ON]->(filterCharacteristic415431)
USING JOIN ON childD
WHERE id(filterCharacteristic415431) = 415431
AND ('Compact' IN filterValue415431.value )
OR ('Compact SLR' IN filterValue415431.value )
OR ('Large SLR' IN filterValue415431.value )
OR ('Rangefinder-style mirrorless' IN filterValue415431.value )
OR ('SLR-like (bridge)' IN filterValue415431.value )
WITH DISTINCT childD
MATCH (childD)<-[:SET_FOR]-(filterValue415441)-[:SET_ON]->(filterCharacteristic415441)
USING JOIN ON childD
WHERE id(filterCharacteristic415441) = 415441
AND ('Brass' IN filterValue415441.value )
OR ('Carbon fiber' IN filterValue415441.value )
OPTIONAL MATCH (childD)<-[:VOTED_FOR]-(vg:VoteGroup)-[:VOTED_ON]->(c:Criterion)
WHERE id(c) IN [415414, 415415, 415412, 415426, 415411]
WITH DISTINCT * MATCH (childD)-[ru:CREATED_BY]->(u:User)
WITH DISTINCT childD, ru, u, vg.avgVotesWeight as weight, vg.totalVotes as totalVotes
WITH DISTINCT ru, u, childD , toFloat(sum(weight)) as weight, toInt(sum(totalVotes)) as totalVotes
ORDER BY weight DESC
SKIP 0 LIMIT 10
RETURN childD
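【Comments】:
Run PROFILE on your query and post the result to get better answers.
Query parameters can be set when executing in the browser. Try :help param and :help params for usage information.
Please take some time to format your query. Having all of it on one line is unreadable.
Sure, I have formatted the query. By the way, is there any tool that can format Cypher queries automatically?
Do you have data that others can run this against?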
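【Answer 1】:
The main problem with your query is that you are doing a large number of checks while the row count runs out of control. So here are some tips for reducing the number of rows you produce in each MATCH.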
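1) Unless you need the duplicates, use WITH DISTINCT instead of plain WITH. WITH can produce duplicate rows (because you are only cutting off a column), and every duplicate row you process wastes time and extra db hits. (That is, every filter column you drop adds duplicate rows.)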
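2) :Value.value is overloaded. It has no semantic meaning, and there is no guarantee that the value is even of any particular type. This means every :Value check has to go out and touch a bunch of :Value nodes that are irrelevant to your search. So the more :Value nodes are attached, the more expensive it becomes to find the right one. (It would be cheaper if the lookup could be indexed, so that Cypher could find the right :Value first and then look at what it is connected to.) This will not help if you cannot change the schema you are working with, where schema means the way your data and relationships are set up.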
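3) Check only what you need to check. Writing (a:A)-[:TO]->(b:B) looks more efficient, but if every [:TO] already goes from an :A to a :B, Neo4j now has to verify that the first node is an :A and the second node is a :B. Cypher does not know what is implicitly true, so it has to check, and each of these redundant checks has to go out and hit the database for every row. So it is better to write (a)-[:TO]->(b).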
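4) Limit variable scope. Here you match -[ru:CREATED_BY]->(u:User) at the beginning but do not use it until the very end, and never filter on it. This multiplies the rows per decision by the number of -[ru:CREATED_BY]->(u:User) matches, all of which must be checked in the later matches. Unless -[ru:CREATED_BY]->(u:User) somehow greatly limits the decisions matched (or there can only be one per decision), match this supporting information at the end.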
5) Order your filters from strongest to weakest (if you can). Cut the row count as early as possible.
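6) Tricks for minimizing rows. Every row pulled in makes the following steps of the query more work, so try to keep the row count down. If you are using OR to combine unrelated but similar column queries (for example, all orgs with condition A or orgs with condition B), and the work for each half only makes the other half more expensive, you may be better off using UNION to combine the results of smaller, faster queries (the UNION branches are independent of each other until their results are merged). Note that a simple query such as WHERE org.id IN [1,2,3] will still be faster than a UNION, because all the work can be done in one lookup.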
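Besides UNION, if you are carrying along nodes that you are not filtering on, you can use collect(column) to reduce the "duplicates" to one row, and then UNWIND(column) AS column at the end to get your rows back. (Here, column means the variable name.)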
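7) Doing a lot of filtering on one node? Cypher has USING hints! The hint USING JOIN ON column tells Cypher that it may be more efficient to start from more leaves and join them. So adding USING JOIN ON childD to each match tells Cypher to evaluate the filters independently and keep the rows where all of them overlap. Note that USING is just you telling Cypher "trust me, it should be faster if we try it this way", and if you are wrong it can actually make the query worse. (That said, for splitting a large query into independent branches, USING JOIN should be useful.)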
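As a rough illustration of how tips 1, 4, 5 and 7 could be applied inside a query builder, here is a minimal sketch. Python is used only for illustration, since the question does not say what language the builder is written in. The function `build_query`, its arguments, and the generated parameter names are hypothetical, and the `$name` parameter syntax is an assumption about the Neo4j version (older releases use `{name}`). The caller is assumed to pass the filters already ordered from most to least selective.

```python
from typing import Any, Dict, List, Sequence, Tuple

# (characteristic node id, accepted values) -- a hypothetical filter description
Filter = Tuple[int, Sequence[Any]]


def build_query(decision_id: int, filters: List[Filter],
                use_join_hint: bool = False) -> Tuple[str, Dict[str, Any]]:
    """Build a parameterized Cypher query with one narrow stage per filter."""
    params: Dict[str, Any] = {"decisionId": decision_id}
    lines = [
        "MATCH (parentD)-[:CONTAINS]->(childD:Decision)",
        "WHERE id(parentD) = $decisionId",
    ]
    for char_id, values in filters:
        if not values:
            raise ValueError(f"filter {char_id} has no values")
        value_var = f"filterValue{char_id}"
        char_var = f"filterCharacteristic{char_id}"
        params[f"charId{char_id}"] = char_id
        checks = []
        for i, value in enumerate(values):
            name = f"{value_var}_{i}"
            params[name] = value
            checks.append(f"${name} IN {value_var}.value")
        # Tip 3: no labels on the filter nodes, only the relationship types.
        lines.append(
            f"MATCH (childD)<-[:SET_FOR]-({value_var})-[:SET_ON]->({char_var})")
        if use_join_hint:
            # Tip 7: optional planner hint; it can also make things slower.
            lines.append("USING JOIN ON childD")
        # Parenthesize the OR group so that AND does not bind to the first check only.
        lines.append(
            f"WHERE id({char_var}) = $charId{char_id} AND ({' OR '.join(checks)})")
        # Tip 1: drop the filter columns and collapse duplicate rows.
        lines.append("WITH DISTINCT parentD, childD")
    # Tip 4: supporting data is matched last, once the row count is small.
    lines.append("MATCH (childD)-[ru:CREATED_BY]->(u:User)")
    lines.append("RETURN ru, u, childD AS decision")
    return "\n".join(lines), params


query, params = build_query(
    415406, [(416423, ["Adobe RGB", "ECI RGB"]), (415441, ["Brass"])])
print(query)
```

Passing the values as parameters rather than inlining them should also let the server reuse the cached plan between calls with different filter values, although that depends on the builder producing the same query text each time.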
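Update: First, a note on node.id = "constant" AND node.value = "constant" OR node.id = "constant2" AND node.value = "constant2" versus node.value = map[node.id]. The first form can filter nodes as part of the node lookup, whereas the second has to filter every node that has already been looked up. If nothing filtered that lookup beforehand, the map version has to pull in all nodes. While a map offers some (arguable) simplicity and flexibility, it is one of the least efficient ways to filter nodes.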
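If the builder already holds the filters as a map, it can keep that convenience on the application side and still send the explicit form to the database. The sketch below is one hypothetical way to do that; `expand_filters` and the `id`/`value` parameter names are invented for illustration, and the property names `id` and `value` simply follow the notation used in the note above.

```python
from typing import Any, Dict, List, Tuple


def expand_filters(node_var: str, wanted: Dict[int, Any]) -> Tuple[str, Dict[str, Any]]:
    """Turn {node id: expected value} into explicit OR-ed predicates."""
    clauses: List[str] = []
    params: Dict[str, Any] = {}
    # Sort the keys so the generated text is stable between calls.
    for i, (node_id, value) in enumerate(sorted(wanted.items())):
        params[f"id{i}"] = node_id
        params[f"value{i}"] = value
        # Each pair is parenthesized so AND and OR combine as intended.
        clauses.append(
            f"({node_var}.id = $id{i} AND {node_var}.value = $value{i})")
    return " OR ".join(clauses), params


predicate, predicate_params = expand_filters("node", {2: "b", 1: "a"})
print("WHERE " + predicate)
```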
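Second, the biggest problem with your query now is that :Value is heavily overloaded and you cannot find it by ID. :Value should either be a relationship or have an indexed ID field, so that you do not have to touch ALL of them. I figured the JOIN hint would at least give SET_FOR higher priority, which seems to be the more efficient of the two.
Here is my attempt at rewriting the PROFILE query more efficiently. (v1)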
MATCH (parentD)-[:CONTAINS]->(childD:Decision)
WHERE id(parentD) = 415406
MATCH (childD)<-[:SET_FOR]-(filterValue416423)-[:SET_ON]->(filterCharacteristic416423)
USING JOIN ON childD
WHERE id(filterCharacteristic416423) = 416423
AND (('Adobe RGB' IN filterValue416423.value ) OR ('ECI RGB' IN filterValue416423.value ))
WITH DISTINCT childD
MATCH (childD)<-[:SET_FOR]-(filterValue416273)-[:SET_ON]->(filterCharacteristic416273)
USING JOIN ON childD
WHERE id(filterCharacteristic416273) = 416273 AND (filterValue416273.value >= 4) AND (filterValue416273.value <= 53)
WITH DISTINCT childD
MATCH (childD)<-[:SET_FOR]-(filterValue415431)-[:SET_ON]->(filterCharacteristic415431)
USING JOIN ON childD
WHERE id(filterCharacteristic415431) = 415431
AND (('Compact' IN filterValue415431.value )
OR ('Compact SLR' IN filterValue415431.value )
OR ('Large SLR' IN filterValue415431.value )
OR ('Rangefinder-style mirrorless' IN filterValue415431.value )
OR ('SLR-like (bridge)' IN filterValue415431.value ))
WITH DISTINCT childD
MATCH (childD)<-[:SET_FOR]-(filterValue415441)-[:SET_ON]->(filterCharacteristic415441)
USING JOIN ON childD
WHERE id(filterCharacteristic415441) = 415441
AND (('Brass' IN filterValue415441.value )
OR ('Carbon fiber' IN filterValue415441.value ))
OPTIONAL MATCH (childD)<-[:VOTED_FOR]-(vg:VoteGroup)-[:VOTED_ON]->(c:Criterion)
WHERE id(c) IN [415414, 415415, 415412, 415426, 415411]
WITH DISTINCT * MATCH (childD)-[ru:CREATED_BY]->(u:User)
WITH DISTINCT childD, ru, u, vg.avgVotesWeight as weight, vg.totalVotes as totalVotes
WITH DISTINCT ru, u, childD , toFloat(sum(weight)) as weight, toInt(sum(totalVotes)) as totalVotes
ORDER BY weight DESC
SKIP 0 LIMIT 10
RETURN ru, u, childD AS decision, weight, totalVotes,
[ (parentD)<-[:DEFINED_BY]-(entity)<-[:COMMENTED_ON]-(comg:CommentGroup)-[:COMMENTED_FOR]->(childD) |
{ entityId: id(entity), types: labels(entity), totalComments: toInt(comg.totalComments) } ] AS commentGroups,
[ (parentD)<-[:DEFINED_BY]-(c1:Criterion)<-[:VOTED_ON]-(vg1:VoteGroup)-[:VOTED_FOR]->(childD) |
{ criterionId: id(c1), weight: vg1.avgVotesWeight, totalVotes: toInt(vg1.totalVotes) } ] AS weightedCriteria,
[ (parentD)<-[:DEFINED_BY]-(ch1:Characteristic)<-[:SET_ON]-(v1)-[:SET_FOR]->(childD) |
{ characteristicId: id(ch1), value: v1.value, valueType: ch1.valueType, visualMode: ch1.visualMode } ] AS valuedCharacteristics
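【Comments】:
Thanks for your answer. I have updated the question and added the refactored query. Please note that the query with "DISTINCT childD" runs very slowly; without it, it is much better but still far from perfect. Is there anything else that can be improved? Also, in the meantime I'm trying to implement a solution based on the "map" approach, but no luck so far - ***.com/questions/43908077/…
@alexanoid (1) Change all WITH to WITH DISTINCT. It is cheap enough that it is better to be safe than sorry. Also, a new PROFILE image would help to see which parts of the query are now the bottleneck or wasting time.
I have updated the query and added the new profile output.
@alexanoid I think I have got everything now... I think any further optimization will have to be done on the data rather than on the query. But let me take a look at the profile of the query with the JOIN.
Thanks. I have updated my question with the profile information. Please also take into account that I modified your query a bit, because the original query did not compile.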