How to optimize a Neo4J Cypher query?

Asked: 2018-03-02 09:47:33

Question:

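I have an app that converts text into a network: when a sentence is added, each word becomes a node and each co-occurrence of two words becomes a connection between them. This matters for understanding the question below.

To add each sentence to the Neo4J database I use the Cypher query below. Following my data structure, it first matches the user who is adding the nodes, then merges the context (or list) in which the statement is made and links it to the user. It then creates the statement and links it to the user and the context, and finally creates the connections between each added node (with properties), the statement it appears in, and the context (list) it was made in.

The problem is that this query is about 100 times longer than the sentence itself, so a 400-byte text produces a query of about 40 KB. When I add long texts, Neo4J becomes slow.

So my question is: what is the best way to optimize this query? Would you recommend using a set of transactions instead?

For example, could I split each long query into several parts and send some of the transactions concurrently to save time?

I am talking about texts of about 100 KB, maybe longer, which means a total request of about 10 MB.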

MATCH (u:User {uid: "6e228580-1cb3-11e8-8271-891867c15336"})
MERGE (c_list:Context {name:"list",by:"6e228580-1cb3-11e8-8271-891867c15336",
uid:"0b4fa320-1dfd-11e8-802e-b5cbdf950c47"})
ON CREATE SET c_list.timestamp="15199833288930000"
MERGE (c_list)-[:BY {timestamp:"15199833288930000"}]->(u)
CREATE (s:Statement {name:"#apple #orange #fruit",
text:"apples and oranges are fruits",
uid:"0b56a800-1dfd-11e8-802e-b5cbdf950c47", timestamp:"15199833288930000"})
CREATE (s)-[:BY {context:"0b4fa320-1dfd-11e8-802e-b5cbdf950c47",
timestamp:"15199833288930000"}]->(u)
CREATE (s)-[:IN {user:"6e228580-1cb3-11e8-8271-891867c15336",
timestamp:"15199833288930000"}]->(c_list)
MERGE (cc_apple:Concept {name:"apple"})
ON CREATE SET cc_apple.timestamp="15199833288930000", cc_apple.uid="0b56a801-1dfd-11e8-802e-b5cbdf950c47"
MERGE (cc_orange:Concept {name:"orange"})
ON CREATE SET cc_orange.timestamp="15199833288930000", cc_orange.uid="0b56cf10-1dfd-11e8-802e-b5cbdf950c47"
MERGE (cc_fruit:Concept {name:"fruit"})
ON CREATE SET cc_fruit.timestamp="15199833288930002", cc_fruit.uid="0b56cf13-1dfd-11e8-802e-b5cbdf950c47"
CREATE (cc_apple)-[:BY {context:"0b4fa320-1dfd-11e8-802e-b5cbdf950c47",timestamp:"15199833288930000",
statement:"0b56a800-1dfd-11e8-802e-b5cbdf950c47"}]->(u)
CREATE (cc_apple)-[:OF {context:"0b4fa320-1dfd-11e8-802e-b5cbdf950c47",user:"6e228580-1cb3-11e8-8271-891867c15336",timestamp:"15199833288930000"}]->(s)
CREATE (cc_apple)-[:AT {user:"6e228580-1cb3-11e8-8271-891867c15336",timestamp:"15199833288930000",
context:"0b4fa320-1dfd-11e8-802e-b5cbdf950c47",statement:"0b56a800-1dfd-11e8-802e-b5cbdf950c47"}]->(c_list)
CREATE (cc_apple)-[:TO {context:"0b4fa320-1dfd-11e8-802e-b5cbdf950c47",
statement:"0b56a800-1dfd-11e8-802e-b5cbdf950c47",user:"6e228580-1cb3-11e8-8271-891867c15336",
timestamp:"15199833288930000",uid:"0b56cf11-1dfd-11e8-802e-b5cbdf950c47",gapscan:"2",weight:"3"}]->(cc_orange)
CREATE (cc_orange)-[:BY {context:"0b4fa320-1dfd-11e8-802e-b5cbdf950c47",timestamp:"15199833288930000",statement:"0b56a800-1dfd-11e8-802e-b5cbdf950c47"}]->(u)
CREATE (cc_orange)-[:OF {context:"0b4fa320-1dfd-11e8-802e-b5cbdf950c47",user:"6e228580-1cb3-11e8-8271-891867c15336",timestamp:"15199833288930000"}]->(s)
CREATE (cc_orange)-[:AT {user:"6e228580-1cb3-11e8-8271-891867c15336",timestamp:"15199833288930000",
context:"0b4fa320-1dfd-11e8-802e-b5cbdf950c47",statement:"0b56a800-1dfd-11e8-802e-b5cbdf950c47"}]->(c_list)
CREATE (cc_orange)-[:TO {context:"0b4fa320-1dfd-11e8-802e-b5cbdf950c47",
statement:"0b56a800-1dfd-11e8-802e-b5cbdf950c47",user:"6e228580-1cb3-11e8-8271-891867c15336",
timestamp:"15199833288930002",uid:"0b56cf14-1dfd-11e8-802e-b5cbdf950c47",gapscan:"2",weight:"3"}]->(cc_fruit)
CREATE (cc_apple)-[:TO {context:"0b4fa320-1dfd-11e8-802e-b5cbdf950c47",
statement:"0b56a800-1dfd-11e8-802e-b5cbdf950c47",user:"6e228580-1cb3-11e8-8271-891867c15336",
timestamp:"15199833288930002",uid:"0b56cf16-1dfd-11e8-802e-b5cbdf950c47",gapscan:"4",weight:"2"}]->(cc_fruit)
CREATE (cc_fruit)-[:BY {context:"0b4fa320-1dfd-11e8-802e-b5cbdf950c47",
timestamp:"15199833288930002",statement:"0b56a800-1dfd-11e8-802e-b5cbdf950c47"}]->(u)
CREATE (cc_fruit)-[:OF {context:"0b4fa320-1dfd-11e8-802e-b5cbdf950c47",user:"6e228580-1cb3-11e8-8271-891867c15336",timestamp:"15199833288930002"}]->(s)
CREATE (cc_fruit)-[:AT {user:"6e228580-1cb3-11e8-8271-891867c15336",
timestamp:"15199833288930002",context:"0b4fa320-1dfd-11e8-802e-b5cbdf950c47",
statement:"0b56a800-1dfd-11e8-802e-b5cbdf950c47"}]->(c_list)
RETURN s.uid;

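Comments:

This looks quite manual. Do you have any way of splitting the text string and processing all the tokens at once? Also, where do all these values come from? It is hard to tell what is meant to be input to the query, what is auto-generated (uuid?), and what is hardcoded. If possible, you should really use parameters and variables to cut down on the redundancy.

Yes, I can split the text string, but I don't quite understand what you mean by processing all the tokens at once... The values are just unique connections, and the UUIDs are auto-generated IDs so that they never collide. What do you mean by using parameters and variables? The query itself is of course generated by a script (node.js), but this is what it looks like when it is sent to Neo4J...

Here is the documentation on parameters. Variables work similarly. I mean using parameters and/or variables for the strings and values in the query, so that instead of repeating the same uuid over and over you just use the parameterized value.

One major thing to emphasize: a query that changes as the input changes (especially one whose length changes drastically) is a warning sign that your approach may be flawed and that the query needs fixing, so that it no longer handles every piece of the input explicitly. If you look at stdob--'s query, it stays the same no matter how big the sentence is.

Answer 1: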

1) Use input parameters:

var params = {
    userId: "6e228580-1cb3-11e8-8271-891867c15336",
    contextName: "list",
    time: "15199833288930000",
    statementName: "#apple #orange #fruit",
    statementText: "apples and oranges are fruits",
    concepts: ["apple", "orange", "fruit"],
    conceptsRelations: [
        {from: "apple",  to: "orange", gapscan: 2, weight: 3},
        {from: "orange", to: "fruit",  gapscan: 2, weight: 3},
        {from: "apple",  to: "fruit",  gapscan: 4, weight: 2}
    ]
};

session.run(cypherQuery, params).then...
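As a rough illustration of how the Node.js script could assemble this object instead of concatenating query text, here is a minimal sketch. The helper name `buildParams` and its arguments are hypothetical, the relation list (with `gapscan` and `weight`) is assumed to come from the application's own co-occurrence logic, and deriving `statementName` by prefixing each concept with `#` is an assumption based on the example values above.

```javascript
// Hypothetical helper: builds the parameter object for one statement.
// The relations (from, to, gapscan, weight) are assumed to be computed
// elsewhere by the application; that logic is not shown here.
function buildParams(userId, contextName, time, statementText, concepts, relations) {
  return {
    userId: userId,
    contextName: contextName,
    time: time,
    // Assumption: the statement name is the concepts written as hashtags.
    statementName: concepts.map(function (c) { return "#" + c; }).join(" "),
    statementText: statementText,
    concepts: concepts,
    conceptsRelations: relations.map(function (r) {
      return { from: r.from, to: r.to, gapscan: r.gapscan, weight: r.weight };
    })
  };
}

var exampleParams = buildParams(
  "6e228580-1cb3-11e8-8271-891867c15336",
  "list",
  "15199833288930000",
  "apples and oranges are fruits",
  ["apple", "orange", "fruit"],
  [
    { from: "apple", to: "orange", gapscan: 2, weight: 3 },
    { from: "orange", to: "fruit", gapscan: 2, weight: 3 },
    { from: "apple", to: "fruit", gapscan: 4, weight: 2 }
  ]
);
```

Only this object grows with the length of the sentence; the query text passed to `session.run` stays the same.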

2) Use the APOC library to generate unique identifiers on the database side: apoc.create.uuid()

3) Use loops (FOREACH and UNWIND) for the repeated operations:

MATCH (u:User {uid: $userId})
MERGE (c_list:Context {name: $contextName, by: $userId})
    ON CREATE SET c_list.timestamp = $time,
                  c_list.uid = apoc.create.uuid()
MERGE (c_list)-[:BY {timestamp: $time}]->(u)

CREATE (s:Statement {name: $statementName, 
                     text: $statementText, uid: apoc.create.uuid(), timestamp: $time})
CREATE (s)-[:BY {context: c_list.uid, timestamp: $time}]->(u)
CREATE (s)-[:IN {user: u.uid, timestamp: $time}]->(c_list)

FOREACH (conceptName in $concepts|
    MERGE (concept:Concept {name: conceptName})
        ON CREATE SET concept.timestamp = $time,
                      concept.uid = apoc.create.uuid()
    CREATE (concept)-[:BY {context: c_list.uid, timestamp: $time, statement: s.uid}]->(u)
    CREATE (concept)-[:OF {context: c_list.uid, user: u.uid, timestamp: $time}]->(s)
    CREATE (concept)-[:AT {user: u.uid, timestamp: $time, 
                           context: c_list.uid, statement: s.uid}]->(c_list)
)

WITH u, c_list, s

UNWIND $conceptsRelations as conceptsRelation
  MATCH (c_from:Concept {name: conceptsRelation.from})
  MATCH (c_to:Concept {name: conceptsRelation.to})
  CREATE (c_from)-[:TO {context: c_list.uid, statement: s.uid, user: u.uid,
                        timestamp: $time, uid: apoc.create.uuid(), 
                        gapscan: conceptsRelation.gapscan, 
                        weight: conceptsRelation.weight}]->(c_to)
RETURN distinct s.uid;
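For long texts, a query whose text does not depend on the input can be run once per sentence, with the sentences grouped into a series of smaller transactions. The sketch below is one possible way to do that, not a tested recipe. It assumes a session object like the one used above that also exposes `beginTransaction()`, with `run`, `commit` and `rollback` on the resulting transaction. The names `chunk`, `addStatements`, `paramsList` and `batchSize` are hypothetical.

```javascript
// Split an array into consecutive groups of at most `size` items.
function chunk(items, size) {
  var groups = [];
  for (var i = 0; i < items.length; i += size) {
    groups.push(items.slice(i, i + size));
  }
  return groups;
}

// Hypothetical runner: sends the same parameterized query once per
// parameter object, committing one transaction per batch.
// Returns the number of transactions that were committed.
async function addStatements(session, cypherQuery, paramsList, batchSize) {
  var committed = 0;
  for (const batch of chunk(paramsList, batchSize)) {
    const tx = await session.beginTransaction();
    try {
      for (const params of batch) {
        await tx.run(cypherQuery, params);
      }
      await tx.commit();
      committed += 1;
    } catch (err) {
      // Undo the partial batch and let the caller decide what to do next.
      await tx.rollback();
      throw err;
    }
  }
  return committed;
}
```

Batches are sent one after another here. Running them concurrently could make two transactions MERGE the same Concept node at the same time, so whether that is safe depends on the constraints and indexes defined in the database.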
