Neo4j Insertion taking more time

Posted 2023-03-31

【Title】: Neo4j Insertion taking more time 【Posted】: 2014-05-19 05:55:23 【Question】:

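We have about 50,000 nodes and 8,000,000 (8 million) edges.

We are trying to insert this data into Neo4j (the embedded graph database) from Java, but it takes a very long time (several hours).

We would like to know whether something in our insertion code is wrong. We are using auto-indexing for the nodes. The complete implementation is given below.

Please let me know what is going wrong and which changes the following code needs.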

public static void main(String[] args)
{
        // TODO Auto-generated method stub
        nodeGraph obj = new nodeGraph();
        obj.createDB();
        System.out.println("Graph Database Initialised");
        obj.parseNodesCsv();
        System.out.println("Creating relationships in process....");
        obj.parseEdgesCsv();
        obj.shutDown();
}

public void createDB()
{
        graphDb = new GraphDatabaseFactory().newEmbeddedDatabaseBuilder( DB_PATH ).
        setConfig( GraphDatabaseSettings.node_keys_indexable, "id,name" ).
        setConfig( GraphDatabaseSettings.relationship_keys_indexable, "rel" ).
        setConfig( GraphDatabaseSettings.node_auto_indexing, "true" ).
        setConfig( GraphDatabaseSettings.relationship_auto_indexing, "true" ).
        newGraphDatabase();
        registerShutdownHook(graphDb);
        // Get the Node AutoIndexer, set nodeProp1 and nodeProp2 as auto
        // indexed.
        AutoIndexer<Node> nodeAutoIndexer = graphDb.index().getNodeAutoIndexer();
        nodeAutoIndexer.startAutoIndexingProperty( "id" );
        nodeAutoIndexer.startAutoIndexingProperty( "name" );

        // Get the Relationship AutoIndexer
        //AutoIndexer<Relationship> relAutoIndexer = graphDb.index().getRelationshipAutoIndexer();
        //relAutoIndexer.startAutoIndexingProperty( "relProp1" );

        // None of the AutoIndexers are enabled so far. Do that now
        nodeAutoIndexer.setEnabled( true );
        //relAutoIndexer.setEnabled( true );
}

public void parseNodesCsv()
{
        try
        {
            CSVReader reader= new CSVReader(new FileReader("/home/sandy/Desktop/workspacesh/importToNeo4j/nodesNeo.csv"),'\t','"');
            String rows[]=null;
            while ((rows=reader.readNext())!=null)
            {
                createNode(rows);
                System.out.println(rows[0]);
            }
            reader.close();
        }
        catch (FileNotFoundException e)
        {
            // TODO Auto-generated catch block
            System.err.println("Error: cannot find datasource.");
            e.printStackTrace();
        }
        catch (IOException e)
        {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
}

public void parseEdgesCsv()
{
        try
        {
            CSVReader reader= new CSVReader(new FileReader("/home/sandy/Desktop/workspacesh/importToNeo4j/edgesNeo.csv"),',','"');
            String rows[]=null;
            while ((rows=reader.readNext())!=null)
            {
                createRelationshipsUsingIndexes(rows);
            }
            reader.close();
        }
        catch (FileNotFoundException e)
        {
            // TODO Auto-generated catch block
            System.err.println("Error: cannot find datasource.");
            e.printStackTrace();
        }
        catch (IOException e)
        {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
}

public void createNode(String[] rows)
{
        Transaction tx = graphDb.beginTx();
        try
        {
            firstNode = graphDb.createNode(DynamicLabel.label( rows[2] ));
            firstNode.setProperty("id",rows[0] );
            firstNode.setProperty("name",rows[1] );
            System.out.println(firstNode.getProperty("id"));
            tx.success();
        }
        finally
        {
            tx.finish();
        }
}

public void createRelationshipsUsingIndexes(String rows[])
{
        Transaction tx = graphDb.beginTx();
        try
        {
            ReadableIndex<Node> autoNodeIndex = graphDb.index().getNodeAutoIndexer().getAutoIndex();
            // node1 and node2 both had auto indexed properties, get them
            firstNode=autoNodeIndex.get( "id", rows[0] ).getSingle();
            secondNode=autoNodeIndex.get( "id", rows[1] ).getSingle();

            relationship = firstNode.createRelationshipTo( secondNode, RelTypes.CO_OCCURRED );
            relationship.setProperty( "frequency", rows[2] );
            relationship.setProperty( "generatability_score", rows[3] );
            tx.success();
        }
        finally
        {
            tx.finish();
        }
}

【Question comments】:

【Answer 1】:

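What memory configuration (heap) are you using for the import? Which OS are you running on (I assume some Linux), and which Neo4j version are you using?

I suggest upgrading to the latest stable release, Neo4j 2.0.3.

There are a few problems with your import:

Wrap the FileReader in a BufferedReader for better CSV reading performance.

It would make more sense to use my batch-importer for a fast initial import.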
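The buffering suggestion above can be sketched with the standard library alone. This is a minimal illustration, not the answerer's code: the class name `BufferedCsvRead` and the method `readRows` are hypothetical, and the naive `split` ignores quoted fields, which a real CSV parser such as the `CSVReader` in the question would handle.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only.
class BufferedCsvRead {
    // Reads all lines through a BufferedReader, so the underlying
    // FileReader is read in large chunks instead of many small reads.
    // NOTE: split() is a naive stand-in for a CSV parser (no quote handling),
    // and `separator` is interpreted as a regular expression.
    static List<String[]> readRows(String path, String separator) throws IOException {
        List<String[]> rows = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new FileReader(path))) {
            String line;
            while ((line = in.readLine()) != null) {
                rows.add(line.split(separator, -1)); // -1 keeps trailing empty fields
            }
        }
        return rows;
    }
}
```

In the question's code the equivalent change would probably be to pass `new BufferedReader(new FileReader(...))` as the first argument of the `CSVReader` constructor, leaving the separator and quote arguments as they are.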

【Comments】:

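We are using Ubuntu 12.04 and Neo4j 2.0.1 Community Edition. We will make the changes accordingly.

What are the mmio settings for the Neo4j server? The relevant settings are described here: docs.neo4j.org/chunked/stable/configuration-io-examples.html Unless CSVReader uses a buffered stream wrapper internally, you should consider adding one as well. Done correctly, data of the size you describe should not take more than a few seconds.

Hi Michael and Rickard, I followed your advice, and the program now inserts well (in under 10 minutes). Next I need to fetch subgraphs from the graph database. Which indexes should I choose? (And I cannot compromise on performance!)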
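One further point, not spelled out in the thread as captured here: the question's `createNode` and `createRelationshipsUsingIndexes` each open and commit a transaction per row, and grouping many rows into one commit is a common remedy for slow bulk loads. The sketch below shows only the batching pattern. The `Store` interface is a hypothetical stand-in, not the Neo4j API, and the batch size is an assumption to be tuned.

```java
// Hypothetical illustration of committing in batches; not the Neo4j API.
class BatchedCommits {
    // Stand-in for a transactional store (an assumption for this sketch).
    interface Store {
        void begin();              // open a transaction
        void write(String[] row);  // write one row inside the open transaction
        void commit();             // commit and close the open transaction
    }

    // Writes every row, committing once per `batchSize` rows instead of once
    // per row. Returns the number of commits performed.
    static int importRows(Iterable<String[]> rows, Store store, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int commits = 0;
        int inBatch = 0;
        for (String[] row : rows) {
            if (inBatch == 0) {
                store.begin();
            }
            store.write(row);
            inBatch++;
            if (inBatch == batchSize) {
                store.commit();
                commits++;
                inBatch = 0;
            }
        }
        if (inBatch > 0) { // commit the final, partially filled batch
            store.commit();
            commits++;
        }
        return commits;
    }
}
```

Mapped onto the question's code, `begin` would correspond to `graphDb.beginTx()` and `commit` to `tx.success()` followed by `tx.finish()`. Dropping the per-row `System.out.println` calls from the loop would likely help as well.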
