Neo4j: Creating relationships from a CSV file with py2neo is very slow

Posted 2024-05-18 06:11:58


I am trying to use py2neo to load a CSV file with 22 columns (25 MB, 150,000 rows) into a Neo4j graph for a flights model.

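A single Cypher query creates both the nodes (airports, cities, flights, and planes) and the relationships between them. However, running the code takes a very long time, even with periodic commit.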

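I am not sure whether the Cypher query I wrote is optimized, which may be why it is slow. For 10,000 rows it took about 10 minutes to build the graph. Can anyone help? The code is below: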

from py2neo import Graph


def importFromCSVtoNeo(graph):
    # One query builds every node and relationship for each CSV row.
    query = '''
    USING PERIODIC COMMIT 1000
    LOAD CSV WITH HEADERS FROM "file:///flights.csv" AS row FIELDTERMINATOR '\t'
    WITH row

    MERGE (c_departure:City {cityName: row.cityName_departure})
    MERGE (a_departure:Airport {airportName: row.airportName_departure})
    MERGE (f_segment1:Flight {airline: row.airline1})
    ON CREATE SET f_segment1.class = row.class1,
                  f_segment1.outboundclassgroup = row.outboundclassgroup1

    MERGE (a_departure)-[:IN]->(c_departure)
    MERGE (c_departure)-[:HAS]->(a_departure)
    MERGE (f_segment1)-[:FROM {departAt: row.outbounddeparttime}]->(a_departure)

    MERGE (c_transfer:City {cityName: row.transferCityName})
    MERGE (a_transfer:Airport {airportName: row.airportName_transfer})
    MERGE (f_segment1)-[:TO_TRANSFER {transferArriveAt: row.transferArriveAt}]->(a_transfer)
    MERGE (a_transfer)-[:IN]->(c_transfer)
    MERGE (c_transfer)-[:HAS]->(a_transfer)

    MERGE (c_arrival:City {cityName: row.cityName_arrival})
    MERGE (a_arrival:Airport {airportName: row.airportName_arrival})
    MERGE (f_segment2:Flight {airline: row.airline2})
    ON CREATE SET f_segment2.class = row.class2,
                  f_segment2.outboundclassgroup = row.outboundclassgroup2
    MERGE (f_segment2)-[:TO {arrivalAt: row.outboundarrivaltime}]->(a_arrival)
    MERGE (f_segment2)-[:FROM_TRANSFER {transferDepartAt: row.transferDepartAt}]->(a_transfer)
    MERGE (a_arrival)-[:IN]->(c_arrival)
    MERGE (c_arrival)-[:HAS]->(a_arrival)

    MERGE (p:Plane {saleprice: row.saleprice})
    ON CREATE SET p.depart = row.cityName_departure,
                  p.destination = row.cityName_arrival,
                  p.salechannel = row.salechannel,
                  p.planeDuration = row.planeDuration
    MERGE (p)-[:HAS_FLIGHTS]->(f_segment1)
    MERGE (f_segment1)-[:WAIT_FOR {waitingTime: row.waitingTime}]->(f_segment2)
    '''

    graph.run(query)


if __name__ == '__main__':
    graph = Graph()
    importFromCSVtoNeo(graph)

I also tried running it in batch mode, but performance did not improve. Any comments or suggestions would be appreciated. Thanks!


1 answer
User
#1 · Posted 2024-05-18 06:11:58

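Before running the script, I would create indexes on the node properties so that Neo4j can use them for fast lookups during MERGE (it has to match nodes row by row). For example, for the first node property I would use: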

CREATE INDEX ON :City(cityName)

and so on. You can create them directly from py2neo, each as a single run statement.
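As a rough sketch of that suggestion, the snippet below builds one index statement per label and property used in the question's MERGE clauses and sends each one through a separate run call. The helper names `MERGE_KEYS`, `index_statements`, and `create_indexes` are made up for illustration, and the `CREATE INDEX ON :Label(property)` form is the older syntax used in this answer; newer Neo4j versions may need a different form.

```python
# Label/property pairs copied from the MERGE clauses in the question's query.
# Property keys are case-sensitive, so they must match the query exactly.
MERGE_KEYS = [
    ("City", "cityName"),
    ("Airport", "airportName"),
    ("Flight", "airline"),
    ("Plane", "saleprice"),
]


def index_statements(keys=MERGE_KEYS):
    """Build one CREATE INDEX statement per (label, property) pair."""
    return ["CREATE INDEX ON :{}({})".format(label, prop) for label, prop in keys]


def create_indexes(graph, keys=MERGE_KEYS):
    """Send each statement in its own run call.

    `graph` is assumed to be a py2neo Graph, but any object exposing
    run(statement) will work here.
    """
    for statement in index_statements(keys):
        graph.run(statement)
```

With py2neo this would presumably be called as `create_indexes(Graph())` once, before `importFromCSVtoNeo(graph)`, so the indexes already exist when the MERGE lookups start.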
