Why is this SQL statement so slow?

if upload_to_db == True:
    print(f'########################################WRITING TO TEMP TABLE: {symbol} #######################################################################')
    master_df.to_sql(name='tempTable', con=engine, if_exists='replace')
    with engine.begin() as cn:
        sql = """INSERT INTO instrumentsHistory (datetime, instrumentSymbol, observation, observationColName)
                 SELECT t.datetime, t.instrumentSymbol, t.observation, t.observationColName
                 FROM tempTable t
                 WHERE NOT EXISTS
                     (SELECT 1 FROM instrumentsHistory f
                      WHERE t.datetime = f.datetime
                      AND t.instrumentSymbol = f.instrumentSymbol
                      AND t.observation = f.observation
                      AND t.observationColName = f.observationColName)"""
        print(f'##############################################WRITING TO FINAL TABLE: {symbol} #################################################################')
        cn.execute(sql)

2 Answers

User

#1 · Edited 2024-09-28 19:04:32

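Depending on your SQL database, you could try something like INSERT IGNORE INTO (MySQL) or MERGE (e.g. on Oracle), which performs the insert only when it would not violate a primary key or unique constraint. This assumes that such a constraint exists on the 4 columns you are checking.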
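The constraint-based approach can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: SQLite's INSERT OR IGNORE stands in for MySQL's INSERT IGNORE, the column types are guesses, and the sample rows are invented.

```python
import sqlite3

# In-memory SQLite database; an assumption for illustration only.
conn = sqlite3.connect(":memory:")

# The UNIQUE constraint on the four columns is what lets the database
# reject duplicates by itself. Column types are assumed.
conn.execute("""
    CREATE TABLE instrumentsHistory (
        datetime TEXT,
        instrumentSymbol TEXT,
        observation REAL,
        observationColName TEXT,
        UNIQUE (datetime, instrumentSymbol, observation, observationColName)
    )
""")

# Invented sample rows; the second one is an exact duplicate of the first.
rows = [
    ("2024-01-02", "AAPL", 185.6, "close"),
    ("2024-01-02", "AAPL", 185.6, "close"),
    ("2024-01-03", "AAPL", 184.3, "close"),
]

# INSERT OR IGNORE (SQLite) silently skips rows that would violate the
# unique constraint, so no NOT EXISTS subquery is needed.
with conn:
    conn.executemany(
        "INSERT OR IGNORE INTO instrumentsHistory VALUES (?, ?, ?, ?)", rows
    )

row_count = conn.execute("SELECT COUNT(*) FROM instrumentsHistory").fetchone()[0]
print(row_count)
```

With this design the duplicate check is delegated to the unique index that backs the constraint, so the insert statement itself stays a plain INSERT.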

If MERGE is not available, you can try adding the following index to the instrumentsHistory table:

CREATE INDEX idx ON instrumentsHistory (datetime, instrumentSymbol, observation,
                                        observationColName);

This index allows each incoming record from tempTable to be looked up quickly, so it may speed up the insert.
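As a rough end-to-end sketch, the index and the question's INSERT ... WHERE NOT EXISTS statement can be tried together in an in-memory SQLite database. The column types and the staging rows are assumptions, and the helper name sync_history is hypothetical; the real code loads tempTable with master_df.to_sql.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumed column types; the original schema is not shown in the question.
columns = "datetime TEXT, instrumentSymbol TEXT, observation REAL, observationColName TEXT"
conn.execute(f"CREATE TABLE instrumentsHistory ({columns})")
conn.execute(f"CREATE TABLE tempTable ({columns})")

# The composite index suggested in the answer.
conn.execute(
    "CREATE INDEX idx ON instrumentsHistory "
    "(datetime, instrumentSymbol, observation, observationColName)"
)

# Invented staging rows, standing in for master_df.to_sql(...).
conn.executemany(
    "INSERT INTO tempTable VALUES (?, ?, ?, ?)",
    [("2024-01-02", "AAPL", 185.6, "close"), ("2024-01-03", "AAPL", 184.3, "close")],
)

# Same statement as in the question.
INSERT_NEW_ROWS = """
    INSERT INTO instrumentsHistory (datetime, instrumentSymbol, observation, observationColName)
    SELECT t.datetime, t.instrumentSymbol, t.observation, t.observationColName
    FROM tempTable t
    WHERE NOT EXISTS
        (SELECT 1 FROM instrumentsHistory f
         WHERE t.datetime = f.datetime
         AND t.instrumentSymbol = f.instrumentSymbol
         AND t.observation = f.observation
         AND t.observationColName = f.observationColName)
"""


def sync_history():
    """Copy rows not yet in instrumentsHistory and return its row count."""
    with conn:  # commits on success, rolls back on error
        conn.execute(INSERT_NEW_ROWS)
    return conn.execute("SELECT COUNT(*) FROM instrumentsHistory").fetchone()[0]


count_after_first = sync_history()
count_after_second = sync_history()  # second run finds nothing new to insert
print(count_after_first, count_after_second)
```

Running the sync twice leaves the row count unchanged, which confirms that the index speeds up the existence check without altering what the statement inserts.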

User

#2 · Edited 2024-09-28 19:04:32

This subquery

WHERE NOT EXISTS 
        (SELECT 1 FROM instrumentsHistory f
         WHERE t.datetime = f.datetime
         AND t.instrumentSymbol = f.instrumentSymbol
         AND t.observation = f.observation
         AND t.observationColName = f.observationColName)

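has to check the rows of the table, comparing all four columns, until it finds a match, and it does so for every row of tempTable. In the worst case there is no match and a full table scan has to be completed. Query performance will therefore degrade as the table grows.

As Tim mentioned in his answer, the solution is to create an index on the four columns so that the database can quickly determine whether a match exists.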
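One way to check whether the database actually uses such an index is to inspect the query plan before and after creating it. The sketch below uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module; other databases expose a similar EXPLAIN command with different output. The column types are assumptions, and the exact wording of the plan varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumed column types; the original schema is not shown in the question.
columns = "datetime TEXT, instrumentSymbol TEXT, observation REAL, observationColName TEXT"
conn.execute(f"CREATE TABLE instrumentsHistory ({columns})")
conn.execute(f"CREATE TABLE tempTable ({columns})")

# The SELECT part of the question's statement, including the subquery.
QUERY = """
    SELECT t.datetime, t.instrumentSymbol, t.observation, t.observationColName
    FROM tempTable t
    WHERE NOT EXISTS
        (SELECT 1 FROM instrumentsHistory f
         WHERE t.datetime = f.datetime
         AND t.instrumentSymbol = f.instrumentSymbol
         AND t.observation = f.observation
         AND t.observationColName = f.observationColName)
"""


def plan_details():
    """Return the human-readable detail column of each query plan row."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY).fetchall()
    return [row[-1] for row in rows]  # detail is the last column


plan_without_index = plan_details()

conn.execute(
    "CREATE INDEX idx ON instrumentsHistory "
    "(datetime, instrumentSymbol, observation, observationColName)"
)
plan_with_index = plan_details()

for label, plan in (("without index", plan_without_index),
                    ("with index", plan_with_index)):
    print(label)
    for detail in plan:
        print("   ", detail)
```

After the index is created, the plan line for instrumentsHistory should mention idx, meaning the subquery is answered by an index lookup rather than by reading the table row by row.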
