I need to split a huge XML file and insert the pieces into a database. Is this the most efficient way to do it?
Here is my code:
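```python
import sqlite3
import xml.etree.ElementTree as etree  # cElementTree is deprecated and was removed in Python 3.9

NS = "{http://www.book.org/Book-19200/biblography}"
filename = r'D:\test\Books.xml'
# Register once so tostring() writes the default namespace instead of an "ns0:" prefix
etree.register_namespace("", "http://www.book.org/Book-19200/biblography")

# Assumption: an SQLite database (the "?" placeholder suggests sqlite3); the question never defines "c"
conn = sqlite3.connect("books.db")  # hypothetical database file
c = conn.cursor()

def populate_db(books):
    # Must be defined before the loop calls it; executemany expects one tuple per row
    c.executemany('INSERT INTO Books VALUES (?)', [(b,) for b in books])
    conn.commit()

context = iter(etree.iterparse(filename, events=('start', 'end')))
_, root = next(context)  # first event is the start of the root <Books> element
books = []
for event, elem in context:
    # Use 'end': at 'start' the children of <Book> have not been parsed yet
    if event == 'end' and elem.tag == NS + 'Book':
        books.append(etree.tostring(elem))  # the original appended to an undefined "xmls"
        if len(books) == 100:
            populate_db(books)
            books = []
        root.clear()  # free the elements that have already been processed
if books:  # insert the last, partial batch
    populate_db(books)
```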
My sample Books.xml looks like this:
```xml
<Books>
  <Book xmlns="http://www.book.org/Book-19200/biblography"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        ISBN="519292296"
        xsi:schemaLocation="http://www.book.org/Book-19200/biblography ../../book.xsd
        http://www.w3.org/2000/12/xmldsig# ../../xmldsig-core-schema.xsd">
    <Detail ID="67">
      <BookName>Code Complete 2</BookName>
      <Author>Steve McConnell</Author>
      <Pages>960</Pages>
      <ISBN>0735619670</ISBN>
      <BookName>Application Architecture Guide 2</BookName>
      <Author>Microsoft Team</Author>
      <Pages>496</Pages>
      <ISBN>073562710X</ISBN>
    </Detail>
  </Book>
  <Book xmlns="http://www.book.org/Book-19200/biblography"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        ISBN="519292296"
        xsi:schemaLocation="http://www.book.org/Book-19200/biblography ../../book.xsd
        http://www.w3.org/2000/12/xmldsig# ../../xmldsig-core-schema.xsd">
    <Detail ID="87">
      <BookName>Rocking Python</BookName>
      <Author>Guido Rossum</Author>
      <Pages>960</Pages>
      <ISBN>0735619690</ISBN>
      <BookName>Python Rocks</BookName>
      <Author>Microsoft Team</Author>
      <Pages>496</Pages>
      <ISBN>073562710X</ISBN>
    </Detail>
  </Book>
</Books>
```
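Is this approach of appending to a list and inserting into the database in batches of 100 efficient, or do I need to consider multithreading here?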
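The append-then-insert-every-100 approach described above can be kept single-threaded and made easier to test by splitting it into a generator that yields serialized `<Book>` elements and a small batching helper. This is only a sketch, not the asker's code: `iter_books`, `batched` and `load` are hypothetical names, and it assumes sqlite3 (suggested by the `?` placeholder) and a one-column `Books` table.

```python
import io
import itertools
import sqlite3
import xml.etree.ElementTree as etree

NS = "{http://www.book.org/Book-19200/biblography}"

def iter_books(source):
    """Yield each <Book> element serialized as bytes, freeing memory as we go."""
    context = etree.iterparse(source, events=("start", "end"))
    _, root = next(context)  # first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == NS + "Book":
            yield etree.tostring(elem)
            root.clear()  # drop the processed <Book> subtree

def batched(iterable, size):
    """Yield lists of up to `size` items (itertools.batched exists only in 3.12+)."""
    it = iter(iterable)
    while batch := list(itertools.islice(it, size)):
        yield batch

def load(source, conn, batch_size=100):
    """Insert every <Book> into a one-column Books table (assumed schema)."""
    with conn:  # one transaction: commits on success, rolls back on error
        for batch in batched(iter_books(source), batch_size):
            conn.executemany("INSERT INTO Books VALUES (?)", [(b,) for b in batch])

# Usage with a tiny in-memory document and database (test data only)
sample = b"""<Books>
<Book xmlns="http://www.book.org/Book-19200/biblography"><Detail ID="67"/></Book>
<Book xmlns="http://www.book.org/Book-19200/biblography"><Detail ID="87"/></Book>
</Books>"""
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Books (xml BLOB)")
load(io.BytesIO(sample), conn, batch_size=1)
count = conn.execute("SELECT COUNT(*) FROM Books").fetchone()[0]
print(count)  # 2
```

Wrapping all batches in one transaction (`with conn:`) avoids a commit per batch, which in SQLite is usually a bigger cost than the batch size itself.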
That depends on how "huge" your XML file is: megabytes or gigabytes?
Most likely, parsing the XML file takes far more time than inserting into the DB, so don't over-engineer it. Just run your code and measure.
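To check whether parsing really dominates on your data, time the parse and the insert separately. A rough sketch under stated assumptions: `make_sample`, `parse_only` and `insert_only` are hypothetical helpers, the XML is synthetic, and the in-memory SQLite database will understate the cost of writing to disk, so substitute your real file and database for meaningful numbers.

```python
import io
import sqlite3
import time
import xml.etree.ElementTree as etree

NS_URI = "http://www.book.org/Book-19200/biblography"

def make_sample(n):
    """Build a synthetic <Books> document with n <Book> entries (test data only)."""
    books = "".join(
        f'<Book xmlns="{NS_URI}"><Detail ID="{i}"><BookName>Title {i}</BookName>'
        f"<Author>Someone</Author><Pages>{i % 1000}</Pages></Detail></Book>"
        for i in range(n)
    )
    return f"<Books>{books}</Books>".encode()

def parse_only(data):
    """Return the serialized <Book> elements without touching any database."""
    out = []
    context = etree.iterparse(io.BytesIO(data), events=("start", "end"))
    _, root = next(context)
    for event, elem in context:
        if event == "end" and elem.tag == "{%s}Book" % NS_URI:
            out.append(etree.tostring(elem))
            root.clear()
    return out

def insert_only(rows, batch_size=100):
    """Insert pre-serialized rows into an in-memory SQLite table; return the row count."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Books (xml BLOB)")
    with conn:
        for i in range(0, len(rows), batch_size):
            conn.executemany("INSERT INTO Books VALUES (?)",
                             [(r,) for r in rows[i:i + batch_size]])
    count = conn.execute("SELECT COUNT(*) FROM Books").fetchone()[0]
    conn.close()
    return count

data = make_sample(20_000)
t0 = time.perf_counter()
rows = parse_only(data)
t1 = time.perf_counter()
inserted = insert_only(rows)
t2 = time.perf_counter()
print(f"parse: {t1 - t0:.3f}s  insert: {t2 - t1:.3f}s  rows: {inserted}")
```

If the insert is only a small fraction of the total time, moving it to another thread can save at most that fraction.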
Remember: premature optimization is the root of all evil.