Split a large XML file into n groups

import xml.etree.ElementTree as ET

context = ET.iterparse('file.xml', events=('end', ))
index = 0
for event, elem in context:
    if elem.tag == 'row':
        index += 1
        filename = format(str(index) + ".xml")
        # The file is opened in binary mode ('wb'), so write bytes.
        with open(filename, 'wb') as f:
            f.write(b"<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n")
            f.write(ET.tostring(elem))

from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)
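The recipe works because `[iter(iterable)] * n` is a list holding the *same* iterator n times, so every tuple that `zip_longest` builds takes n consecutive items from it. A quick check of that behaviour, using the recipe exactly as written and feeding it a generator to show that nothing has to be loaded up front:

```python
from itertools import zip_longest


def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)


# The list holds one iterator object, repeated n times.
it = iter('ABCDEFG')
args = [it] * 3
print(all(a is it for a in args))  # True

# Works on any iterable, including a lazy generator.
letters = (c for c in 'ABCDEFG')
print(list(grouper(letters, 3, 'x')))
# [('A', 'B', 'C'), ('D', 'E', 'F'), ('G', 'x', 'x')]
```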

1 answer

User

#1 · Posted on 2024-09-17 02:03:43

You have an iterable of (event, element) pairs:

context = ET.iterparse('file.xml', events=('end', ))

Now you want to filter that down to just the row elements:

rows = (elem for event, elem in context if elem.tag == 'row')
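To see the filter in isolation, here is a sketch that parses a small in-memory document instead of file.xml. The `<rows>`/`<row>` layout and the `id` attribute are made up for the example; `iterparse` accepts any binary file-like object, so `io.BytesIO` stands in for the file.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample input; the real code reads 'file.xml'.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<rows>
  <row id="1"><name>a</name></row>
  <row id="2"><name>b</name></row>
  <row id="3"><name>c</name></row>
</rows>"""

context = ET.iterparse(io.BytesIO(SAMPLE), events=('end',))

# Keep only <row> elements; the end events for <name> and <rows> are dropped.
rows = (elem for event, elem in context if elem.tag == 'row')

ids = [row.get('id') for row in rows]
print(ids)  # ['1', '2', '3']
```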

Now you want to group them. Use the grouper recipe from the itertools docs:

groups = grouper(rows, 2)
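Putting the filter and the grouping together on a sample with an odd number of rows shows what `groups` actually yields, including the `None` padding in the last group. The sample data is hypothetical, and `grouper` is the itertools recipe reproduced so the snippet runs on its own:

```python
import io
import xml.etree.ElementTree as ET
from itertools import zip_longest


def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)


# Hypothetical input with three <row> elements.
SAMPLE = b'<rows><row id="1"/><row id="2"/><row id="3"/></rows>'

context = ET.iterparse(io.BytesIO(SAMPLE), events=('end',))
rows = (elem for event, elem in context if elem.tag == 'row')
groups = grouper(rows, 2)

# Show each group as a tuple of ids; None marks a padding slot.
shape = [tuple(None if e is None else e.get('id') for e in group)
         for group in groups]
print(shape)  # [('1', '2'), ('3', None)]
```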

Obviously, once you have it working and want to run it for real, you can change the 2 to whatever group size you actually need.

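Now you can iterate over the groups. While we're at it, let's use enumerate so you don't need that manual index += 1 bookkeeping. Also, instead of building a string by hand and then pointlessly calling format on it, let's use an f-string.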

for index, group in enumerate(groups):
    # If you need to run on 3.5 or 2.7, use "{}.xml".format(index)
    filename = f"{index}.xml"
    with open(filename, 'wb') as f:
        # The file is opened in binary mode ('wb'), so write bytes.
        f.write(b"<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n")

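…and then iterate over the elements in the group. Be careful, though: if the number of rows isn't a multiple of the group size (with a size of 2, if it's odd), grouper pads the incomplete last group with None values.¹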

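        for elem in group:
            # Compare against None explicitly: an Element with no children
            # is falsy, so "if elem:" would also skip empty rows.
            if elem is not None:
                f.write(ET.tostring(elem))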
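For reference, here is the whole thing assembled into one function. It is a sketch rather than the answer's exact code: the function name `split_xml`, the `out_dir` parameter and the `<rows>` wrapper element are assumptions. The wrapper is there because writing two or more `<row>` elements straight after the XML declaration produces a file with several top-level elements, which is not well-formed XML. The loop skips grouper's `None` padding with an explicit `is not None` test, since an `Element` with no children is falsy and a bare truthiness test would drop rows such as `<row id="1"/>`.

```python
import os
import tempfile
import xml.etree.ElementTree as ET
from itertools import zip_longest


def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks (itertools recipe)"
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)


def split_xml(path, n, out_dir='.'):
    """Write each group of n <row> elements from path to out_dir/0.xml, 1.xml, ...

    Returns the list of files written. The name split_xml, the out_dir
    parameter and the <rows> wrapper are illustrative assumptions.
    """
    context = ET.iterparse(path, events=('end',))
    rows = (elem for event, elem in context if elem.tag == 'row')
    written = []
    for index, group in enumerate(grouper(rows, n)):
        filename = os.path.join(out_dir, f"{index}.xml")
        with open(filename, 'wb') as f:
            f.write(b'<?xml version="1.0" encoding="UTF-8"?>\n')
            f.write(b'<rows>')  # one root element keeps each file well-formed
            for elem in group:
                # Skip the None padding grouper adds to the last group.
                if elem is not None:
                    f.write(ET.tostring(elem))
            f.write(b'</rows>\n')
        written.append(filename)
    return written


if __name__ == '__main__':
    # Demo on a throwaway input file with five childless <row> elements.
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, 'file.xml')
        rows_xml = b''.join(b'<row id="%d"/>' % i for i in range(5))
        with open(src, 'wb') as f:
            f.write(b'<rows>' + rows_xml + b'</rows>')
        for name in split_xml(src, 2, tmp):
            # Print each output file and how many rows it holds.
            print(os.path.basename(name), len(ET.parse(name).getroot()))
```

With five rows and n=2 the demo writes 0.xml and 1.xml with two rows each and 2.xml with the single leftover row.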

¹ This isn't hard to change, but I used the recipe straight from the docs so I wouldn't have to explain how to change it.
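If you would rather not have the padding at all, one way to change the recipe is to slice fixed-size tuples off a single iterator with `itertools.islice` until it runs dry. The name `chunked` is hypothetical; Python 3.12 and later also ship `itertools.batched`, which behaves the same way. With a non-padding version, the None check inside the loop is no longer needed.

```python
from itertools import islice


def chunked(iterable, n):
    """Yield tuples of up to n items; the last one may be shorter (no padding)."""
    it = iter(iterable)
    while True:
        chunk = tuple(islice(it, n))
        if not chunk:  # iterator exhausted
            return
        yield chunk


print(list(chunked('ABCDEFG', 3)))  # [('A', 'B', 'C'), ('D', 'E', 'F'), ('G',)]
```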
