Parsing docx files in Python

import os
from docx import Document
import re

directory = input("Copy and paste the location of the files.\n").lower()

for file in os.listdir(directory):
    document = Document(directory+file)
    head1s = []
    for paragraph in document.paragraphs:
        heading = re.match(r'^[A-Z]+[.]\s', paragraph.text)
        for run in paragraph.runs:
            if run.bold:
                if heading:
                    head1 = paragraph.text
                    head1 = head1.split('.')[1]
                    head1s.append(head1)
    print(head1s)

1 Answer

Anonymous user

#1 · Posted on 2024-10-01 13:29:17

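What is happening is that the loop keeps going past D.Fox, so on the next iteration, even though there is no new match, it still prints the last value of head1, which is D.Fox.

I think `for run in paragraph.runs:` is somehow executing twice for that paragraph. Perhaps the paragraph contains a second run that is not visible in the document?

Perhaps adding a break as soon as the first match is found is enough to stop the second run from triggering?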

for file in os.listdir(directory):
    # os.path.join supplies the path separator that directory+file may be missing
    document = Document(os.path.join(directory, file))
    head1s = []
    for paragraph in document.paragraphs:
        heading = re.match(r'^[A-Z]+[.]\s', paragraph.text)
        for run in paragraph.runs:
            if run.bold:
                if heading:
                    head1 = paragraph.text
                    head1 = head1.split('.')[1]
                    head1s.append(head1)
                    # this break stops the run loop if a match was found.
                    break
    print(head1s)
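
If the duplicate really does come from a paragraph holding more than one bold run, the same idea can be written without a break by asking whether any run is bold. The following is only a minimal sketch under that assumption. `collect_headings` is a hypothetical helper name, and the `SimpleNamespace` objects are stand-ins for python-docx paragraphs and runs so the example runs without a .docx file. Unlike the code above, it also strips the space left after the dot.

```python
import re
from types import SimpleNamespace

# Same heading pattern as in the question: capital letters, a dot, whitespace.
HEADING_RE = re.compile(r'^[A-Z]+[.]\s')


def collect_headings(paragraphs):
    """Return each bold heading once, however many runs its paragraph has.

    `paragraphs` is any iterable of objects exposing `.text` and `.runs`
    (each run exposing `.bold`), which is the shape of document.paragraphs.
    """
    headings = []
    for paragraph in paragraphs:
        if not HEADING_RE.match(paragraph.text):
            continue
        # any() stops at the first bold run, so the heading is appended once.
        if any(run.bold for run in paragraph.runs):
            # Keep everything after the first dot and drop surrounding spaces.
            headings.append(paragraph.text.split('.', 1)[1].strip())
    return headings


# Hypothetical stand-in data: the first paragraph has two bold runs.
demo = [
    SimpleNamespace(text="D. Fox", runs=[SimpleNamespace(bold=True),
                                         SimpleNamespace(bold=True)]),
    SimpleNamespace(text="Some body text.", runs=[SimpleNamespace(bold=None)]),
    SimpleNamespace(text="E. Not bold", runs=[SimpleNamespace(bold=None)]),
]
print(collect_headings(demo))  # ['Fox']
```

With a real file, `collect_headings(document.paragraphs)` should behave the same way, since it only reads `.text`, `.runs`, and `.bold`.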
