Extracting paragraph text in Python

1. Executive Summary 1.1 Summary of Services Energy Savings (Carbon Emissions and Intensity Reduction) Upgrade Economy Cycle on Level 2,5,6,7 & 8, replace Chilled Water Valves on Level 6 & 8 and install lighting controls on L5 & 6.. 1.2 Summary of Broadspectrum Offer A summary of our Offer to deliver the Scope of Work as outlined in the tender documents is provided below. Please refer to the various terms and conditions of our Offer as detailed herein. Please also find the cost breakdown

1 answer

User

#1 · Posted on 2024-10-01 09:28:40

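This is a preliminary solution (pending a reply to my comment on your post above). It does not yet exclude any additional paragraphs that come after the Summary of Broadspectrum Offer section. If you need that, you will most likely want a small regex match to check whether you have reached another heading paragraph starting with 1.3 (and so on), and stop iterating when you do. Let me know if that is needed.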

Edit: converted the print() call from a list comprehension to a standard for loop, in response to Anton vBR's comment below.

from docx import Document

document = Document("North Sydney TE SP30062590-1 HVAC - Project Offer -  Rev1.docx")

# Find the index of every paragraph containing the `Summary of Broadspectrum Offer` text
ind = [i for i, para in enumerate(document.paragraphs) if 'Summary of Broadspectrum Offer' in para.text]
# Print the text of every paragraph that comes after the first match
if ind:
    for i, para in enumerate(document.paragraphs):
        if i > ind[0]:
            print(para.text)

[print(para.text) for i, para in enumerate(document.paragraphs) if ind and i > ind[0]]
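The regex idea mentioned above could look roughly like the sketch below. It is only an illustration under a few assumptions: the helper name `text_after_heading` and the pattern `HEADING_RE` are made up here, and the pattern assumes that section headings start with a number such as 1.2 or 1.3. A body paragraph that happens to begin with something like "1.5 million" would also be treated as a heading, so adjust the pattern to your document.

```python
import re

# Assumption: a heading paragraph starts with a section number like "1.3".
HEADING_RE = re.compile(r"^\s*\d+\.\d+\b")


def text_after_heading(paragraph_texts, start_marker):
    """Return the texts that follow the paragraph containing start_marker,
    stopping before the next paragraph that looks like a numbered heading."""
    collected = []
    found = False
    for text in paragraph_texts:
        if not found:
            # Skip everything up to and including the marker paragraph.
            if start_marker in text:
                found = True
            continue
        if HEADING_RE.match(text):
            # Reached the next numbered section, so stop collecting.
            break
        collected.append(text)
    return collected


# With python-docx this would be called on the paragraph texts, e.g.:
#   texts = [para.text for para in document.paragraphs]
#   for line in text_after_heading(texts, "Summary of Broadspectrum Offer"):
#       print(line)
```

The function works on plain strings, so it can be tried out on a hand-written list before pointing it at the real document.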


Also, here is another post that may help with an alternative approach, which uses paragraph metadata to detect heading types: Extracting headings' text from word doc
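A minimal sketch of that metadata-based approach, assuming the document uses Word's built-in heading styles so that `para.style.name` starts with "Heading" (for example "Heading 2"). The helper names `is_heading` and `section_text` are hypothetical, and the stand-in paragraphs below only mimic the `text` and `style.name` attributes of python-docx paragraphs so the example runs without a .docx file.

```python
from types import SimpleNamespace


def is_heading(paragraph):
    """True if the paragraph's style name starts with 'Heading'."""
    style = getattr(paragraph, "style", None)
    name = getattr(style, "name", "") or ""
    return name.startswith("Heading")


def section_text(paragraphs, title):
    """Collect the text of the paragraphs between the heading that
    contains title and the next heading of any level."""
    collected = []
    inside = False
    for para in paragraphs:
        if is_heading(para):
            if inside:
                # The next heading ends the section we were collecting.
                break
            inside = title in para.text
            continue
        if inside:
            collected.append(para.text)
    return collected


def fake_para(text, style_name):
    # Stand-in for a python-docx paragraph (illustration only).
    return SimpleNamespace(text=text, style=SimpleNamespace(name=style_name))


demo = [
    fake_para("Summary of Services", "Heading 2"),
    fake_para("Energy Savings", "Normal"),
    fake_para("Summary of Broadspectrum Offer", "Heading 2"),
    fake_para("A summary of our Offer", "Normal"),
    fake_para("Pricing", "Heading 2"),
    fake_para("Cost breakdown", "Normal"),
]
demo_result = section_text(demo, "Summary of Broadspectrum Offer")
```

With a real file, `section_text(document.paragraphs, "Summary of Broadspectrum Offer")` would be the equivalent call. If the headings in your document are just manually formatted Normal paragraphs, this will not find them and the text-based match is the safer option.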
