Text extracted from a presentation file comes out in the wrong order when using python-pptx

Posted 2024-09-26 17:43:17

Male | Just a code monkey who loves writing Python.

I'm trying to extract text from PowerPoint text boxes with the following code:

from pptx import Presentation
from pptx.enum.shapes import MSO_SHAPE_TYPE

def iter_textframed_shapes(shapes):
    """Generate shape objects in *shapes* that can contain text.

    Shape objects are generated in document order (z-order), bottom to top.
    Group shapes are recursed into, including groups nested inside groups.
    """
    for shape in shapes:
        # ---recurse on group shapes (calling this same function, so that
        # text boxes inside nested groups are not skipped)---
        if shape.shape_type == MSO_SHAPE_TYPE.GROUP:
            yield from iter_textframed_shapes(shape.shapes)
            continue

        # ---otherwise, treat shape as a "leaf" shape---
        if shape.has_text_frame:
            yield shape

prs = Presentation(path_to_my_prs)

for slide in prs.slides:
    textable_shapes = list(iter_textframed_shapes(slide.shapes))
    # Sort top to bottom, then left to right. A shape with no explicit
    # position reports None for top/left; treat that as 0 so sorting does
    # not raise a TypeError when comparing None with an int.
    ordered_textable_shapes = sorted(
        textable_shapes, key=lambda shape: (shape.top or 0, shape.left or 0)
    )

    for shape in ordered_textable_shapes:
        print(shape.text)

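But sometimes a text box from the end of the presentation is extracted first, sometimes one from the middle, and so on. How can I fix the code so the text comes out in the correct reading order (left to right, top to bottom)?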
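One likely cause, assumed here rather than confirmed in the question: the sort key `(shape.top, shape.left)` is compared strictly lexicographically, so two boxes that look level on the slide but whose `top` values differ by even one EMU are ordered by `top` alone, and the right-hand box can come before the left-hand one. A common workaround is to bucket shapes into rows using a tolerance on `top`, then sort each row by `left`. The sketch below is pure Python so it can run without a .pptx file; `reading_order`, `Box`, and the tolerance value are hypothetical names for illustration, not part of python-pptx.

```python
from dataclasses import dataclass
from typing import Optional


def reading_order(shapes, row_tolerance):
    """Return shapes in reading order: rows top to bottom, left to right in a row.

    A shape joins the current row if its `top` is within `row_tolerance`
    (same units as `top`; EMU for python-pptx) of the row's first shape.
    A missing position (None) is treated as 0.
    """
    def top(s):
        return s.top or 0

    def left(s):
        return s.left or 0

    rows = []
    for shape in sorted(shapes, key=lambda s: (top(s), left(s))):
        # Compare against the row's first shape so rows cannot drift downward.
        if rows and top(shape) - top(rows[-1][0]) <= row_tolerance:
            rows[-1].append(shape)
        else:
            rows.append([shape])
    # Within each row, order strictly left to right.
    return [shape for row in rows for shape in sorted(row, key=left)]


@dataclass
class Box:
    """Hypothetical stand-in for a python-pptx shape (only text/top/left)."""
    text: str
    top: Optional[int]
    left: Optional[int]


boxes = [
    Box("right", top=100, left=5000),
    Box("left", top=101, left=0),  # 1 EMU lower, but visually on the same line
    Box("below", top=900, left=0),
]

print([b.text for b in sorted(boxes, key=lambda b: (b.top, b.left))])
# → ['right', 'left', 'below']
print([b.text for b in reading_order(boxes, row_tolerance=50)])
# → ['left', 'right', 'below']
```

With real slides, the `sorted(...)` call in the loop above could be replaced by something like `reading_order(textable_shapes, row_tolerance=Inches(0.1))` (with `from pptx.util import Inches`); the tolerance is a guess to tune per deck. Note also that shapes inside a group report `top`/`left` in the group's own child coordinate space, which may not match slide coordinates, so grouped text can still sort unexpectedly.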


0 answers (no answers yet)
