How to get a DataFrame of a PDF's index with PyPDF2

Published 2024-09-29 06:24:05

I want to get a PDF's index (its bookmarks) as a DataFrame using PyPDF2.

I found the following code in "Read all bookmarks from a PDF document and create a dictionary with PageNumber and Title of the bookmark":

import PyPDF2

def show_tree(bookmark_list, indent=0):
    for item in bookmark_list:
        if isinstance(item, list):
            # recursive call with increased indentation
            show_tree(item, indent + 4)
        else:
            print(" " * indent + item.title)

reader = PyPDF2.PdfFileReader("[your filename]")
show_tree(reader.getOutlines())

I modified the function as follows:

def show_bookmark(bookmark_list, indent=0):
    IndexDataFrame = pd.DataFrame(index=[], columns=['IndexLevel', 'Title'])
    
    for item in bookmark_list:
        if isinstance(item, list):
            # recursive call with increased indentation
            show_bookmark(item, indent + 1)
        else:
            record = pd.Series([indent, item.title], index=IndexDataFrame.columns)
            IndexDataFrame = IndexDataFrame.append(record, ignore_index=True)
            #print(indent, item.title)

    return IndexDataFrame

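However, IndexDataFrame does not contain all the data; it only contains the rows whose IndexLevel is 0.

I just want what the first function prints, but as a DataFrame.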
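One likely cause, judging from the code above: each recursive call to show_bookmark builds its own empty IndexDataFrame, and the value it returns is never used by the caller, so only the top-level rows survive. Below is a minimal sketch of one way around that, not a definitive fix. It collects (level, title) tuples in a plain list and builds the DataFrame once at the end. The names collect_bookmarks, bookmarks_to_frame and the Bookmark stand-in are hypothetical and exist only for this illustration; the stand-in assumes nothing about outline items except that they have a .title attribute, as in the code above.

```python
from collections import namedtuple


def collect_bookmarks(bookmark_list, level=0):
    """Flatten a nested outline into a list of (level, title) tuples."""
    rows = []
    for item in bookmark_list:
        if isinstance(item, list):
            # Keep what the recursive call returns instead of discarding it.
            rows.extend(collect_bookmarks(item, level + 1))
        else:
            rows.append((level, item.title))
    return rows


def bookmarks_to_frame(bookmark_list):
    """Build the DataFrame once, from all collected rows."""
    # Imported here so collect_bookmarks itself does not depend on pandas.
    import pandas as pd

    return pd.DataFrame(collect_bookmarks(bookmark_list),
                        columns=["IndexLevel", "Title"])


# Hypothetical stand-in for an outline item; only .title is needed.
Bookmark = namedtuple("Bookmark", "title")

sample_outline = [
    Bookmark("Chapter 1"),
    [Bookmark("1.1"), [Bookmark("1.1.1")], Bookmark("1.2")],
    Bookmark("Chapter 2"),
]
sample_rows = collect_bookmarks(sample_outline)

# With a real file, using the same calls as in the question:
# reader = PyPDF2.PdfFileReader("[your filename]")
# df = bookmarks_to_frame(reader.getOutlines())
```

Building the frame in one step also avoids DataFrame.append, which was removed in pandas 2.0. Note too that PyPDF2 3.0 and later no longer accept PdfFileReader and getOutlines(); there the equivalents are PdfReader and its outline property.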

0 answers

No answers yet
