Pandoc filter via Panflute not working as expected

Posted 2024-06-29 00:21:49


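Question

For a Markdown document, I want to filter out all sections whose header title is not in the list to_keep. A section consists of a header and the body up to the next section or the end of the document. For simplicity, let's assume the document only has level 1 headers.

When I make a simple case distinction on whether the current element is preceded by a header in to_keep, and then either return None or return [], I get an error. That is, for pandoc --filter filter.py -o output.pdf input.md, I get TypeError: panflute.dump needs input of type "panflute.Doc" but received one of type "list" (code, example file and the complete error message are at the end).

I use Python 3.7.4, panflute 1.12.5 and pandoc 2.2.3.2.

Question:

If I make a more fine-grained distinction about when to return [], it works (function action_working). My question is: why is the more fine-grained distinction necessary? My solution seems to work, but it may well be by accident. How can I get this to work properly?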

Files

Error

Traceback (most recent call last):
  File "filter.py", line 42, in <module>
    main()
  File "filter.py", line 39, in main
    return run_filter(action_not_working, doc=doc)
  File "C:\Users\ody_he\AppData\Local\Continuum\anaconda3\lib\site-packages\panflute\io.py", line 266, in run_filter
    return run_filters([action], *args, **kwargs)
  File "C:\Users\ody_he\AppData\Local\Continuum\anaconda3\lib\site-packages\panflute\io.py", line 253, in run_filters
    dump(doc, output_stream=output_stream)
  File "C:\Users\ody_he\AppData\Local\Continuum\anaconda3\lib\site-packages\panflute\io.py", line 132, in dump
    raise TypeError(msg)
TypeError: panflute.dump needs input of type "panflute.Doc" but received one of type "list"
Error running filter filter.py:
Filter returned error status 1

input.md

# English 
Some cool english text this is!

# Deutsch 
Dies ist die deutsche Übersetzung!

# Sources
Some source.

# Priority
**Medium** *[Low | Medium | High]*

# Status
**Open for Discussion** *\[Draft | Open for Discussion | Final\]*

# Interested Persons (mailing list)
- Franz, Heinz, Karl

filter.py

from panflute import *

to_keep = ['Deutsch', 'Status']
keep_current = False

def action_not_working(elem, doc):
    '''For every element we check if it occurs in a section we wish to keep. 
    If it is, we keep it and return None (indicating to keep the element unchanged).
    Otherwise we remove the element (return []).'''
    global to_keep, keep_current
    update_keep(elem)
    if keep_current:
        return None
    else:
        return []

def action_working(elem, doc):
    global to_keep, keep_current
    update_keep(elem)
    if keep_current:
        return None
    else:
        if isinstance(elem, Header):
            return []
        elif isinstance(elem, Para):
            return []
        elif isinstance(elem, BulletList):
            return []

def update_keep(elem):
    '''If the element is a header, we update keep_current.'''
    global to_keep, keep_current
    if isinstance(elem, Header):
        # Keep the section if its title is in to_keep
        keep_current = stringify(elem) in to_keep


def main(doc=None):
    return run_filter(action_not_working, doc=doc) 

if __name__ == '__main__':
    main()

1 answer

Answer 1 · posted 2024-06-29 00:21:49

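I think what happens is that panflute calls the action on all elements, including the Doc root element. If keep_current is False when the walk reaches the Doc element, the Doc itself is replaced by a list. That leads to the error message you are seeing, because panflute expects the root node to always be there.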

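The updated filter acts only on Header, Para and BulletList elements, so the Doc root node is left untouched. You will probably want to use something more generic instead, such as isinstance(elem, Block).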


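An alternative approach is to use panflute's load and dump functions directly: load the document into a Doc element, manually iterate over its top-level blocks (doc.content, that is, the blocks passed to the Doc constructor as args), remove all that are unwanted, and then dump the resulting document back into the output stream.

from panflute import Header, dump, load, stringify

to_keep = ['Deutsch', 'Status']
keep_current = False

# Read the pandoc JSON document from stdin.
doc = load()

kept_blocks = []
for top_level_block in doc.content:
    # A header starts a new section; decide whether to keep it.
    if isinstance(top_level_block, Header):
        keep_current = stringify(top_level_block) in to_keep
    if keep_current:
        kept_blocks.append(top_level_block)
doc.content = kept_blocks

# Write the filtered document back to stdout.
dump(doc)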
