Does tidylib corrupt my HTML files?

Posted 2024-05-10 09:53:00

Asked by: anonymous user. Profile: male | a programmer who enjoys writing Python code.

I'm using Python 3.5. In some cases, when I call tidylib.tidy_document on an HTML file, the "/" character at the end of the <link> tag in the header is removed.

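The HTML files I'm using are part of an EPUB generated with writer2epub. Almost every file in this EPUB shows the error; the only exceptions are the short ones (such as the title page). The error is the same in all affected files.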
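Since an EPUB is a ZIP archive, one way to check whether the affected files really are the longer ones is to list the XHTML content files with their sizes and then run the reproduction script on each. The minimal sketch below does the listing with the standard zipfile module; the function name `xhtml_members` is hypothetical, and treating names ending in `.xhtml`, `.html` or `.htm` as content files is an assumption.

```python
import zipfile


def xhtml_members(epub):
    """Return (name, uncompressed size) for each (X)HTML file in an EPUB.

    `epub` may be a path or a binary file object; an EPUB is a ZIP archive.
    Assumption: content files are recognised by their file-name suffix.
    Results are sorted by size, smallest first.
    """
    suffixes = ('.xhtml', '.html', '.htm')
    with zipfile.ZipFile(epub) as zf:
        members = [
            (info.filename, info.file_size)
            for info in zf.infolist()
            if info.filename.lower().endswith(suffixes)
        ]
    return sorted(members, key=lambda item: item[1])
```

For example, `xhtml_members('book.epub')` (where `book.epub` is a placeholder name) lists the shortest files first, so the unaffected title page should appear near the top if size is the deciding factor.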

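I suspected the use of carriage returns (0x0d) instead of line feeds (0x0a), but changing the line endings made no difference. I also noticed that the file contains various other non-ASCII characters, so they might be the cause. Searching for Unicode problems with tidylib turned up nothing related to this issue.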
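A minimal sketch for checking both suspicions in the paragraph above with only the standard library: it counts CRLF, bare CR and bare LF line endings in the raw bytes and tallies the distinct non-ASCII characters. The name `describe_text` is hypothetical, and decoding as UTF-8 is an assumption (EPUB XHTML files are normally UTF-8).

```python
from collections import Counter


def describe_text(data: bytes) -> dict:
    """Summarise line endings and non-ASCII characters in raw file bytes.

    Assumption: the bytes are UTF-8 encoded.
    """
    crlf = data.count(b'\r\n')
    cr_only = data.count(b'\r') - crlf   # bare CR (classic Mac style)
    lf_only = data.count(b'\n') - crlf   # bare LF (Unix style)
    text = data.decode('utf-8')
    # Count each character outside the 7-bit ASCII range.
    non_ascii = Counter(ch for ch in text if ord(ch) > 127)
    return {
        'crlf': crlf,
        'cr_only': cr_only,
        'lf_only': lf_only,
        'non_ascii': dict(non_ascii),
    }
```

For example, call `describe_text(open('test04.xhtml', 'rb').read())`. Reading in binary mode matters here: the question's script opens the file in text mode, and Python 3's universal-newlines handling then converts `\r` and `\r\n` to `\n` before tidylib ever sees the string, which is consistent with changing the line endings making no difference.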

I have uploaded a test file that reproduces the problem with the following code:

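```python
import re

from tidylib import tidy_document


def printLink(html):
    """Print the <link> tag(s) from the HTML header, one match per line."""
    for line in html.split('\n'):
        match = re.search('<link[^>]+>', line)
        if match is not None:
            print(match.group(0))


if __name__ == '__main__':
    fname = 'test04.xhtml'
    print(fname)
    # EPUB content files are normally UTF-8; state the encoding explicitly
    # instead of relying on the platform default.
    with open(fname, 'r', encoding='utf-8') as fh:
        html = fh.read()

    print('checkpoint 01')
    printLink(html)        # <link> tag as it appears in the input file
    newHtml, errors = tidy_document(html)
    print('checkpoint 02')
    printLink(newHtml)     # <link> tag after tidylib has processed it
```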

If the problem reproduces, the output at checkpoint 01 is:

<link rel="stylesheet" href="../styles/style001.css" type="text/css" />

and at checkpoint 02 it is:

<link rel="stylesheet" href="../styles/style001.css" type="text/css">
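To check this across every file in the EPUB rather than by eye, a minimal sketch can extract each `<link>` tag from both versions of a document and report which ones lost their trailing `/>`. The helper names `link_tags` and `lost_self_closing` are hypothetical; like the question's regex, this assumes no `>` appears inside attribute values, and it assumes tidying keeps the `<link>` tags in the same order.

```python
import re

# Same idea as the question's printLink(), but applied to the whole document
# instead of line by line, so tags spanning several lines are also found.
LINK_RE = re.compile(r'<link\b[^>]*>', re.IGNORECASE)


def link_tags(html: str) -> list:
    """Return every <link ...> tag in document order."""
    return LINK_RE.findall(html)


def lost_self_closing(before: str, after: str) -> list:
    """Return (before, after) pairs of <link> tags where '/>' was dropped.

    Tags are paired by position, so this assumes the order is unchanged.
    """
    pairs = zip(link_tags(before), link_tags(after))
    return [
        (old, new)
        for old, new in pairs
        if old.rstrip().endswith('/>') and not new.rstrip().endswith('/>')
    ]
```

With `html` and `newHtml` from the question's script, `lost_self_closing(html, newHtml)` returns an empty list when every slash survived. If the cause turns out to be Tidy writing HTML rather than XHTML output (not established in this thread), HTML Tidy's `output-xhtml` option, passed through pytidylib as `tidy_document(html, options={'output-xhtml': 1})`, would be a natural thing to try, and this check can confirm whether it helps.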

What causes tidylib to remove this "/" character?

0 answers
