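Convert HTML into equivalent Markdown-structured text.
Detailed description of the html2text Python project
html2text
html2text is a Python script that converts a page of HTML into clean, easy-to-read plain ASCII text. Better yet, that ASCII also happens to be valid Markdown (a text-to-HTML format).
Usage: html2text [filename [encoding]]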
Option | Description |
---|---|
`--version` | Show program's version number and exit |
`-h`, `--help` | Show this help message and exit |
`--ignore-links` | Don't include any formatting for links |
`--escape-all` | Escape all special characters. Output is less readable, but avoids corner case formatting issues. |
`--reference-links` | Use reference links instead of inline links to create Markdown |
`--mark-code` | Mark preformatted and code blocks with [code]...[/code] |
For a complete list of options, see the docs.
Or you can use it from Python:
>>> import html2text
>>>
>>> print(html2text.html2text("<p><strong>Zed's</strong> dead baby, <em>Zed's</em> dead.</p>"))
**Zed's** dead baby, _Zed's_ dead.
Or with some configuration options:
>>> import html2text
>>>
>>> h = html2text.HTML2Text()
>>> # Ignore converting links from HTML
>>> h.ignore_links = True
>>> print(h.handle("<p>Hello, <a href='https://www.google.com/earth/'>world</a>!"))
Hello, world!
>>> # Don't Ignore links anymore, I like links
>>> h.ignore_links = False
>>> print(h.handle("<p>Hello, <a href='https://www.google.com/earth/'>world</a>!"))
Hello, [world](https://www.google.com/earth/)!
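The command-line flags in the option table and the attributes set in the Python examples appear to describe the same settings. The sketch below connects the two. It is not part of html2text: `FLAG_TO_ATTRIBUTE`, `configure`, and `convert_file` are hypothetical helpers, and the attribute names `escape_snob`, `inline_links`, and `mark_code` are assumptions about the `HTML2Text` object that should be checked against the docs for the installed version. The `utf-8` default encoding is likewise an assumption.

```python
from pathlib import Path

# Assumed mapping from command-line flags to (attribute name, value) pairs
# on an HTML2Text-like object. Verify these names against the project docs.
FLAG_TO_ATTRIBUTE = {
    "--ignore-links": ("ignore_links", True),
    "--escape-all": ("escape_snob", True),
    "--reference-links": ("inline_links", False),
    "--mark-code": ("mark_code", True),
}


def configure(converter, flags):
    """Set the attribute associated with each flag on the converter object."""
    for flag in flags:
        name, value = FLAG_TO_ATTRIBUTE[flag]
        setattr(converter, name, value)
    return converter


def convert_file(converter, filename, encoding="utf-8"):
    """Decode an HTML file with the given encoding and pass it to handle()."""
    html = Path(filename).read_text(encoding=encoding)
    return converter.handle(html)
```

With html2text installed, `configure(html2text.HTML2Text(), ["--ignore-links"])` should give a converter equivalent to setting `h.ignore_links = True` by hand, and `convert_file(h, "page.html", "latin-1")` is roughly the Python counterpart of passing a filename and an encoding on the command line (`page.html` is a placeholder name).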
Originally written by Aaron Swartz. This code is distributed under the GPLv3.
How to install
html2text is available on PyPI:
https://pypi.org/project/html2text/
$ pip install html2text
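After installing, it can be handy to confirm that the package is visible to the interpreter you intend to use. The following is a minimal sketch using only the standard library; `installed_version` is a hypothetical helper name, not something html2text provides.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("html2text")
    if version is None:
        print("html2text is not installed in this environment")
    else:
        print("html2text", version)
```

A `None` result usually means `pip` installed into a different environment than the one running the script.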
How to run unit tests
tox
To see the coverage results:
coverage html
Then open the ./htmlcov/index.html file in your browser.
Documentation
The documentation lives here.