This is my XML file:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE papers>
<papers>
<paper>
<title>Title containing & and more</title>
</paper>
</papers>
How can I read it using lxml's etree? I tried, but this results in the following traceback:
Traceback (most recent call last):
File "<stdin>", line 2, in <module>
File "lxml.etree.pyx", line 3239, in lxml.etree.parse (src/lxml/lxml.etree.c:69955)
File "parser.pxi", line 1769, in lxml.etree._parseDocument (src/lxml/lxml.etree.c:102257)
File "parser.pxi", line 1789, in lxml.etree._parseFilelikeDocument (src/lxml/lxml.etree.c:102516)
File "parser.pxi", line 1684, in lxml.etree._parseDocFromFilelike (src/lxml/lxml.etree.c:101442)
File "parser.pxi", line 1134, in lxml.etree._BaseParser._parseDocFromFilelike (src/lxml/lxml.etree.c:97069)
File "parser.pxi", line 582, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:91275)
File "parser.pxi", line 683, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:92461)
File "parser.pxi", line 622, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:91757)
lxml.etree.XMLSyntaxError: xmlParseEntityRef: no name, line 5, column 30
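The XML file is not well-formed: a literal ampersand inside text must be written as the predefined XML entity &amp;. To read it anyway, you can use BeautifulSoup, which is a more fault-tolerant parser.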
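The answer's original snippet did not survive on this page, so the following is only a minimal sketch of the BeautifulSoup approach. It assumes the `bs4` package is installed and picks the standard-library `"html.parser"` backend; the backend choice and the variable names `xml`, `soup` and `title` are assumptions for illustration, not the answerer's code.

```python
from bs4 import BeautifulSoup

# The document from the question, including the bare ampersand.
xml = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE papers>
<papers>
<paper>
<title>Title containing & and more</title>
</paper>
</papers>
"""

# "html.parser" is lenient: it does not reject the stray "&".
# (Assumption: the original answer may have used a different backend.)
soup = BeautifulSoup(xml, "html.parser")

# Pull the text of the first <title> element.
title = soup.find("title").get_text()
print(title)
```

Because the document is being treated as HTML here, tag names are lowercased and XML-specific features such as namespaces are not handled the way an XML parser would handle them.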
If you need to keep the & character, you can parse the file as HTML. If you do not need the & character, you can create a new XML parser and pass the recover=True option.
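Both lxml options can be sketched as follows. This is an illustration under stated assumptions rather than the answerer's original code: the document is parsed from an in-memory bytes literal instead of a file, and the names `xml`, `strict_error`, `recovered_title` and `html_title` are made up for the example.

```python
from lxml import etree

# The document from the question, as bytes (lxml refuses str input
# that carries an encoding declaration).
xml = b"""<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE papers>
<papers>
<paper>
<title>Title containing & and more</title>
</paper>
</papers>
"""

# 1. The default, strict XML parser rejects the bare ampersand.
strict_error = None
try:
    etree.fromstring(xml)
except etree.XMLSyntaxError as exc:
    strict_error = exc

# 2. recover=True asks libxml2 to keep going past errors;
#    the offending "&" is dropped from the text.
recovering_parser = etree.XMLParser(recover=True)
root = etree.fromstring(xml, parser=recovering_parser)
recovered_title = root.findtext("paper/title")

# 3. The HTML parser tolerates the bare ampersand and keeps it.
#    The result is wrapped in an <html> element, so search with ".//".
html_root = etree.fromstring(xml, parser=etree.HTMLParser())
html_title = html_root.findtext(".//title")

print(repr(recovered_title))
print(repr(html_title))
```

For a file on disk, the same parsers can be passed to `etree.parse(path, parser=...)`. Note that `recover=True` silently discards whatever the parser cannot make sense of, so it is worth checking the result before relying on it; fixing the source file to use `&amp;` remains the cleanest solution.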