I am using xml.etree.ElementTree in Python 2.7 and am having trouble round-tripping strings. If the tree contains non-ASCII Unicode characters, calling ET.fromstring() on the output of ET.tostring() fails.
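Why doesn't this work? Given that ElementTree wants byte streams and does its own decoding, why does it seem to default to an ASCII parser? Is this determined by something I am overlooking, such as the encoding of the Python source file or the locale?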
ASCII characters only (this works):
import xml.etree.ElementTree as ET
t1 = ET.Element('test')
t1.text = u'hello world'
t1_roundtrip = ET.fromstring(ET.tostring(t1, encoding='utf8', method='xml'))
# ET.dump(t1) == ET.dump(t1_roundtrip)
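A side note on the comparison in the comment above: ET.dump() writes to stdout and returns None, so `ET.dump(a) == ET.dump(b)` is true regardless of content. A minimal sketch of a check that compares the serialized bytes instead; the helper name `same_xml` is hypothetical, and the round trip here uses the default serialization rather than the `encoding='utf8'` call from the question:

```python
import xml.etree.ElementTree as ET


def same_xml(a, b):
    """Return True if two elements serialize to identical bytes."""
    # ET.tostring() returns the serialized form; ET.dump() only prints it.
    return ET.tostring(a) == ET.tostring(b)


t1 = ET.Element('test')
t1.text = u'hello world'

# Round trip through the default serialization (no explicit encoding argument).
t1_roundtrip = ET.fromstring(ET.tostring(t1))

print(same_xml(t1, t1_roundtrip))
```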
It fails with a non-ASCII Unicode code point:
import xml.etree.ElementTree as ET
t2 = ET.Element('test')
t2.text = u'\u2603'
t2_roundtrip = ET.fromstring(ET.tostring(t2, encoding='utf8', method='xml'))
>>> t2_roundtrip = ET.fromstring(ET.tostring(t2, encoding='utf8', method='xml'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/rh/python27/root/usr/lib64/python2.7/xml/etree/ElementTree.py", line 1300, in XML
    parser.feed(text)
  File "/opt/rh/python27/root/usr/lib64/python2.7/xml/etree/ElementTree.py", line 1642, in feed
    self._raiseerror(v)
  File "/opt/rh/python27/root/usr/lib64/python2.7/xml/etree/ElementTree.py", line 1506, in _raiseerror
    raise err
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 2, column 6
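One thing that may be worth checking, offered as an assumption rather than a confirmed diagnosis: the encoding name passed to ET.tostring() can end up in the XML declaration, and the parser may not treat the unhyphenated spelling 'utf8' the same way as 'utf-8'. The error position (line 2, column 6) would be consistent with a declaration on line 1 and the first non-ASCII byte right after `<test>` on line 2. A minimal sketch that prints the serialized bytes for inspection and then retries the round trip with the hyphenated name:

```python
import xml.etree.ElementTree as ET

t2 = ET.Element('test')
t2.text = u'\u2603'

# Inspect what the original call emits, including any XML declaration.
print(repr(ET.tostring(t2, encoding='utf8', method='xml')))

# Assumption: the hyphenated spelling 'utf-8' is the name the parser recognizes.
serialized = ET.tostring(t2, encoding='utf-8', method='xml')
print(repr(serialized))

# Parse the bytes back and compare the text with the original.
t2_roundtrip = ET.fromstring(serialized)
print(t2_roundtrip.text == t2.text)
```

If the last line prints True while the original call still fails, the declared encoding name is a likely culprit, rather than the source file encoding or the locale.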