python xml.etree.ElementTree tostring()/fromstring() round trip fails for Unicode code points

Posted 2024-09-27 20:19:24


I am using xml.etree.ElementTree in Python 2.7 and have a problem round-tripping strings. If the tree contains non-ASCII Unicode characters, calling ET.fromstring() on the output of ET.tostring() fails.

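Why doesn't this work? Given that ElementTree wants byte streams and does its own decoding, why does it appear to default to an ASCII parser? Is this determined by something I am overlooking, such as the encoding of the Python file or the locale?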

  1. ASCII characters only (works):

    import xml.etree.ElementTree as ET
    
    t1 = ET.Element('test')
    t1.text = u'hello world'
    t1_roundtrip = ET.fromstring(ET.tostring(t1, encoding='utf8', method='xml'))
    # ET.dump(t1) and ET.dump(t1_roundtrip) print the same XML
    
  2. A non-ASCII Unicode code point (fails):

    import xml.etree.ElementTree as ET
    
    t2 = ET.Element('test')
    t2.text = u'\u2603'
    t2_roundtrip = ET.fromstring(ET.tostring(t2, encoding='utf8', method='xml'))
    
    >>> t2_roundtrip = ET.fromstring(ET.tostring(t2, encoding='utf8', method='xml'))
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/opt/rh/python27/root/usr/lib64/python2.7/xml/etree/ElementTree.py", line 1300, in XML
        parser.feed(text)
      File "/opt/rh/python27/root/usr/lib64/python2.7/xml/etree/ElementTree.py", line 1642, in feed
        self._raiseerror(v)
      File "/opt/rh/python27/root/usr/lib64/python2.7/xml/etree/ElementTree.py", line 1506, in _raiseerror
        raise err
    xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 2, column 6
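
The error is reported at line 2, column 6, which suggests that tostring() wrote an XML declaration on line 1 and the parser rejected the first non-ASCII byte after `<test>`. One plausible cause (an assumption, not something confirmed in this post) is the encoding name: `utf8` is an alias Python's codecs accept, but the name XML parsers recognize is `utf-8`. A minimal sketch of the same round trip with the hyphenated name follows; `roundtrip` is a hypothetical helper introduced only for illustration.

```python
import xml.etree.ElementTree as ET


def roundtrip(element):
    """Serialize an element to UTF-8 bytes and parse it back (hypothetical helper)."""
    # Assumption: the hyphenated name 'utf-8' is what the XML parser expects;
    # the original snippet passed the Python codec alias 'utf8' instead.
    data = ET.tostring(element, encoding='utf-8', method='xml')
    return ET.fromstring(data)


t2 = ET.Element('test')
t2.text = u'\u2603'  # SNOWMAN, a non-ASCII code point
t2_roundtrip = roundtrip(t2)

# The parsed element should carry the same text as the original.
assert t2_roundtrip.text == t2.text
```

If the assertion passes in your environment, the failure was tied to the encoding name rather than to the source file encoding or the locale.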
    
