lxml validates XML after it is written to a file and read back, but not the DOM tree the file was generated from

Posted 2024-09-29 04:24:51


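I use lxml to generate XML and validate it. When I validate the DOM tree with an XSD validator, the validator complains that it does not recognize the root element. By contrast, when the same tree is converted to text, written to a file, read back and parsed into a new DOM tree, the root element of the new tree is recognized. The session below should show what I mean.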

My question is: why does this happen? Is there a way to get the original DOM tree to validate, other than rendering it to a string and re-parsing it?

>>> from lxml import etree
>>> import io
>>> validator = etree.XMLSchema(etree.parse(io.open('mainapp/xsd/forms/CompanyIncorporation-v2-6.xsd')))
>>> c = CompanyUK.objects.all()[5]
>>> cotree = xml.Company.generate_company_incorporation_tree(c)
>>> cotree
<Element CompanyIncorporation at 0x38f2750>
>>> validator.assertValid(cotree)
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "lxml.etree.pyx", line 3006, in lxml.etree._Validator.assertValid (src/lxml/lxml.etree.c:125415)
DocumentInvalid: Element 'CompanyIncorporation': No matching global declaration available for the validation root.
>>> cott = etree.ElementTree(cotree)
>>> validator.assertValid(cott)
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "lxml.etree.pyx", line 3006, in lxml.etree._Validator.assertValid (src/lxml/lxml.etree.c:125415)
DocumentInvalid: Element 'CompanyIncorporation': No matching global declaration available for the validation root.
>>> xmlfile = open("xmlfile", "w")
>>> xmlfile.write(etree.tostring(cott))
>>> xmlfile.flush()
>>> xmlfile.close()
>>> xmlfile = open("xmlfile", "r")
>>> ptree = etree.parse(xmlfile)
>>> validator.assertValid(ptree)
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "lxml.etree.pyx", line 3006, in lxml.etree._Validator.assertValid (src/lxml/lxml.etree.c:125415)
DocumentInvalid: Element '{http://xmlgw.companieshouse.gov.uk}Country': [facet 'enumeration'] The value 'EW' is not an element of the set {'GB-ENG', 'GB-WLS', 'GB-SCT', 'GB-NIR', 'GBR', 'UNDEF'}., line 1
>>> etree.tostring(ptree) == etree.tostring(cott)
True
>>> validator.assertValid(etree.parse(io.StringIO(etree.tounicode(ptree))))
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "lxml.etree.pyx", line 3006, in lxml.etree._Validator.assertValid (src/lxml/lxml.etree.c:125415)
DocumentInvalid: Element '{http://xmlgw.companieshouse.gov.uk}Country': [facet 'enumeration'] The value 'EW' is not an element of the set {'GB-ENG', 'GB-WLS', 'GB-SCT', 'GB-NIR', 'GBR', 'UNDEF'}., line 1
>>> validator.assertValid(etree.parse(io.StringIO(etree.tounicode(cot))))
Traceback (most recent call last):
  File "<console>", line 1, in <module>
NameError: name 'cot' is not defined
>>> validator.assertValid(etree.parse(io.StringIO(etree.tounicode(cott))))
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "lxml.etree.pyx", line 3006, in lxml.etree._Validator.assertValid (src/lxml/lxml.etree.c:125415)
DocumentInvalid: Element '{http://xmlgw.companieshouse.gov.uk}Country': [facet 'enumeration'] The value 'EW' is not an element of the set {'GB-ENG', 'GB-WLS', 'GB-SCT', 'GB-NIR', 'GBR', 'UNDEF'}., line 1
>>> validator.assertValid(etree.parse(io.BytesIO(etree.tostring(cott))))
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "lxml.etree.pyx", line 3006, in lxml.etree._Validator.assertValid (src/lxml/lxml.etree.c:125415)
DocumentInvalid: Element '{http://xmlgw.companieshouse.gov.uk}Country': [facet 'enumeration'] The value 'EW' is not an element of the set {'GB-ENG', 'GB-WLS', 'GB-SCT', 'GB-NIR', 'GBR', 'UNDEF'}., line 1
>>>
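
One thing worth checking, offered as a hypothesis rather than a confirmed diagnosis: the first error names the root as plain `CompanyIncorporation`, while the errors after re-parsing name `{http://xmlgw.companieshouse.gov.uk}Country`. That pattern would fit a generator that writes the namespace as an ordinary `xmlns` attribute instead of placing the elements in the namespace. The sketch below assumes that is what `generate_company_incorporation_tree` does (its source is not shown here) and compares the namespace lxml sees before and after a round trip.

```python
from lxml import etree

NS = "http://xmlgw.companieshouse.gov.uk"

# Assumption: the generator sets the namespace as a plain attribute.
# In memory, the element's tag is then NOT namespace-qualified.
built = etree.Element("CompanyIncorporation", xmlns=NS)

# Serializing and re-parsing lets the parser interpret xmlns="..."
# as a real default-namespace declaration.
reparsed = etree.fromstring(etree.tostring(built))

built_ns = etree.QName(built).namespace
reparsed_ns = etree.QName(reparsed).namespace

print("in-memory tag:", built.tag, "| namespace:", built_ns)
print("reparsed tag: ", reparsed.tag, "| namespace:", reparsed_ns)
```

If the two namespaces differ for the real tree (for example `etree.QName(cotree).namespace` versus `etree.QName(ptree.getroot()).namespace`), the trees are not equivalent in memory even though they serialize to the same bytes.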

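As you can see above, the version parsed from disk gets further in validation, even though both trees render to the same string. Likewise, after a round trip through a string and a re-parse into a DOM tree, validation gets past the root element. So what is causing this?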
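
If the in-memory elements turn out to lack a namespace, one way to validate without the string round trip is to build them with namespace-qualified tags from the start. The following is a minimal sketch under that assumption; the namespace `urn:example:demo`, the element name `Root`, and the inline schema are hypothetical stand-ins for the real XSD and generator, which are not shown in the question.

```python
from lxml import etree

NS = "urn:example:demo"  # hypothetical; stands in for the schema's targetNamespace

# Tiny hypothetical schema declaring one global element in NS.
XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="urn:example:demo"
           elementFormDefault="qualified">
  <xs:element name="Root" type="xs:string"/>
</xs:schema>"""
validator = etree.XMLSchema(etree.fromstring(XSD))

# Built without a namespace: the validator finds no matching global declaration.
unqualified = etree.Element("Root")
unqualified.text = "x"

# Built with a qualified tag; nsmap only controls the prefix used on output
# (None means "serialize as the default namespace").
qualified = etree.Element("{%s}Root" % NS, nsmap={None: NS})
qualified.text = "x"

unqualified_ok = validator.validate(unqualified)
qualified_ok = validator.validate(qualified)

print("unqualified valid:", unqualified_ok)
print("qualified valid:  ", qualified_ok)
```

Child elements need the same treatment: with `etree.SubElement(parent, "{%s}Child" % NS)` each child is placed in the namespace explicitly, since a tag given without the `{...}` prefix is created in no namespace.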

