Accessing !ENTITY definitions

Posted 2024-09-28 05:17:44


I want to access the !ENTITY declarations in an XML file.

At the moment I pre-process the file with HTMLParser to extract this information; see "Accessing !ENTITY statement and reference".
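For readers who want a standard-library pre-processing pass of this kind, here is a minimal sketch. Note that it swaps in `xml.parsers.expat` and its `EntityDeclHandler` callback rather than HTMLParser, so it is not the code the question refers to. The helper name `collect_entities` and the shortened `SAMPLE` document are made up for illustration.

```python
import xml.parsers.expat

# Shortened stand-in for the GDML file (illustrative only).
SAMPLE = """<!DOCTYPE gdml [
    <!ENTITY materials SYSTEM "materialsOptical.xml">
    <!ENTITY solids_Mainz_v2 SYSTEM "solids_Mainz_v2.xml">
    <!ENTITY matrices_Mainz_v2 SYSTEM "matrices_Mainz_v2.xml">
]>
<gdml><define>&matrices_Mainz_v2;</define>&materials;&solids_Mainz_v2;</gdml>
"""


def collect_entities(xml_text):
    """Map each declared entity name to its literal value or SYSTEM identifier."""
    found = {}

    def on_entity_decl(name, is_parameter, value, base,
                       system_id, public_id, notation):
        # Internal entities carry a literal value; external ones
        # (SYSTEM "...") have value None and a system identifier instead.
        found[name] = value if value is not None else system_id

    parser = xml.parsers.expat.ParserCreate()
    parser.EntityDeclHandler = on_entity_decl
    parser.Parse(xml_text, True)  # True marks this as the final chunk
    return found


entities = collect_entities(SAMPLE)
for name, target in entities.items():
    print(name, "->", target)
```

No external-entity handler is installed, so the referenced files are never opened; only the declarations in the DOCTYPE are read.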

But since I already use the lxml parser to process the file, I would prefer to solve this with lxml as well.

The GDML (XML) file looks like this:

<!DOCTYPE gdml [
    <!ENTITY materials SYSTEM "materialsOptical.xml"> 
    <!ENTITY solids_Mainz_v2 SYSTEM "solids_Mainz_v2.xml"> 
    <!ENTITY matrices_Mainz_v2 SYSTEM "matrices_Mainz_v2.xml">
]> 

<gdml xmlns:gdml="http://cern.ch/2001/Schemas/GDML"       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"    xsi:noNamespaceSchemaLocation="schema/gdml.xsd">


<define>
<constant name="PI" value="1.*pi"/>
&matrices_Mainz_v2;
</define>
&materials; 
&solids_Mainz_v2;

<structure>
.... continued...

My attempt so far looks like this:

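from lxml import etree

# filename is the path to the GDML file shown above.
# resolve_entities=False keeps references such as &materials; as
# entity nodes instead of expanding them.
myparser = etree.XMLParser(resolve_entities=False)
tree = etree.parse(filename, parser=myparser)

print(tree.docinfo)
print(dir(tree.docinfo))
print(tree.docinfo.doctype)
print(dir(tree.docinfo.doctype))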

which outputs

<lxml.etree.DocInfo object at 0x7fd2f2db6f98>
['URL', '__class__', '__delattr__', '__dir__', '__doc__', '__eq__',    '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__ne__', '__new__', '__pyx_vtable__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'clear', 'doctype', 'encoding', 'externalDTD', 'internalDTD', 'public_id', 'root_name', 'standalone', 'system_url', 'xml_version']
<!DOCTYPE gdml>
['__add__', '__class__', '__contains__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__mod__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmod__', '__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'capitalize', 'casefold', 'center', 'count', 'encode', 'endswith', 'expandtabs', 'find', 'format', 'format_map', 'index', 'isalnum', 'isalpha', 'isdecimal', 'isdigit', 'isidentifier', 'islower', 'isnumeric', 'isprintable', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'maketrans', 'partition', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']

Going a step further with

for e in tree.docinfo.internalDTD.iterentities():
    print(e.name)
    print(e.content)
    print(e.orig)
    print(dir(e))

This prints the !ENTITY names, but I don't know how to get at their values.

