lxml:向输入fi添加名称空间

2024-04-27 15:49:03 发布

您现在位置:Python中文网/ 问答频道 /正文

我正在分析由外部program生成的xml文件。然后,我想使用自己的名称空间向该文件添加自定义注释。我的输入如下:

<sbml xmlns="http://www.sbml.org/sbml/level2/version4" xmlns:celldesigner="http://www.sbml.org/2001/ns/celldesigner" level="2" version="4">
  <model metaid="untitled" id="untitled">
    <annotation>...</annotation>
    <listOfUnitDefinitions>...</listOfUnitDefinitions>
    <listOfCompartments>...</listOfCompartments>
    <listOfSpecies>
      <species metaid="s1" id="s1" name="GenA" compartment="default" initialAmount="0">
        <annotation>
          <celldesigner:extension>...</celldesigner:extension>
        </annotation>
      </species>
      <species metaid="s2" id="s2" name="s2" compartment="default" initialAmount="0">
        <annotation>
           <celldesigner:extension>...</celldesigner:extension>
        </annotation>
      </species>
    </listOfSpecies>
    <listOfReactions>...</listOfReactions>
  </model>
</sbml>

问题是,lxml只在使用名称空间时声明它们,这意味着声明会重复多次,就像so(simplified)一样:

<sbml xmlns="namespace" xmlns:celldesigner="morenamespace" level="2" version="4">
  <listOfSpecies>
    <species>
      <kjw:test xmlns:kjw="http://this.is.some/custom_namespace"/>
      <celldesigner:data>Some important data which must be kept</celldesigner:data>
    </species>
    <species>
      <kjw:test xmlns:kjw="http://this.is.some/custom_namespace"/>
    </species>
    ....
  </listOfSpecies>
</sbml>

是否可以强制lxml在父元素(如sbmllistOfSpecies)中只编写一次此声明?还是有充分的理由不这样做?我想要的结果是:

<sbml xmlns="namespace" xmlns:celldesigner="morenamespace" level="2" version="4"  xmlns:kjw="http://this.is.some/custom_namespace">
  <listOfSpecies>
    <species>
      <kjw:test/>
      <celldesigner:data>Some important data which must be kept</celldesigner:data>
    </species>
    <species>
      <kjw:test/>
    </species>
    ....
  </listOfSpecies>
</sbml>

重要的问题是必须保留从文件中读取的现有数据,因此我不能仅仅创建一个新的根元素(我想?)。

编辑:代码附在下面。

def annotateSbml(sbml_input):
  from lxml import etree

  checkSbml(sbml_input) # Makes sure the input is valid sbml/xml.

  ns = "http://this.is.some/custom_namespace"
  etree.register_namespace('kjw', ns)

  sbml_doc = etree.ElementTree()
  root = sbml_doc.parse(sbml_input, etree.XMLParser(remove_blank_text=True))
  nsmap = root.nsmap
  nsmap['sbml'] = nsmap[None] # Makes code more readable, but seems ugly. Any alternatives to this?
  nsmap['kjw'] = ns
  ns = '{' + ns + '}'
  sbmlns = '{' + nsmap['sbml'] + '}'

  for species in root.findall('sbml:model/sbml:listOfSpecies/sbml:species', nsmap):
    species.append(etree.Element(ns + 'test'))

  sbml_doc.write("test.sbml.xml", pretty_print=True, xml_declaration=True)

  return

Tags: testhttpdataisannotationthisnamespacesbml
3条回答

我知道这是个老问题,但它仍然有效,从lxml 3.5.0开始,可能有更好的解决方案:

cleanup_namespaces() accepts a new argument top_nsmap that moves definitions of the provided prefix-namespace mapping to the top of the tree.

因此,现在只需调用以下命令即可向上移动命名空间映射:

nsmap = {'kjw': 'http://this.is.some/custom_namespace'}
etree.cleanup_namespaces(root, top_nsmap=nsmap)

除了直接处理原始XML之外,您还可以查看LibSBML,这是一个用于处理SBML文档的库,其中包括python的语言绑定。你可以这样使用它:

>>> from libsbml import *
>>> doc = readSBML('Dropbox/SBML Models/BorisEJB.xml')
>>> species = doc.getModel().getSpecies('MAPK')
>>> species.appendAnnotation('<kjw:test xmlns:kjw="http://this.is.some/custom_namespace"/>')
0
>>> species.toSBML()
'<species id="MAPK" compartment="compartment" initialConcentration="280" boundaryCondition="false">\n  <annotation>\n
 <kjw:test xmlns:kjw="http://this.is.some/custom_namespace"/>\n  </annotation>\n</species>'
>>>

在lxml中无法修改节点的命名空间映射。请参阅将此功能作为愿望列表项的this open ticket

它起源于lxml邮件列表中的this thread,其中workaround replacing the root node是一个替代项。不过,替换根节点有一些问题:请参阅上面的票证。

为了完整起见,我将在此处放置建议的根替换解决方案代码:

>>> DOC = """<sbml xmlns="http://www.sbml.org/sbml/level2/version4" xmlns:celldesigner="http://www.sbml.org/2001/ns/celldesigner" level="2" version="4">
...   <model metaid="untitled" id="untitled">
...     <annotation>...</annotation>
...     <listOfUnitDefinitions>...</listOfUnitDefinitions>
...     <listOfCompartments>...</listOfCompartments>
...     <listOfSpecies>
...       <species metaid="s1" id="s1" name="GenA" compartment="default" initialAmount="0">
...         <annotation>
...           <celldesigner:extension>...</celldesigner:extension>
...         </annotation>
...       </species>
...       <species metaid="s2" id="s2" name="s2" compartment="default" initialAmount="0">
...         <annotation>
...            <celldesigner:extension>...</celldesigner:extension>
...         </annotation>
...       </species>
...     </listOfSpecies>
...     <listOfReactions>...</listOfReactions>
...   </model>
... </sbml>"""
>>> 
>>> from lxml import etree
>>> from StringIO import StringIO
>>> NS = "http://this.is.some/custom_namespace"
>>> tree = etree.ElementTree(element=None, file=StringIO(DOC))
>>> root = tree.getroot()
>>> nsmap = root.nsmap
>>> nsmap['kjw'] = NS
>>> new_root = etree.Element(root.tag, nsmap=nsmap)
>>> new_root[:] = root[:]
>>> new_root.append(etree.Element('{%s}%s' % (NS, 'test')))
>>> new_root.append(etree.Element('{%s}%s' % (NS, 'test')))

>>> print etree.tostring(new_root, pretty_print=True)
<sbml xmlns:celldesigner="http://www.sbml.org/2001/ns/celldesigner" xmlns:kjw="http://this.is.some/custom_namespace" xmlns="http://www.sbml.org/sbml/level2/version4"><model metaid="untitled" id="untitled">
    <annotation>...</annotation>
    <listOfUnitDefinitions>...</listOfUnitDefinitions>
    <listOfCompartments>...</listOfCompartments>
    <listOfSpecies>
      <species metaid="s1" id="s1" name="GenA" compartment="default" initialAmount="0">
        <annotation>
          <celldesigner:extension>...</celldesigner:extension>
        </annotation>
      </species>
      <species metaid="s2" id="s2" name="s2" compartment="default" initialAmount="0">
        <annotation>
           <celldesigner:extension>...</celldesigner:extension>
        </annotation>
      </species>
    </listOfSpecies>
    <listOfReactions>...</listOfReactions>
  </model>
<kjw:test/><kjw:test/></sbml>

相关问题 更多 >