How do I export .xml element attributes into another existing .xml file?

Posted 2024-06-02 08:20:52


Each of my movie directories contains 2 files: one named mymovies.xml and one named moviename.nfo (both are XML files).

What I'm trying to do is "extract" the child attributes Language, Type and Channels:

<AudioTracks>
    <AudioTrack Language="German" Type="DTS-HD Master" Channels="7.1" />
    <AudioTrack Language="German" Type="Dolby Digital" Channels="2.0" />
    <AudioTrack Language="English" Type="DTS-HD Master" Channels="7.1" />
</AudioTracks>

and "import" them into moviename.nfo in the following format:

<fileinfo>
    <streamdetails>
        <audio>
            <codec>dtshdmaster</codec>
            <language>ger</language>
            <channels>8</channels>
        </audio>
        <audio>
            <codec>dolbydigital</codec>
            <language>ger</language>
            <channels>2</channels>
        </audio>
        <audio>
            <codec>dtshdmaster</codec>
            <language>eng</language>
            <channels>8</channels>
        </audio>
    </streamdetails>
</fileinfo>
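One detail matters for the target format: codec, language and channels are child elements holding text, not attributes, so each one needs its own `SubElement` with `.text` set. A minimal sketch of both halves (reading the attributes, building the nested block), assuming the real mymovies.xml nests `<AudioTracks>` somewhere below its root. The `<Title>` root, the inline string and the names `read_audio_tracks` / `build_fileinfo` are hypothetical stand-ins, and the converted values are typed in by hand here.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for mymovies.xml; the real root element may differ
MYMOVIES = """<Title>
  <AudioTracks>
    <AudioTrack Language="German" Type="DTS-HD Master" Channels="7.1" />
    <AudioTrack Language="German" Type="Dolby Digital" Channels="2.0" />
    <AudioTrack Language="English" Type="DTS-HD Master" Channels="7.1" />
  </AudioTracks>
</Title>"""

def read_audio_tracks(root):
    """Return (Language, Type, Channels) for every <AudioTrack> under <AudioTracks>."""
    # './/' matches at any depth, so the exact nesting in mymovies.xml does not matter
    return [(t.get('Language'), t.get('Type'), t.get('Channels'))
            for t in root.iterfind('.//AudioTracks/AudioTrack')]

def build_fileinfo(streams):
    """Build <fileinfo><streamdetails><audio>... from (codec, language, channels) tuples."""
    fileinfo = ET.Element('fileinfo')
    streamdetails = ET.SubElement(fileinfo, 'streamdetails')
    for codec, language, channels in streams:
        audio = ET.SubElement(streamdetails, 'audio')
        # Each value becomes the text of a child element, not an attribute
        ET.SubElement(audio, 'codec').text = codec
        ET.SubElement(audio, 'language').text = language
        ET.SubElement(audio, 'channels').text = str(channels)
    return fileinfo

print(read_audio_tracks(ET.fromstring(MYMOVIES)))
# Values converted by hand here; the conversion rules are a separate step
streams = [('dtshdmaster', 'ger', 8), ('dolbydigital', 'ger', 2), ('dtshdmaster', 'eng', 8)]
print(ET.tostring(build_fileinfo(streams), encoding='unicode'))
```

Passing the values as keyword arguments to `ET.SubElement(..., codec=...)` would instead produce `<audio codec="..." />`, which is not the format the .nfo expects.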

Example moviename.nfo:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<movie>
  <title>Barry Lyndon</title>
  <originaltitle>Barry Lyndon</originaltitle>
  <sorttitle>Barry Lyndon</sorttitle>
  <set>
  </set>
  <rating>8</rating>
  <year>1975</year>
  <top250>
  </top250>
  <votes>
  </votes> 
  <tagline>
  </tagline>
  <runtime>185</runtime>
  <thumb>
  </thumb>
  <mpaa>Rated PG-13</mpaa>
  <playcount>0</playcount>
  <watched>false</watched>
  <id>tt0072684</id>
  <filenameandpath>
  </filenameandpath>
  <country>Germany</country>
  <trailer>
  </trailer>
  <certification>Germany:FSK ab 12 freigegeben</certification>
  <genre>War</genre>
  <genre>Drama</genre>
  <genre>Romance</genre>
  <studio>Peregrine</studio>
  <credits>Stanley Kubrick, William Makepeace Thackeray</credits>
  <director>Stanley Kubrick</director>
  <createdby>My Movies</createdby>
</movie>

Expected output:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<movie>
  <title>Barry Lyndon</title>
  <originaltitle>Barry Lyndon</originaltitle>
  <sorttitle>Barry Lyndon</sorttitle>
  <set>
  </set>
  <rating>8</rating>
  <year>1975</year>
  <top250>
  </top250>
  <votes>
  </votes>
  <tagline>
  </tagline>
  <runtime>185</runtime>
  <thumb>
  </thumb>
  <mpaa>Rated PG-13</mpaa>
  <playcount>0</playcount>
  <watched>false</watched>
  <id>tt0072684</id>
  <filenameandpath>
  </filenameandpath>
  <country>Germany</country>
  <trailer>
  </trailer>
  <fileinfo>
    <streamdetails>
      <audio>
        <codec>dtshdmaster</codec>
        <language>ger</language>
        <channels>8</channels>
      </audio>
      <audio>
        <codec>dolbydigital</codec>
        <language>ger</language>
        <channels>2</channels>
      </audio>
      <audio>
        <codec>dtshdmaster</codec>
        <language>eng</language>
        <channels>8</channels>
      </audio>
    </streamdetails>
  </fileinfo>
  <certification>Germany:FSK ab 12 freigegeben</certification>
  <genre>War</genre>
  <genre>Drama</genre>
  <genre>Romance</genre>
  <studio>Peregrine</studio>
  <credits>Stanley Kubrick, William Makepeace Thackeray</credits>
  <director>Stanley Kubrick</director>
  <createdby>My Movies</createdby>
</movie>
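In the expected output `<fileinfo>` sits directly before `<certification>` rather than at the end of `<movie>`, so appending is not enough; `Element.insert(index, child)` places it at a given position. A minimal sketch, assuming "before the first `<certification>`, else at the end" is the rule wanted; `insert_before` is a hypothetical helper and the .nfo is trimmed to a few elements.

```python
import xml.etree.ElementTree as ET

NFO = """<movie>
  <title>Barry Lyndon</title>
  <trailer>
  </trailer>
  <certification>Germany:FSK ab 12 freigegeben</certification>
  <genre>War</genre>
</movie>"""

def insert_before(parent, new_child, tag):
    """Insert new_child in front of the first direct child named `tag`, else append it."""
    for index, child in enumerate(parent):
        if child.tag == tag:
            parent.insert(index, new_child)
            return  # stop at once: the loop must not continue over the mutated parent
    parent.append(new_child)

movie = ET.fromstring(NFO)
insert_before(movie, ET.Element('fileinfo'), 'certification')
print([child.tag for child in movie])
# ['title', 'trailer', 'fileinfo', 'certification', 'genre']
```

The inserted element has no whitespace around it yet, so the file needs re-indenting before it is written out.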

So far I have:

import xml.etree.ElementTree as ET

root_node = ET.parse('mymovies.xml').getroot()

for tag in root_node.findall('AudioTracks/AudioTrack'):
    value = tag.attrib['Language']
    print(value)

    value = tag.attrib['Type']
    print(value)

    value = tag.attrib['Channels']
    print(value)

The output is:

English
DTS-HD Master
5.1
English
Dolby Digital
2.0
French
Dolby Digital
5.1
Spanish
Dolby Digital
5.1
Portuguese
Dolby Digital
5.1
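These are still the raw attribute values; the target format wants "dtshdmaster" instead of "DTS-HD Master", "6" instead of "5.1" and "eng" instead of "English". A minimal sketch of the three conversions, assuming the rules implied by the expected output: strip punctuation and lower-case the codec, add the main and LFE counts for channels, and map language names to three-letter codes. The language table is an assumption: "ger" and "eng" match ISO 639-2/B, and simply taking the first three letters would fail for names such as Japanese (ISO 639-2: "jpn"). Unlike the `[^A-Za-z]` pattern in the answer below, this keeps digits so a codec like "E-AC3" becomes "eac3". All function names are hypothetical.

```python
import re

# Assumed mapping to ISO 639-2/B codes; extend it for the languages in your collection
LANGUAGE_CODES = {
    'English': 'eng',
    'German': 'ger',
    'French': 'fre',
    'Spanish': 'spa',
    'Portuguese': 'por',
    'Japanese': 'jpn',
}

def normalize_codec(track_type):
    """'DTS-HD Master' -> 'dtshdmaster': keep letters and digits, lower-cased."""
    return re.sub(r'[^A-Za-z0-9]', '', track_type).lower()

def normalize_channels(channels):
    """'7.1' -> '8': add the main and LFE channel counts."""
    return str(sum(int(part) for part in channels.split('.')))

def normalize_language(language):
    """Look the name up; fall back to its first three letters if it is unknown."""
    return LANGUAGE_CODES.get(language, language[:3].lower())

for lang, typ, ch in [('English', 'DTS-HD Master', '5.1'), ('Portuguese', 'Dolby Digital', '5.1')]:
    print(normalize_codec(typ), normalize_language(lang), normalize_channels(ch))
```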

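What I'd like to know now is:

  • How to load 2 element trees at the same time
  • How to write specific parsed information into another file
  • How to get the attributes into exactly the level and form I need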
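On the first two points: the two files are independent documents, so each gets its own `ET.parse()` call, and the modified .nfo tree is saved with `ElementTree.write()`. A minimal sketch for one movie directory, assuming it contains exactly one mymovies.xml and exactly one `*.nfo` file; `process_movie_dir` is a hypothetical name, the value conversions are the simple ones from the answer below, and the demo builds throwaway files in a temporary directory.

```python
import re
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

def process_movie_dir(movie_dir):
    """Copy the audio tracks from mymovies.xml into the directory's .nfo file."""
    movie_dir = Path(movie_dir)
    source_tree = ET.parse(movie_dir / 'mymovies.xml')  # first tree: the data source
    nfo_path = next(movie_dir.glob('*.nfo'))            # assumes exactly one .nfo per directory
    nfo_tree = ET.parse(nfo_path)                       # second tree: the file to update
    movie = nfo_tree.getroot()

    fileinfo = ET.Element('fileinfo')
    streamdetails = ET.SubElement(fileinfo, 'streamdetails')
    for track in source_tree.iterfind('.//AudioTrack'):
        audio = ET.SubElement(streamdetails, 'audio')
        ET.SubElement(audio, 'codec').text = re.sub(r'[^A-Za-z0-9]', '', track.get('Type')).lower()
        ET.SubElement(audio, 'language').text = track.get('Language')[:3].lower()
        ET.SubElement(audio, 'channels').text = str(sum(int(p) for p in track.get('Channels').split('.')))

    # Put <fileinfo> in front of <certification>, or at the end if there is none
    cert = movie.find('certification')
    index = list(movie).index(cert) if cert is not None else len(movie)
    movie.insert(index, fileinfo)
    nfo_tree.write(nfo_path, encoding='UTF-8', xml_declaration=True)
    return nfo_path

# Demo with throwaway files; in practice loop over your real movie directories
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, 'mymovies.xml').write_text(
        '<Title><AudioTracks>'
        '<AudioTrack Language="German" Type="DTS-HD Master" Channels="7.1" />'
        '</AudioTracks></Title>', encoding='utf-8')
    Path(tmp, 'Barry Lyndon.nfo').write_text(
        '<movie><title>Barry Lyndon</title>'
        '<certification>Germany:FSK ab 12 freigegeben</certification></movie>',
        encoding='utf-8')
    print(process_movie_dir(tmp).read_text(encoding='utf-8'))
```

To cover every movie, call `process_movie_dir` for each directory, for example from a loop over `Path(root).iterdir()`.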

Tags: value, type, xml, language, audio, codec, channels, digital
1 answer
User
#1 · Posted 2024-06-02 08:20:52

See whether you can use this. I made some assumptions when converting the values (language, codec, channels).

import xml.etree.ElementTree as ET
import re


# https://web.archive.org/web/20120301034645/http://effbot.org/zone/element-lib.htm#prettyprint
# in-place prettyprint formatter
def indent(elem, level=0):
    i = "\n" + level * "  "
    if len(elem):
        if not elem.text or not elem.text.strip():
            elem.text = i + "  "
        if not elem.tail or not elem.tail.strip():
            elem.tail = i
        for elem in elem:
            indent(elem, level + 1)
        if not elem.tail or not elem.tail.strip():
            elem.tail = i
    else:
        if level and (not elem.tail or not elem.tail.strip()):
            elem.tail = i


pattern = re.compile(r'[^A-Za-z]')

source = '''\
<AudioTracks>
    <AudioTrack Language="German" Type="DTS-HD Master" Channels="7.1" />
    <AudioTrack Language="German" Type="Dolby Digital" Channels="2.0" />
    <AudioTrack Language="English" Type="DTS-HD Master" Channels="7.1" />
</AudioTracks>
'''

template = '''\
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<movie>
  <title>Barry Lyndon</title>
  <originaltitle>Barry Lyndon</originaltitle>
  <sorttitle>Barry Lyndon</sorttitle>
  <set>
  </set>
  <rating>8</rating>
  <year>1975</year>
  <top250>
  </top250>
  <votes>
  </votes> 
  <tagline>
  </tagline>
  <runtime>185</runtime>
  <thumb>
  </thumb>
  <mpaa>Rated PG-13</mpaa>
  <playcount>0</playcount>
  <watched>false</watched>
  <id>tt0072684</id>
  <filenameandpath>
  </filenameandpath>
  <country>Germany</country>
  <trailer>
  </trailer>
  <certification>Germany:FSK ab 12 freigegeben</certification>
  <genre>War</genre>
  <genre>Drama</genre>
  <genre>Romance</genre>
  <studio>Peregrine</studio>
  <credits>Stanley Kubrick, William Makepeace Thackeray</credits>
  <director>Stanley Kubrick</director>
  <createdby>My Movies</createdby>
</movie>
'''

fileinfo = ET.Element('fileinfo')
e = ET.Element('streamdetails')
fileinfo.append(e)  # wrap in fileinfo

st = ET.fromstring(source)
for at in st.findall('./AudioTrack'):
    codec = pattern.sub('', at.attrib['Type']).lower()
    channels = str(sum(map(int, at.attrib['Channels'].split('.'))))
    language = at.attrib['Language'][:3].lower()
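    audio = ET.SubElement(e, 'audio')
    # The target format wants child elements holding text, not attributes
    ET.SubElement(audio, 'codec').text = codec
    ET.SubElement(audio, 'language').text = language
    ET.SubElement(audio, 'channels').text = channels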

out = ET.fromstring(template)
for i, c in enumerate(out):
    if c.tag == 'certification':
        out.insert(i, fileinfo)
        break

indent(out)
print(ET.tostring(out).decode('utf8'))
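The code above prints the result instead of saving it, and its `indent()` helper predates `xml.etree.ElementTree.indent()`, which was added in Python 3.9. A minimal sketch of the saving step, assuming Python 3.9+ and assuming that re-running the script should replace an existing `<fileinfo>` rather than add a second one; `save_nfo` is a hypothetical name. Note that ElementTree writes the declaration as `<?xml version='1.0' encoding='UTF-8'?>`, without the original `standalone="yes"`.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

def save_nfo(movie, fileinfo, path):
    """Replace any existing <fileinfo> in <movie>, pretty-print, and write to path."""
    for old in movie.findall('fileinfo'):
        movie.remove(old)          # re-running the script should not add a second copy
    cert = movie.find('certification')
    index = list(movie).index(cert) if cert is not None else len(movie)
    movie.insert(index, fileinfo)
    ET.indent(movie, space='  ')   # Python 3.9+; replaces the hand-written indent()
    ET.ElementTree(movie).write(path, encoding='UTF-8', xml_declaration=True)

movie = ET.fromstring('<movie><title>Barry Lyndon</title>'
                      '<certification>Germany:FSK ab 12 freigegeben</certification></movie>')
fileinfo = ET.Element('fileinfo')
ET.SubElement(fileinfo, 'streamdetails')

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'moviename.nfo')
    save_nfo(movie, fileinfo, path)
    save_nfo(movie, fileinfo, path)   # second run: still only one <fileinfo>
    with open(path, encoding='utf-8') as f:
        print(f.read())
```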
