I have to clean up many XML files that have this structure:
<Corps_TTL>
    <Id_TTL>60006</Id_TTL>
    <Donnees_Releve> <!-- could be many -->
        <Donnees_Par_Type_Mesure> <!-- could be many -->
            <Conso_Par_Classe_Temporelle>
                <Classe_Temporelle>HPH</Classe_Temporelle>
                <Quantite_Mesure>0</Quantite_Mesure>
            </Conso_Par_Classe_Temporelle>
            <Conso_Par_Classe_Temporelle>
                <Classe_Temporelle>HPH</Classe_Temporelle>
                <Quantite_Mesure>0</Quantite_Mesure>
            </Conso_Par_Classe_Temporelle>
            <Conso_Par_Classe_Temporelle>
                <Classe_Temporelle>HPE</Classe_Temporelle>
                <Quantite_Mesure>1072</Quantite_Mesure>
            </Conso_Par_Classe_Temporelle>
            <Conso_Par_Classe_Temporelle> <!-- there may be zero or more duplicate nodes for this Classe_Temporelle value, depending on the file -->
                <Classe_Temporelle>HPE</Classe_Temporelle>
                <Quantite_Mesure>1072</Quantite_Mesure>
            </Conso_Par_Classe_Temporelle>
        </Donnees_Par_Type_Mesure>
    </Donnees_Releve>
</Corps_TTL>
I wrote a Python script to deduplicate the Conso_Par_Classe_Temporelle nodes that share the same Classe_Temporelle value. The goal is to get an output file that looks like this:
<Corps_TTL>
    <Id_TTL>60006</Id_TTL>
    <Donnees_Releve> <!-- could be many -->
        <Donnees_Par_Type_Mesure> <!-- could be many -->
            <Conso_Par_Classe_Temporelle>
                <Classe_Temporelle>HPH</Classe_Temporelle>
                <Quantite_Mesure>0</Quantite_Mesure>
            </Conso_Par_Classe_Temporelle>
            <Conso_Par_Classe_Temporelle> <!-- only one node for this Classe_Temporelle value -->
                <Classe_Temporelle>HPE</Classe_Temporelle>
                <Quantite_Mesure>1072</Quantite_Mesure>
            </Conso_Par_Classe_Temporelle>
        </Donnees_Par_Type_Mesure>
    </Donnees_Releve>
</Corps_TTL>
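Please find below the code I wrote. I don't understand why it doesn't work. It may be because I test for an attribute but then record the value of a child element. The problem is that I don't know how to fix it.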
import pprint
import os
import copy
import xml.etree.ElementTree as ET

folder_path = "/files/IN/"
out_folder_path = "/files/OUT/"

for path, dirs, files in os.walk(folder_path):
    for filename in files:
        if filename.endswith(".xml"):
            print("parsing : "+filename)
            tree = ET.parse(folder_path + filename)
            root = tree.getroot()
            # loop over each PRM
            for dr in root.iter('Donnees_Releve'):
                print("-----------------------")
                for dtm in dr.iter('Donnees_Par_Type_Mesure'):
                    #print type of node
                    print('node # : ' + dtm.find('Type_Mesure').text)
                    #loop on Donnees_Releve
                    # Use a `set` to keep track of "visited" elements with good lookup time.
                    visited = set()
                    # The iter method does a recursive traversal
                    for el in dtm.iter('Conso_Par_Classe_Temporelle'):
                        # Since the id is what defines a duplicate for you
                        if 'Classe_Temporelle' in el.attr:
                            current = el.find('Classe_Temporelle').text
                            # In visited already means it's a duplicate, remove it
                            if current in visited:
                                el.getparent().remove(el)
                            # Otherwise mark this ID as "visited"
                            else:
                                visited.add(current)
            tree.write(out_folder_path+filename)
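The suspicion about attributes versus element values can be checked in isolation. The snippet below is only a small sketch on a hypothetical one-node sample: it shows that Classe_Temporelle is a child element rather than an XML attribute, that the attribute mapping of a standard-library Element is called `attrib` (not `attr`), and that `getparent()` is not available in `xml.etree.ElementTree` (it is an lxml method).

```python
import xml.etree.ElementTree as ET

# Hypothetical one-node sample mirroring the structure shown earlier.
node = ET.fromstring(
    "<Conso_Par_Classe_Temporelle>"
    "<Classe_Temporelle>HPH</Classe_Temporelle>"
    "<Quantite_Mesure>0</Quantite_Mesure>"
    "</Conso_Par_Classe_Temporelle>"
)

# The node carries no XML attributes, so an attribute test is always False.
has_attribute = "Classe_Temporelle" in node.attrib

# The value of interest is the text of a child element.
child_text = node.findtext("Classe_Temporelle")

# Standard-library elements do not keep a reference to their parent.
has_getparent = hasattr(node, "getparent")

print(has_attribute, child_text, has_getparent)
```

So the `if 'Classe_Temporelle' in el.attr` test would need to go, and the removal would have to be done from the parent element (here `dtm`) instead of through `el.getparent()`.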
Can you help me complete the script?
Regards,
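One possible way to do the deduplication with the standard library only is sketched below. It assumes that every Conso_Par_Classe_Temporelle node is a direct child of a Donnees_Par_Type_Mesure node and that the first occurrence of each Classe_Temporelle value is the one to keep. The function names `dedupe_conso` and `clean_folder` and the inline sample are illustrative, not part of the original script.

```python
import os
import xml.etree.ElementTree as ET


def dedupe_conso(root):
    """Remove repeated Conso_Par_Classe_Temporelle nodes; return how many were removed."""
    removed = 0
    # Work from the parent, because ElementTree can only remove a direct child.
    for dtm in root.iter("Donnees_Par_Type_Mesure"):
        visited = set()
        # list(...) takes a snapshot so removals do not disturb the iteration.
        for el in list(dtm.findall("Conso_Par_Classe_Temporelle")):
            current = el.findtext("Classe_Temporelle")
            if current in visited:
                dtm.remove(el)
                removed += 1
            else:
                visited.add(current)
    return removed


def clean_folder(in_dir, out_dir):
    """Hypothetical helper: write a deduplicated copy of each .xml file under in_dir."""
    for path, _dirs, files in os.walk(in_dir):
        for filename in files:
            if filename.endswith(".xml"):
                # Join with the directory reported by os.walk, not the top folder.
                tree = ET.parse(os.path.join(path, filename))
                dedupe_conso(tree.getroot())
                tree.write(os.path.join(out_dir, filename),
                           encoding="utf-8", xml_declaration=True)


# Small in-memory sample with two duplicated values.
sample = """
<Corps_TTL>
  <Id_TTL>60006</Id_TTL>
  <Donnees_Releve>
    <Donnees_Par_Type_Mesure>
      <Conso_Par_Classe_Temporelle><Classe_Temporelle>HPH</Classe_Temporelle><Quantite_Mesure>0</Quantite_Mesure></Conso_Par_Classe_Temporelle>
      <Conso_Par_Classe_Temporelle><Classe_Temporelle>HPH</Classe_Temporelle><Quantite_Mesure>0</Quantite_Mesure></Conso_Par_Classe_Temporelle>
      <Conso_Par_Classe_Temporelle><Classe_Temporelle>HPE</Classe_Temporelle><Quantite_Mesure>1072</Quantite_Mesure></Conso_Par_Classe_Temporelle>
      <Conso_Par_Classe_Temporelle><Classe_Temporelle>HPE</Classe_Temporelle><Quantite_Mesure>1072</Quantite_Mesure></Conso_Par_Classe_Temporelle>
    </Donnees_Par_Type_Mesure>
  </Donnees_Releve>
</Corps_TTL>
"""
root = ET.fromstring(sample)
removed_count = dedupe_conso(root)
remaining = [el.findtext("Classe_Temporelle")
             for el in root.iter("Conso_Par_Classe_Temporelle")]
print(removed_count, remaining)
```

Calling `clean_folder("/files/IN/", "/files/OUT/")` would then process the folders from the question. Note that the default parser drops XML comments, and that files with the same name in different subfolders would overwrite each other in the output folder.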