我发布的脚本会很长,所以我请求您耐心等待。我相信对于那些熟悉XML结构复杂性的人来说,这是很容易解决的。我有一个高度分层的XML文件。我很乐意在此基础上制作sqlite数据库。到目前为止,我有两个文件从具有这种结构的XML文件中提取同级信息(注意,工作频率从0到4或5不等;节点也可以是空的):
<program>
<id>28798031</id>
<programID>12345</programID>
<orchestra>New York Philarmonic</orchestra>
<season>1842-43</season>
<concertInfo>
<eventType>Subscription Season</eventType>
<Location>Manhattan, NY</Location>
<Venue>Apollo Rooms</Venue>
<Date>1842-12-07T05:00:00Z</Date>
<Time>8:00PM</Time>
</concertInfo>
<worksInfo>
<work ID="52446*">
<composerName>Beethoven, Ludwig van</composerName>
<workTitle>SYMPHONY NO. 5 IN C MINOR, OP.67</workTitle>
<conductorName>Hill, Ureli Corelli</conductorName>
</work>
<work ID="8834*4">
<composerName>Weber, Carl Maria Von</composerName>
<workTitle>OBERON</workTitle>
<movement>"Ozean, du Ungeheuer" (Ocean, thou mighty monster), Reiza (Scene and Aria), Act II</movement>
<conductorName>Timm, Henry C.</conductorName>
<soloists>
<soloist>
<soloistName>Otto, Antoinette</soloistName>
<soloistInstrument>Soprano</soloistInstrument>
<soloistRoles>S</soloistRoles>
</soloist>
</soloists>
</work>
</worksInfo>
</program>
我有两个脚本从同级节点(相同的层次结构级别)提取数据。在
首先
^{pr2}$秒
import sqlite3
import xml.etree.cElementTree
def insert_work(db_conn, workID, composerName, workTitle, movement, conductorName):
curs = db_conn.cursor()
curs.execute("insert into work values (?,?,?,?,?)", (workID, composerName, workTitle, movement, conductorName))
db_conn.commit()
def work_data_from_element(element):
workID = element.get("ID")
if workID != None:
workID = workID
else:
workID = ''
composerName = element.find("composerName")
if composerName != None:
composerName = composerName.text
else:
composerName = ''
workTitle = element.find("workTitle")
if workTitle != None:
workTitle = workTitle.text
else:
workTitle = ''
movement = element.find("movement")
if movement != None:
movement = movement.text
else:
movement = ''
conductorName = element.find("conductorName")
if conductorName != None:
conductorName = conductorName.text
else:
conductorName = ''
return workID, composerName, workTitle, movement, conductorName
## add the main loop to get all the work information from the XML file
if __name__ == "__main__":
conn = sqlite3.connect("work.sqlite3")
programs = xml.etree.cElementTree.parse("complete.xml")
work = programs.findall("program/worksInfo/work")
for index, element in enumerate(work):
workID, composerName, workTitle, movement, conductorName = work_data_from_element(element)
insert_work(conn, workID, composerName, workTitle, movement, conductorName)
我的问题是——如何把这两者联系起来,这样我就可以知道programID、season、workID和conductorName等等,所有这些都在一起?谢谢您!!在
无论何时处理需要为二维格式(如数据库表)展平的嵌套分层XML文件,请考虑使用XSLT解决方案。作为信息,XSLT是一种专门用于转换XML文件的编程语言,与大多数通用语言一样,Python可以处理XSLT1.0脚本,特别是使用lxml模块。在
因此,对于program和workSQLite表,可以考虑运行以下XSLT脚本来转换源XML文件,然后使用XPath(XSLT的同级)进行解析,以便在一个Python脚本中导入数据库。顺便说一句,XSLT是一个格式良好的XML文件,因此可以像任何其他XML一样从文件或字符串加载。XSLT的优点还在于,可以对可能存在/可能不存在的元素进行硬编码,从而为缺少的节点返回空文本。在
结构上的一些变化:
<concerntInfo>
的所有子级(可能用作单独的表?)在<worksInfo>
,因为它反映在work表中<id>
添加到work解析中以用作表中的外键编程XSLT(另存为.xsl文件,在Python中调用)
工作XSLT(另存为.xsl,在Python中调用)
^{pr2}$Python脚本
相关问题 更多 >
编程相关推荐