<p>I am very new to Python. I have a very large XML file and I want to extract some data from it. Here is an excerpt:</p>
<pre><code><program>
<id>38e072a7-8fc9-4f9a-8eac-3957905c0002</id>
<programID>3853</programID>
<orchestra>New York Philharmonic</orchestra>
<season>1842-43</season>
<concertInfo>
<eventType>Subscription Season</eventType>
<Location>Manhattan, NY</Location>
<Venue>Apollo Rooms</Venue>
<Date>1842-12-07T05:00:00Z</Date>
<Time>8:00PM</Time>
</concertInfo>
<worksInfo>
<work ID="52446*">
<composerName>Beethoven, Ludwig van</composerName>
<workTitle>SYMPHONY NO. 5 IN C MINOR, OP.67</workTitle>
<conductorName>Hill, Ureli Corelli</conductorName>
</work>
<work ID="8834*4">
<composerName>Weber, Carl Maria Von</composerName>
<workTitle>OBERON</workTitle>
<movement>"Ozean, du Ungeheuer" (Ocean, thou mighty monster), Reiza (Scene and Aria), Act II</movement>
<conductorName>Timm, Henry C.</conductorName>
<soloists>
<soloist>
<soloistName>Otto, Antoinette</soloistName>
<soloistInstrument>Soprano</soloistInstrument>
<soloistRoles>S</soloistRoles>
</soloist>
</soloists>
</work>
<work ID="3642*">
<composerName>Hummel, Johann</composerName>
<workTitle>QUINTET, PIANO, D MINOR, OP. 74</workTitle>
<soloists>
<soloist>
<soloistName>Scharfenberg, William</soloistName>
<soloistInstrument>Piano</soloistInstrument>
<soloistRoles>A</soloistRoles>
</soloist>
<soloist>
<soloistName>Hill, Ureli Corelli</soloistName>
<soloistInstrument>Violin</soloistInstrument>
<soloistRoles>A</soloistRoles>
</soloist>
<soloist>
<soloistName>Derwort, G. H.</soloistName>
<soloistInstrument>Viola</soloistInstrument>
<soloistRoles>A</soloistRoles>
</soloist>
<soloist>
<soloistName>Boucher, Alfred</soloistName>
<soloistInstrument>Cello</soloistInstrument>
<soloistRoles>A</soloistRoles>
</soloist>
<soloist>
<soloistName>Rosier, F. W.</soloistName>
<soloistInstrument>Contrabass</soloistInstrument>
<soloistRoles>A</soloistRoles>
</soloist>
</soloists>
</work>
<work ID="0*">
<interval>Intermission</interval>
</work>
<work ID="8834*3">
<composerName>Weber, Carl Maria Von</composerName>
<workTitle>OBERON</workTitle>
<movement>Overture</movement>
<conductorName>Etienne, Denis G.</conductorName>
</work>
<work ID="8835*1">
<composerName>Rossini, Gioachino</composerName>
<workTitle>ARMIDA</workTitle>
<movement>Duet</movement>
<conductorName>Timm, Henry C.</conductorName>
<soloists>
<soloist>
<soloistName>Otto, Antoinette</soloistName>
<soloistInstrument>Soprano</soloistInstrument>
<soloistRoles>S</soloistRoles>
</soloist>
<soloist>
<soloistName>Horn, Charles Edward</soloistName>
<soloistInstrument>Tenor</soloistInstrument>
<soloistRoles>S</soloistRoles>
</soloist>
</soloists>
</work>
<work ID="8837*6">
<composerName>Beethoven, Ludwig van</composerName>
<workTitle>FIDELIO, OP. 72</workTitle>
<movement>"In Des Lebens Fruhlingstagen...O spur ich nicht linde," Florestan (aria)</movement>
<conductorName>Timm, Henry C.</conductorName>
<soloists>
<soloist>
<soloistName>Horn, Charles Edward</soloistName>
<soloistInstrument>Tenor</soloistInstrument>
<soloistRoles>S</soloistRoles>
</soloist>
</soloists>
</work>
<work ID="8336*4">
<composerName>Mozart, Wolfgang Amadeus</composerName>
<workTitle>ABDUCTION FROM THE SERAGLIO,THE, K.384</workTitle>
<movement>"Ach Ich liebte," Konstanze (aria)</movement>
<conductorName>Timm, Henry C.</conductorName>
<soloists>
<soloist>
<soloistName>Otto, Antoinette</soloistName>
<soloistInstrument>Soprano</soloistInstrument>
<soloistRoles>S</soloistRoles>
</soloist>
</soloists>
</work>
<work ID="5543*">
<composerName>Kalliwoda, Johann W.</composerName>
<workTitle>OVERTURE NO. 1, D MINOR, OP. 38</workTitle>
<conductorName>Timm, Henry C.</conductorName>
</work>
</worksInfo>
</program>
<program>
</code></pre>
<p>What I want to do is extract the following information: programID, orchestra, season, eventType, work ID, soloistName, soloistInstrument, soloistRoles.</p>
<p>Here is the code I am using:</p>
^{pr2}$
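<p>When I run this code, I only get the last soloistName and soloistInstrument. The result I have in mind is something like repeated observations for each program, so I would have something like:</p>
<p>13918, New York Philharmonic, 1842-1843, Subscription Season, 52446*, Otto, Antoinette, Soprano, S</p>
<p>13918, …, 3642*, Scharfenberg, William, Piano, A</p>
<p>13918, …, 3642*, Hill, Ureli Corelli, Violin, A</p>
<p>And so on, up to the last work ID:</p>
<p>13918, …, 8336*4, Otto, Antoinette, Soprano, S</p>
<p>What I actually get is only the last work:</p>
<p>13918, New York Philharmonic, 1842-1843, Subscription Season, 8336*, Otto, Antoinette, Soprano, S</p>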
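<p>The layout described above (one row per soloist, with the program-level fields repeated on every row) can be sketched with the standard library's <code>xml.etree.ElementTree</code>. This is only a minimal illustration, not the code referred to earlier: the wrapping <code>&lt;programs&gt;</code> root element and the helper name <code>program_rows</code> are assumptions, since the excerpt does not show the root of the file. The key point is that the row is appended inside the innermost loop, so earlier soloists are not overwritten by later ones.</p>

```python
import xml.etree.ElementTree as ET

# Shortened sample in the same shape as the excerpt.
# ASSUMPTION: the <programs> root element is made up; the real root is not shown.
SAMPLE = """<programs>
<program>
<programID>3853</programID>
<orchestra>New York Philharmonic</orchestra>
<season>1842-43</season>
<concertInfo><eventType>Subscription Season</eventType></concertInfo>
<worksInfo>
<work ID="52446*"><composerName>Beethoven, Ludwig van</composerName></work>
<work ID="3642*">
<soloists>
<soloist><soloistName>Scharfenberg, William</soloistName><soloistInstrument>Piano</soloistInstrument><soloistRoles>A</soloistRoles></soloist>
<soloist><soloistName>Hill, Ureli Corelli</soloistName><soloistInstrument>Violin</soloistInstrument><soloistRoles>A</soloistRoles></soloist>
</soloists>
</work>
</worksInfo>
</program>
</programs>"""


def program_rows(program):
    """Return one tuple per soloist found in a <program> element (hypothetical helper)."""
    # Program-level fields are read once and repeated on every row.
    head = (
        program.findtext("programID"),
        program.findtext("orchestra"),
        program.findtext("season"),
        program.findtext("concertInfo/eventType"),
    )
    rows = []
    for work in program.iterfind("worksInfo/work"):
        for soloist in work.iterfind("soloists/soloist"):
            # Append inside the inner loop so every soloist is kept.
            rows.append(head + (
                work.get("ID"),
                soloist.findtext("soloistName"),
                soloist.findtext("soloistInstrument"),
                soloist.findtext("soloistRoles"),
            ))
    return rows


root = ET.fromstring(SAMPLE)
rows = [row for program in root.iter("program") for row in program_rows(program)]
for row in rows:
    print(row)
```

<p>In this sketch a work without any <code>&lt;soloist&gt;</code> element (such as <code>52446*</code> in the sample) produces no row; whether such works should appear with empty soloist fields is a separate decision.</p>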
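<p>The file contains more than 15,000 programs like the example I posted. I want to parse all of them and extract the information I mentioned above. I am not sure how to do this. I have searched the internet for a way to do it, but nothing I have tried works!</p>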
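<p>For a file of this size, one possible approach is to stream it with <code>xml.etree.ElementTree.iterparse</code> and handle each <code>&lt;program&gt;</code> as soon as its closing tag is read, instead of loading the whole tree first. The sketch below is a hedged illustration under some assumptions: the <code>&lt;programs&gt;</code> root element, the generator name <code>iter_soloist_rows</code>, and the <code>workID</code> column name are made up for the example, and an in-memory sample stands in for the real file.</p>

```python
import csv
import io
import xml.etree.ElementTree as ET

# Column names for the output; "workID" is a made-up label for the work's ID attribute.
FIELDS = ["programID", "orchestra", "season", "eventType",
          "workID", "soloistName", "soloistInstrument", "soloistRoles"]

# Small in-memory stand-in for the real file.
# ASSUMPTION: the <programs> root element is made up; the real root is not shown.
SAMPLE = b"""<programs>
<program>
<programID>3853</programID>
<orchestra>New York Philharmonic</orchestra>
<season>1842-43</season>
<concertInfo><eventType>Subscription Season</eventType></concertInfo>
<worksInfo>
<work ID="8835*1">
<soloists>
<soloist><soloistName>Otto, Antoinette</soloistName><soloistInstrument>Soprano</soloistInstrument><soloistRoles>S</soloistRoles></soloist>
<soloist><soloistName>Horn, Charles Edward</soloistName><soloistInstrument>Tenor</soloistInstrument><soloistRoles>S</soloistRoles></soloist>
</soloists>
</work>
<work ID="5543*"><composerName>Kalliwoda, Johann W.</composerName></work>
</worksInfo>
</program>
</programs>"""


def iter_soloist_rows(source):
    """Yield one list per soloist while streaming through the XML (hypothetical helper)."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "program":
            continue  # wait until a whole <program> has been read
        head = [
            elem.findtext("programID"),
            elem.findtext("orchestra"),
            elem.findtext("season"),
            elem.findtext("concertInfo/eventType"),
        ]
        for work in elem.iterfind("worksInfo/work"):
            for soloist in work.iterfind("soloists/soloist"):
                yield head + [
                    work.get("ID"),
                    soloist.findtext("soloistName"),
                    soloist.findtext("soloistInstrument"),
                    soloist.findtext("soloistRoles"),
                ]
        elem.clear()  # drop the finished program's children to limit memory use


rows = list(iter_soloist_rows(io.BytesIO(SAMPLE)))

# Write the rows as CSV; a StringIO stands in for a real output file here.
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(FIELDS)
writer.writerows(rows)
print(out.getvalue())
```

<p><code>iterparse</code> also accepts a filename, so on the real data the call could look like <code>iter_soloist_rows("programs.xml")</code>, where <code>programs.xml</code> is a hypothetical path.</p>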