Regular expression to extract the correct value from nested XML

Posted 2024-07-02 12:46:41


I am trying to extract the English names from the XML below. I need the Name value under language="eng", not the one under language="chi".

Which Python regular expression would help me do this?

<?xml version="1.0" encoding="UTF-8"?>
 <BroadcastData creationDate="20150814232141">
     <ProviderInfo>
         <ProviderId>Profis</ProviderId>
         <ProviderName>ProfisLynx.</ProviderName>
     </ProviderInfo>
     <ScheduleData>
         <ChannelPeriod endTime="20150814233000" beginTime="20150814220000">
             <ChannelId>88</ChannelId>
             <Event duration="1800" beginTime="20150814220000">
                 <EventId>GR0018904021</EventId>
                 <DvbEventId>45481</DvbEventId>
                 <EventType>S</EventType>
                 <PreviewTime>0</PreviewTime>
                 <EpgProduction>
                     <EpgText language="eng">
                         <Name>Across The Strait</Name>
                         <Description>This programme looks at the happenings in Taiwan and its relationship with China. There'll be updated news on Taiwan and in-depth reports and discussions about current affairs issues in Taiwan.</Description>
                         <ExtendedInfo name="Contentid_ref">GR0018904021</ExtendedInfo>
                         <ExtendedInfo name="AudioTrack">chi</ExtendedInfo>
                         <ExtendedInfo name="Start_over_flag">0</ExtendedInfo>
                         <ExtendedInfo name="ProgrammeStatus">L</ExtendedInfo>
                     </EpgText>
                     <EpgText language="chi">
                         <Name>海峡两岸</Name>
                         <Description>中央电视台唯一的涉台时事新闻评论节目。节目宗旨是跟踪海峡热点,反映两岸民意,报导当日的近期台湾岛内的热点新闻,并对两岸各个层面的交流交往进行跟踪报道。</Description>
                         <ExtendedInfo name="AudioTrack">chi</ExtendedInfo>
                         <ExtendedInfo name="ProgrammeStatus">L</ExtendedInfo>
                     </EpgText>
                     <ParentalRating>0</ParentalRating>
                     <DvbContent>
                         <Content nibble2="0" nibble1="0"/>
                         <User nibble2="A" nibble1="0"/>
                     </DvbContent>
                     <DvbContent>
                         <Content nibble2="0" nibble1="0"/>
                         <User nibble2="0" nibble1="8"/>
                     </DvbContent>
                 </EpgProduction>
             </Event>
 ==============================================================
             <Event duration="1800" beginTime="20150814223000">
                 <EventId>GR0018906021</EventId>
                 <DvbEventId>45482</DvbEventId>
                 <EventType>S</EventType>
                 <PreviewTime>0</PreviewTime>
                 <EpgProduction>
                     <EpgText language="eng">
                         <Name>Asia Today</Name>
                         <Description>Tune in daily to receive the important news and latest social changes happening in Asia.</Description>
                         <ExtendedInfo name="Contentid_ref">GR0018906021</ExtendedInfo>
                         <ExtendedInfo name="AudioTrack">chi</ExtendedInfo>
                         <ExtendedInfo name="Start_over_flag">0</ExtendedInfo>
                         <ExtendedInfo name="ProgrammeStatus">L</ExtendedInfo>
                     </EpgText>
                     <EpgText language="chi">
                         <Name>今日亚洲</Name>
                         <Description>节目以亚洲人的视角报道亚洲、传达亚洲人的声音、展现亚洲的进步和发展,以及反映亚洲与世界其他地区的互动。</Description>
                         <ExtendedInfo name="AudioTrack">chi</ExtendedInfo>
                         <ExtendedInfo name="ProgrammeStatus">L</ExtendedInfo>
                     </EpgText>
                     <ParentalRating>0</ParentalRating>
                     <DvbContent>
                         <Content nibble2="0" nibble1="0"/>
                         <User nibble2="A" nibble1="0"/>
                     </DvbContent>
                     <DvbContent>
                         <Content nibble2="0" nibble1="0"/>
                         <User nibble2="0" nibble1="8"/>
                     </DvbContent>
                 </EpgProduction>
             </Event>
 ==============================================================
             <Event duration="1800" beginTime="20150814230000">
                 <EventId>GR0018908021</EventId>
                 <DvbEventId>45483</DvbEventId>
                 <EventType>S</EventType>
                 <PreviewTime>0</PreviewTime>
                 <EpgProduction>
                     <EpgText language="eng">
                         <Name>China News</Name>
                         <Description>A news programme made especially to cater to the needs of overseas Chinese and potential investors. The content include China and international news and news analysis.</Description>
                         <ExtendedInfo name="Contentid_ref">GR0018908021</ExtendedInfo>
                         <ExtendedInfo name="AudioTrack">chi</ExtendedInfo>
                         <ExtendedInfo name="Start_over_flag">0</ExtendedInfo>
                         <ExtendedInfo name="ProgrammeStatus">L</ExtendedInfo>
                     </EpgText>
                     <EpgText language="chi">
                         <Name>中国新闻</Name>
                         <Description>《中国新闻》是以海外华人、港澳台同胞、留学生、驻外使领馆及中资机构人员为目标的新闻节目。节目由国内外要闻、内地经济和社会新闻、对国内外重要新闻事件的分析组成。</Description>
                         <ExtendedInfo name="AudioTrack">chi</ExtendedInfo>
                         <ExtendedInfo name="ProgrammeStatus">L</ExtendedInfo>
                     </EpgText>
                     <ParentalRating>0</ParentalRating>
                     <DvbContent>
                         <Content nibble2="0" nibble1="0"/>
                         <User nibble2="A" nibble1="0"/>
                     </DvbContent>
                     <DvbContent>
                         <Content nibble2="0" nibble1="0"/>
                         <User nibble2="0" nibble1="8"/>
                     </DvbContent>
                 </EpgProduction>
             </Event>
 ==============================================================
         </ChannelPeriod>
     </ScheduleData>
 </BroadcastData>
==================================================================================================================

2 answers

It is best not to parse XML with a regular expression, because it can produce unexpected results.

Try this instead: How do I parse XML in Python?
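As a rough illustration of the parser-based approach, here is a minimal sketch using the standard library's xml.etree.ElementTree. The SAMPLE string is a trimmed, hypothetical stand-in with the same shape as the XML in the question, and the helper name english_names is an assumption made for this example, not something from the original answer.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical sample shaped like the XML in the question.
SAMPLE = """\
<BroadcastData creationDate="20150814232141">
  <ScheduleData>
    <ChannelPeriod beginTime="20150814220000" endTime="20150814233000">
      <Event duration="1800" beginTime="20150814220000">
        <EpgProduction>
          <EpgText language="eng"><Name>Across The Strait</Name></EpgText>
          <EpgText language="chi"><Name>海峡两岸</Name></EpgText>
        </EpgProduction>
      </Event>
      <Event duration="1800" beginTime="20150814223000">
        <EpgProduction>
          <EpgText language="eng"><Name>Asia Today</Name></EpgText>
          <EpgText language="chi"><Name>今日亚洲</Name></EpgText>
        </EpgProduction>
      </Event>
    </ChannelPeriod>
  </ScheduleData>
</BroadcastData>
"""


def english_names(xml_text):
    """Return the <Name> text of every <EpgText language="eng"> element."""
    root = ET.fromstring(xml_text)
    # ElementTree's XPath subset supports attribute predicates
    # of the form [@attr='value'].
    path = ".//EpgText[@language='eng']/Name"
    return [name.text for name in root.findall(path)]


if __name__ == "__main__":
    print(english_names(SAMPLE))  # ['Across The Strait', 'Asia Today']
```

Note that the "=====" separator line after the closing </BroadcastData> tag in the posted sample would have to be removed first, since an XML parser rejects text after the root element.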

If text contains the XML you provided, the following regular expression works:

import re
print(re.findall(r'<EpgText\s+language="eng">\s*<Name>(.*?)</Name>', text, re.M | re.I))

This prints the following three results:

['Across The Strait', 'Asia Today', 'China News']

That said, parsing XML with an XML library is much safer.
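To make the safety point concrete, the sketch below runs the same pattern against a hypothetical variant of the input in which one element uses single-quoted attributes and another carries an extra attribute. Both are equivalent XML as far as the language attribute is concerned, but the regex misses them while a parser does not. The VARIANT string and the two helper names are assumptions made for this illustration.

```python
import re
import xml.etree.ElementTree as ET

# The pattern from the answer above, compiled once.
PATTERN = re.compile(
    r'<EpgText\s+language="eng">\s*<Name>(.*?)</Name>', re.M | re.I
)

# Hypothetical variant: single quotes on the first element,
# an extra attribute on the second.
VARIANT = """\
<EpgProduction>
  <EpgText language='eng'><Name>Across The Strait</Name></EpgText>
  <EpgText language="eng" source="demo"><Name>Asia Today</Name></EpgText>
  <EpgText language="chi"><Name>海峡两岸</Name></EpgText>
</EpgProduction>
"""


def names_by_regex(text):
    """Extract names with the regex; sensitive to exact markup spelling."""
    return PATTERN.findall(text)


def names_by_parser(text):
    """Extract names with ElementTree; insensitive to quoting and extra attributes."""
    root = ET.fromstring(text)
    return [n.text for n in root.findall(".//EpgText[@language='eng']/Name")]


if __name__ == "__main__":
    print(names_by_regex(VARIANT))   # []
    print(names_by_parser(VARIANT))  # ['Across The Strait', 'Asia Today']
```

The regex could be loosened to accept these cases, but each new variation (attribute order, comments, CDATA sections) needs another patch, which is the usual argument for a real parser.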
