Using a regular expression in Python to remove blank lines from XML?

<entry> <id>http://feeds.rasset.ie/rteavgen/player/videos/show/?id=10103822</id> <showid>10103822</showid> <platform>iptv</platform> <published>2013-01-19T21:45:00+00:00</published> <updated>2013-01-19T23:41:00+00:00</updated> <title type="text">The Saturday Night Show</title> <content type="text">Chat show, presented by journalist and broadcaster Brendan O'Connor, featuring comedy, celebrity guests and live musical performances.</content> <category term="RTÉ One" rte:type="channel"/> <category term="Entertainment" rte:type="genre"/> <category term="None" rte:type="series"/> <category term="None" rte:type="episode"/> <category term="None" rte:type="ranking"/> <category term="1024" rte:type="genrelist"/> <category term="None" rte:type="keywordlist"/> <category term="1668" rte:type="progid"/> <link rel="self" type="application/atom+xml" href="http://feeds.rasset.ie/rteavgen/player/playlist?showId=10103822"/> <link rel="alternate" type="text/html" href="http://www.rte.ie/player/#v=10103822"/> <rte:valid start="2013-01-19T21:52:12+00:00" end="2013-02-09T21:52:12+00:00"/> <rte:duration ms="4201061" formatted="1:10"/> <rte:statistics views="194"/> <media:title type="plain">The Saturday Night Show</media:title> <media:description type="plain">Chat show, presented by journalist and broadcaster Brendan O'Connor, featuring comedy, celebrity guests and live musical performances.</media:description> <media:player url="http://feeds.rasset.ie/rteavgen/player/player/?id=" width="400" height="300"/> <media:thumbnail url="http://img.rasset.ie/0006e56a.jpg" time="00:00:00+00:00"/> <media:restriction relationship="allow" type="country"/> <media:restriction relationship="disallow" type="country"/> <media:copyright>RTÉ</media:copyright> </entry>

2 answers

User

Answer #1 · edited 2024-10-03 02:36:58

You don't need a regex to remove blank lines:

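# Read every line, keeping only those that contain non-whitespace characters.
with open("my_file.xml") as xmlfile:
    lines = [line for line in xmlfile if line.strip()]

# Overwrite the file with the remaining lines.
with open("my_file.xml", "w") as xmlfile:
    xmlfile.writelines(lines)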

Likewise, to parse the XML file you can simply use expat: http://docs.python.org/2/library/pyexpat.html or perhaps even minidom: http://docs.python.org/2/library/xml.dom.minidom.html. Another very good option is ElementTree: http://docs.python.org/2/library/xml.etree.elementtree.html

Using a regex to parse XML, however, is not recommended; it is actually a bad idea.
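As a minimal sketch of the ElementTree route suggested in this answer: the sample below is a trimmed-down, hypothetical version of the asker's entry (the `rte:` and `media:` elements are left out, since they would need their namespace declarations to parse). It shows that blank lines and missing whitespace between elements make no difference to a real parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened sample: blank lines between some elements,
# and no whitespace at all between the two <category> elements.
SAMPLE = """<entry>
    <showid>10103822</showid>

    <title type="text">The Saturday Night Show</title>

    <category term="Entertainment"/><category term="1024"/>
</entry>"""

root = ET.fromstring(SAMPLE)

# Look up child elements by tag name; whitespace between tags is irrelevant.
showid = root.findtext("showid")
title = root.findtext("title")
terms = [category.get("term") for category in root.findall("category")]

print(showid, title, terms)
```

The lookups go by element name rather than by position in the text, so nothing needs to be stripped from the file first.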

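User

Answer #2 · edited 2024-10-03 02:36:58

As others have said, you should not use a regex for this task.

To answer your actual question: your pattern is too specific about the whitespace between elements, and in this case the extra whitespace is what trips you up. There could just as easily be no whitespace at all: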

<category term="None" rte:type="ranking"/><category term="1024" rte:type="genrelist"/>

The fix: instead of matching \n followed by 8 spaces, use \s* (zero or more whitespace characters).
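A short sketch of that fix. The asker's original pattern is not shown in this thread, so the `PAIR` pattern below is a hypothetical stand-in that matches the two adjacent `category` elements from the example above, with `\s*` between them.

```python
import re

# Hypothetical pattern (the asker's real regex is not shown in the thread).
# \s* between the two tags accepts a newline plus indentation, or nothing at all.
PAIR = re.compile(
    r'<category term="([^"]*)" rte:type="ranking"/>\s*'
    r'<category term="([^"]*)" rte:type="genrelist"/>'
)

# Elements separated by a newline and 8 spaces.
spaced = (
    '<category term="None" rte:type="ranking"/>\n'
    '        <category term="1024" rte:type="genrelist"/>'
)
# The same elements with no whitespace between them.
tight = (
    '<category term="None" rte:type="ranking"/>'
    '<category term="1024" rte:type="genrelist"/>'
)

spaced_match = PAIR.search(spaced).groups()
tight_match = PAIR.search(tight).groups()

print(spaced_match, tight_match)
```

A pattern hard-coded to `\n` plus 8 spaces would match only the first string; with `\s*` both layouts match.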
