Remove several lines from given data using regex

Posted 2024-10-01 11:39:55


I want to remove several lines from the given data using a regular expression with a specific pattern.

Data:

'''And what struck me was every place that I went to to see these telescopes, the astronomers and cosmologists are in search of a certain kind of silence, whether it's silence from radio pollution or light pollution or whatever.
And it was very obvious that, if we destroy these silent places on Earth, we will be stuck on a planet without the ability to look outwards, because we will not be able to understand the signals that come from outer space.
Thank you.
<talkid>1129</talkid>
<title>Anil Ananthaswamy: What it takes to do extreme astrophysics</title>
<description>All over the planet, giant telescopes and detectors are looking for clues to the workings of the universe. At the INK Conference, science writer Anil Ananthaswamy tours us around these amazing installations, taking us to some of the most remote and silent places on Earth.</description>
<keywords>exploration,journalism,science,technology,universe</keywords>
<url>http://www.ted.com/talks/brewster_kahle_builds_a_free_digital_library.html</url>
We really need to put the best we have to offer within reach of our children.
If we don't do that, we're going to get the generation we deserve.
They're going to learn from whatever it is they have around them.'''

Here I want to delete the lines from <talkid> through </url>.

How can I do this with regex?

What I tried:

import re

re.sub('<.*?>', '', data)
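For reference, here is a minimal sketch of what that attempt does, run on a shortened excerpt of the data (the "sample" string is an abbreviation made for illustration, not the full input). Because '<.*?>' is non-greedy, it matches each tag on its own, so only the tags are removed and the text between them stays behind.

```python
import re

# Abbreviated excerpt of the data above (shortened for illustration)
sample = (
    "Thank you.\n"
    "<talkid>1129</talkid>\n"
    "<keywords>exploration,journalism,science,technology,universe</keywords>\n"
    "We really need to put the best we have to offer within reach of our children.\n"
)

# Non-greedy "<.*?>" matches each tag individually, so only the tags disappear;
# "1129" and the keyword list are left in the output
result = re.sub('<.*?>', '', sample)
print(result)
```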

2 Answers

I'm not sure why you want to use regex, but if you do, this will work:

import re
# DOTALL lets "." cross newlines; the trailing \n removes the line break after </url>
rgx = re.compile(r'<talkid>.*</url>\n', re.DOTALL)
print(rgx.sub('', data))
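One caveat: ".*" is greedy, so if the input ever contains more than one <talkid> ... </url> block, a single match runs from the first <talkid> to the last </url> and also deletes the transcript text in between. A minimal sketch comparing the greedy pattern with a non-greedy ".*?" version, using a hypothetical two-record string "two_talks" (the example.com URLs are placeholders):

```python
import re

# Hypothetical input holding two metadata blocks, each followed by transcript text
two_talks = (
    "Thank you.\n"
    "<talkid>1</talkid>\n"
    "<url>http://example.com/a</url>\n"
    "First talk text.\n"
    "<talkid>2</talkid>\n"
    "<url>http://example.com/b</url>\n"
    "Second talk text.\n"
)

greedy = re.compile(r'<talkid>.*</url>\n', re.DOTALL)
lazy = re.compile(r'<talkid>.*?</url>\n', re.DOTALL)

# Greedy: one match from the first <talkid> to the last </url>,
# so "First talk text." is deleted as well
greedy_result = greedy.sub('', two_talks)
print(greedy_result)

# Non-greedy: each block is matched and removed separately
lazy_result = lazy.sub('', two_talks)
print(lazy_result)
```

For the single-block data in the question both patterns give the same result; the non-greedy form only matters once several records are concatenated.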

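The regex '<.*?>' matches each tag on its own (the non-greedy .*? stops at the first >), so it strips the tags but keeps the text between them. To drop the whole block, match from <talkid> to </url> instead. That block spans several lines, and the . special character does not match a newline by default, so compile the regex with the re.DOTALL flag to change this default behavior and let the match run across multiple lines: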

import re
# re.DOTALL makes "." match newlines too, so the match can span several lines
pattern = re.compile(r'<talkid>.*</url>', re.DOTALL)
new_text = pattern.sub('', data)
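To see the effect of the flag in isolation, here is a minimal sketch on two of the metadata lines from the data. Without re.DOTALL the pattern finds nothing, because <talkid> and </url> sit on different lines; with the flag, the whole span matches. Python also accepts the inline form "(?s)" at the start of the pattern, which has the same effect as passing re.DOTALL.

```python
import re

# Two of the metadata lines from the data above
block = (
    "<talkid>1129</talkid>\n"
    "<url>http://www.ted.com/talks/brewster_kahle_builds_a_free_digital_library.html</url>"
)

# Without DOTALL, "." stops at "\n", so no match is found
without_flag = re.search(r'<talkid>.*</url>', block)

# With DOTALL (or the inline "(?s)" flag), "." also matches "\n"
with_flag = re.search(r'<talkid>.*</url>', block, re.DOTALL)
inline_flag = re.search(r'(?s)<talkid>.*</url>', block)

print(without_flag)                # None
print(with_flag.group() == block)  # True
```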
