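Python regex: remove URLs and domain names from a string

2 answers

User

#1 · edited 2024-10-04 05:34:07

You should escape all of those dots, or better yet, move the dot outside the group and escape it once. You can also match from non-whitespace to non-whitespace, like this: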

re.sub(r'[\S]+\.(net|com|org|info|edu|gov|uk|de|ca|jp|fr|au|us|ru|ch|it|nel|se|no|es|mil)[\S]*\s?','',string)

With this, the following:
'this is my content domain.com more content http://domain2.org/content and more content domain.net/page thingynet stuffocom'
becomes:

'this is my content more content and more content thingynet stuffocom'
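To see why escaping the dot matters, here is a small comparison sketch. The unescaped pattern is only my guess at what the original attempt looked like (the question's own regex is not shown here), and the suffix list is shortened for readability.

```python
import re

text = ("this is my content domain.com more content "
        "http://domain2.org/content and more content "
        "domain.net/page thingynet stuffocom")

# Assumed original attempt: the dot is unescaped, so "." matches ANY
# character and plain words ending in "net"/"com" get removed as well.
unescaped = re.sub(r'\S+(.net|.com|.org)\S*\s?', '', text)

# Dot moved outside the group and escaped once: only a literal "."
# followed by one of the listed suffixes counts as a domain.
escaped = re.sub(r'\S+\.(net|com|org)\S*\s?', '', text)

print(repr(unescaped))
print(repr(escaped))
```

With the unescaped version, `thingynet` and `stuffocom` disappear too; with the escaped version they survive.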

User

#2 · edited 2024-10-04 05:34:07

Here is another solution:

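import re

# Read the whole file; the with-block closes it afterwards.
with open('test.txt', 'r') as f:
    content = f.read()

# Any run of non-whitespace characters containing ".com", ".org" or ".net".
pattern = r"[^\s]*\.(com|org|net)\S*"
result = re.sub(pattern, '', content)
print(result)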

Input:

^{pr2}$

Output:

this is my content  more content  and more content  and
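Note the double spaces in that output: the pattern removes the URL but not the whitespace around it. If that bothers you, one option is a second pass that collapses whitespace. This is just a sketch; `strip_domains` and the sample string are names I made up for illustration, not part of the answer above.

```python
import re

def strip_domains(text):
    """Remove tokens containing .com/.org/.net, then tidy up spacing."""
    # Same idea as the pattern above, with a non-capturing group.
    without_domains = re.sub(r"\S*\.(?:com|org|net)\S*", "", text)
    # Collapse every run of whitespace (including newlines) to a single
    # space and trim both ends.
    return re.sub(r"\s+", " ", without_domains).strip()

# Hypothetical sample input, not the original test.txt.
sample = ("this is my content domain.com more content "
          "http://domain2.org/content and more")
print(strip_domains(sample))
```

Keep in mind that collapsing with `\s+` also flattens line breaks, so skip that step if the layout of the text matters.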