Get the full string before and after a specific pattern

Posted 2024-10-03 04:37:07


I want to strip noise text that follows a specific pattern:

text = "this is some text lskdfmd&@kjansdl and some more text sldkf&@lsakjd and some other stuff"

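I want to remove every chunk of this sentence that contains &@, that is, everything from the space before it up to the space after it, so that I get: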

result = "this is some text and some more text and some other stuff"

I have been trying:

re.compile(r'([\s]&@.*?([\s]))').sub(" ", text)

However, I can't seem to get the first part (the text between the space and &@) right.


3 answers

Try this:

import re
# [a-zA-Z], not [a-zA-z]: the range A-z would also match [ \ ] ^ _ and `
result = re.findall(r"[a-zA-Z]+\&\@[a-zA-Z]+", text)
print(result)
['lskdfmd&@kjansdl', 'sldkf&@lsakjd']

Now remove the words in result from the list of all words.
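A minimal sketch of that removal step, assuming words are separated by whitespace: split the text, drop any word that findall reported as noise, and join the rest with single spaces. The names noise and words are illustrative only.

```python
import re

text = "this is some text lskdfmd&@kjansdl and some more text sldkf&@lsakjd and some other stuff"

# Noise tokens found by the pattern above (letters, then "&@", then letters)
noise = set(re.findall(r"[a-zA-Z]+&@[a-zA-Z]+", text))

# Keep every whitespace-separated word that is not a noise token
words = [w for w in text.split() if w not in noise]
result = " ".join(words)
print(result)
```

Because the words are rejoined with " ", this route never produces the double spaces that a plain re.sub leaves behind.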

Edit 1

re.sub(r"[a-zA-Z]+\&\@[a-zA-Z]+", '', text)
output: 'this is some text  and some more text  and some other stuff'

Edit 2 (suggested by @Pushpesh Kumar Rajwanshi)

re.sub(r" [a-zA-Z]+\&\@[a-zA-Z]+ ", " ", text)
output:'this is some text and some more text and some other stuff'

You can use this regex to capture the noise strings

\s+\S*&@\S*\s+

and replace each match with a single space.

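Here, \s+ matches one or more whitespace characters, \S* matches zero or more non-whitespace characters up to the &@, the second \S* matches zero or more non-whitespace characters after it, and the final \s+ matches one or more whitespace characters. Each whole match is replaced with a single space, which gives you the string you want.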
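To see exactly what the pattern consumes, a short sketch can list each match with re.finditer before substituting. Both the leading and the trailing whitespace belong to each match, which is why one space has to be put back.

```python
import re

s = 'this is some text lskdfmd&@kjansdl and some more text sldkf&@lsakjd and some other stuff'
pattern = re.compile(r'\s+\S*&@\S*\s+')

# Collect every noise match, including the whitespace on both sides
matches = [m.group() for m in pattern.finditer(s)]
for m in matches:
    print(repr(m))  # ' lskdfmd&@kjansdl ', then ' sldkf&@lsakjd '

# Collapse each match to one space so the neighbouring words stay separated
print(pattern.sub(' ', s))
```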

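Also, if the noise string can appear at the very start or end of the string, change \s+ to \s*. The replacement space then ends up at the edge of the result, so call strip() on it afterwards.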
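A minimal sketch of that variant, assuming noise may sit at either end of the string; remove_noise is a hypothetical helper name, not part of the answer.

```python
import re

def remove_noise(s: str) -> str:
    # \s* (not \s+) lets a noise token match even at the very start or end of s
    cleaned = re.sub(r'\s*\S*&@\S*\s*', ' ', s)
    # A match at an edge is also replaced by a space, so trim both ends
    return cleaned.strip()

print(remove_noise('abc&@def this is some text xyz&@q'))
print(remove_noise('this is some text lskdfmd&@kjansdl and some more text sldkf&@lsakjd and some other stuff'))
```

For noise in the middle of the string this behaves like the \s+ version; the strip() only matters when a match touches either end.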


Python code

import re

s = 'this is some text lskdfmd&@kjansdl and some more text sldkf&@lsakjd and some other stuff'
print(re.sub(r'\s+\S*&@\S*\s+', ' ', s))

Prints

this is some text and some more text and some other stuff

You can use

\S+&@\S+\s*

See a demo on regex101.com.


In Python:
import re
text = "this is some text lskdfmd&@kjansdl and some more text sldkf&@lsakjd and some other stuff"
rx = re.compile(r'\S+&@\S+\s*')
text = rx.sub('', text)
print(text)

This yields

this is some text and some more text and some other stuff
