Removing certain text patterns in Python

Posted 2024-09-28 23:42:35


I am trying to strip a certain text pattern out of a .txt file. The pattern looks like this:


mystring = '''

example deletion words
in the first block

First sentence to keep.

example deletion words
in the second block

Second sentence to keep.

example deletion words
in the third block

Third sentence to keep.

example deletion words
in the fourth block'''

My desired output looks like this:


First sentence to keep.

Second sentence to keep.

Third sentence to keep.


So what I want to do is remove all the text between the strings "example" and "block", including the strings themselves. Any idea how I could do this in R or Python?


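Sorry, I forgot to include my regex attempt; I asked this on the spur of the moment. Thanks to those who took the trouble to answer anyway. Here is my working solution using regular expressions and Python's re package: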

import re

# Non-greedy match from "\nexample" to the next "block"; re.DOTALL lets "." cross newlines.
cleanedtext = re.sub(r'\nexample.*?block', '', mystring, flags=re.DOTALL)

print(cleanedtext)
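The substitution above leaves runs of blank lines behind. A minimal sketch of one way to get closer to the desired output is shown here; the function name `remove_blocks`, the `sample` text, and the `\b` word boundaries are assumptions added for illustration and are not part of the original solution.

```python
import re

def remove_blocks(text: str) -> str:
    """Drop every 'example ... block' span, then tidy the leftover blank lines."""
    # Non-greedy .*? stops at the first "block"; re.DOTALL lets "." cross newlines.
    # The \b boundaries (an assumption) avoid matching inside longer words.
    without_blocks = re.sub(r"\bexample\b.*?\bblock\b", "", text, flags=re.DOTALL)
    # Keep only the non-empty lines and separate them with a single blank line.
    kept = [line.strip() for line in without_blocks.splitlines() if line.strip()]
    return "\n\n".join(kept)

# Hypothetical shortened version of the text from the question.
sample = """
example deletion words
in the first block

First sentence to keep.

example deletion words
in the second block

Second sentence to keep.
"""

print(remove_blocks(sample))
```

Joining the kept lines with a blank line is a formatting choice. If a sentence to keep can span several lines, this step would need adjusting.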

2 Answers

In R, you can use str_remove_all from stringr:

stringr::str_remove_all(string, "example.*block")
 #[1] " First sentence to keep.\nSecond sentence to keep.\nThird sentence to keep.\n"

This is equivalent to:

stringr::str_replace_all(string, "example.*block", "")

Data

string <- "example deletion words in the first block First sentence to keep.
           example deletion words in the second blockSecond sentence to keep.
           example deletion words in the third blockThird sentence to keep.
           example deletion words in the fourth block"
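
The greedy pattern "example.*block" only works here because each block sits on a single line and "." does not cross a newline by default. For comparison, here is a rough Python sketch of how greedy and non-greedy matching interact with re.DOTALL. The `text` value is a hypothetical two-line sample, not the data above.

```python
import re

# Hypothetical sample: one block per line, as in the R data.
text = (
    "example deletion words in the first block First sentence to keep.\n"
    "example deletion words in the second block Second sentence to keep.\n"
)

# Without re.DOTALL, "." stops at a newline, so the greedy match stays on one line.
per_line = re.sub(r"example.*block", "", text)

# With re.DOTALL, the greedy match runs from the first "example" to the last "block",
# which also swallows the first sentence.
greedy_dotall = re.sub(r"example.*block", "", text, flags=re.DOTALL)

# The non-greedy form with re.DOTALL removes each block separately again.
lazy_dotall = re.sub(r"example.*?block", "", text, flags=re.DOTALL)

print(repr(per_line))
print(repr(greedy_dotall))
print(repr(lazy_dotall))
```

When a block can span several lines, as in the question's mystring, the non-greedy form with re.DOTALL is the variant that matches the asker's own solution.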

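Do you know the pattern in advance, or can it change? If it does not change, you can read the text file line by line, split each line into words, and check for the pattern. Lines that do not match can be appended to a new string. What I did below seems to work: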

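final = ""
# Read the file line by line; the with-block closes it automatically.
with open("mytext.txt", "r") as f:
    for line in f:
        # split() drops the trailing newline, so a final line without "\n" is handled too.
        words = line.split()
        # Skip lines that start with "example" or end with "block".
        if words and (words[0] == "example" or words[-1] == "block"):
            continue
        final += line
print(final)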

The result I got:

First sentence to keep.


Second sentence to keep.


Third sentence to keep.
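
The line-by-line check above only catches the first and last line of a block, so a block spread over three or more lines would leave its middle lines behind. A small sketch of a variant that tracks whether it is currently inside a block is shown here; the generator name `drop_blocks` and the `sample` lines are hypothetical, and it assumes a block always starts on a line whose first word is "example" and ends on a line whose last word is "block".

```python
def drop_blocks(lines):
    """Yield only the lines that fall outside an 'example ... block' span."""
    inside = False
    for line in lines:
        words = line.split()
        # A block starts on a line whose first word is "example".
        if not inside and words and words[0] == "example":
            inside = True
        if inside:
            # The block ends on the first line whose last word is "block".
            if words and words[-1] == "block":
                inside = False
            continue
        yield line

# Hypothetical input with a three-line block and a one-line block.
sample = [
    "example deletion words\n",
    "spread over\n",
    "three lines of block\n",
    "\n",
    "First sentence to keep.\n",
    "\n",
    "example deletion words in one block\n",
    "Second sentence to keep.\n",
]

# Drop the blank lines as well, then print what is left.
kept = [line for line in drop_blocks(sample) if line.strip()]
print("".join(kept))
```

Because `drop_blocks` accepts any iterable of lines, an open file object can be passed to it directly in place of the list.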
