Fastest way to replace substrings in a pandas Series that match words from a list

Posted 2024-09-20 05:55:19

I have a large dataset, all_transcripts, with nearly 3 million rows. One of its columns, msgText, contains written messages.

>>> all_transcripts['msgText']

['this is my first message']
['second message is here']
['this is my third message']

In addition, I have a list of more than 200 words, named gemeentes.

^{pr2}$

If a word from this list occurs in msgText, I want to replace it with another word. For that purpose I wrote a function:

def replaceCity(text):
    # plaatsnaam is not a parameter: it is read from the enclosing (module) scope
    newText = text.replace(plaatsnaam, 'woonplaats')
    return str(newText)

My expected output is therefore:

['this is my woonplaats message']
['woonplaats message is here']
['this is my woonplaats message']

At the moment I loop over the list and apply the replaceCity function once for every item in it.

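all_transcripts['filtered_text'] = all_transcripts['msgText']
for plaatsnaam in gemeentes:
    # the loop variable is module-level, so replaceCity sees the current word
    all_transcripts['filtered_text'] = all_transcripts['filtered_text'].apply(replaceCity)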

However, this takes a very long time, so it does not seem efficient. Is there a faster way to do this?

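This post (Algorithm to find multiple string matches) is similar, but my question differs because:

That post deals with a single large block of text, whereas I have a dataset with many separate rows.
I want to replace the words, not just find them.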

1 answer

Anonymous user

#1 · Posted 2024-09-20 05:55:19

Assuming all_transcripts is a pandas DataFrame:

all_transcripts['msgText'].str.replace('|'.join(gemeentes), 'woonplaats', regex=True)
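One caveat with joining the words directly: a word containing regex metacharacters, or a word that is a substring of a longer word, may match more than intended. The following is a minimal sketch of a more defensive pattern, assuming whole-word matching is what is wanted. The helper name build_pattern and the sample words are hypothetical, since the real contents of gemeentes are not shown.

```python
import re

def build_pattern(words):
    """Compile one alternation that matches any of the words as a whole word."""
    # Longest first, so a longer name is tried before a shorter name that is its prefix.
    ordered = sorted(set(words), key=len, reverse=True)
    # re.escape makes every word match literally, even if it contains ., ( or similar.
    alternation = '|'.join(re.escape(word) for word in ordered)
    # \b anchors the match at word boundaries, so 'Ede' does not match inside 'Edegem'.
    return re.compile(r'\b(?:' + alternation + r')\b')

# Hypothetical sample words standing in for the real list.
gemeentes = ['Ede', 'Den Haag']
pattern = build_pattern(gemeentes)

result = pattern.sub('woonplaats', 'ik woon in Den Haag, niet in Ede of Edegem')
print(result)
```

A compiled pattern like this can also be passed as the first argument to Series.str.replace together with regex=True, so the whole column is still processed in a single pass.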

Example:

^{pr2}$

Output

0    this is my woonplaats message
1       woonplaats message is here
2    this is my woonplaats message
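For reference, here is a self-contained sketch that produces the output above. The word list is a hypothetical stand-in (first, second, third), chosen only because those are the words replaced in the expected output. The real gemeentes list is not shown.

```python
import re
import pandas as pd

# Hypothetical stand-in for the real list of 200+ words.
gemeentes = ['first', 'second', 'third']

all_transcripts = pd.DataFrame({
    'msgText': [
        'this is my first message',
        'second message is here',
        'this is my third message',
    ]
})

# Build one alternation pattern; re.escape keeps each word literal.
pattern = '|'.join(re.escape(word) for word in gemeentes)

# A single vectorised call replaces the Python-level loop over all words.
# regex=True is needed on pandas 2.0 and later, where str.replace treats
# the pattern as a literal string by default.
all_transcripts['filtered_text'] = all_transcripts['msgText'].str.replace(
    pattern, 'woonplaats', regex=True
)

print(all_transcripts['filtered_text'])
```

The gain over the original loop comes from scanning each message once with a combined pattern, rather than making one pass over all rows for every word in the list.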
