How do I match words or characters in a text column of a pandas Series?

Posted 2024-05-05 02:28:48

Suppose I have the keywords below, and I want to find the titles that contain all three of them:

keywords_to_track = ["crypto exchange", "loses", "$"]
# "$" is treated as a single character because it can appear inside a token such as "30m$"


0       The $600M Crypto Heist (And How It Impacts ...
1             What Is a Decentralized Crypto Exchange?
2    Crypto Breaches And Fraud Increasing 41% Every...
3            Crypto Exchange Binance Loses $21M in Hack
4    Cryptocurrency hacks and fraud are on track fo...
Name: title, dtype: object

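As you can see, the row at index 3 contains all of these keywords in a single title, and that is the row I need to track. The output I want is: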

0       False
1       False
2       False
3       True
4       False
Name: title, dtype: bool

I tried the following, but I don't think my attempt is correct:

dataframe.title.str.lower().str.match("crypto exchange|loses|$")

1 Answer
User
#1 · Posted 2024-05-05 02:28:48

Change keywords_to_track as follows:

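# Escape "$" with a backslash so the regex treats it as a literal character.
# The raw string (r"...") avoids Python's invalid-escape-sequence warning for "\$".
keywords_to_track = ["crypto exchange", "loses", r"\$"]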
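
The backslash matters because an unescaped `$` in a regular expression is an end-of-string anchor, not a literal dollar sign. As a minimal sketch, the standard library's `re.escape` can add the escapes automatically so the original keyword list can stay as it was; the names `escaped_keywords` and `pattern` are illustrative only and are not part of the answer.

```python
import re

keywords_to_track = ["crypto exchange", "loses", "$"]

# An unescaped "$" matches the empty string at the end of any text,
# so it reports a hit even when there is no dollar sign at all.
assert re.findall("$", "no dollar sign here") == [""]

# re.escape backslash-escapes regex metacharacters such as "$".
escaped_keywords = [re.escape(keyword) for keyword in keywords_to_track]
pattern = "|".join(escaped_keywords)

print(re.findall(pattern, "crypto exchange binance loses $21m in hack"))
# ['crypto exchange', 'loses', '$']
```

Because `findall` returns the matched text rather than the pattern, the hits are still the plain keywords, so the set-based check that follows works the same way with an escaped pattern.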

Now use str.findall (df here is presumably the DataFrame called dataframe in the question):

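# Join the keywords into a single alternation inside one capture group.
words = fr"({'|'.join(keywords_to_track)})"

# findall returns every hit per title; a title qualifies only when the
# number of distinct hits equals the number of keywords.
df['match_all'] = df['title'].str.lower() \
                             .str.findall(words) \
                             .apply(lambda x: len(set(x)) == len(keywords_to_track))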

Output (with an extra row 5, which repeats one keyword and therefore does not match):

>>> df
                                               title  match_all
0     The $600M Crypto Heist (And How It Impacts ...      False
1           What Is a Decentralized Crypto Exchange?      False
2  Crypto Breaches And Fraud Increasing 41% Every...      False
3         Crypto Exchange Binance Loses $21M in Hack       True
4  Cryptocurrency hacks and fraud are on track fo...      False
5                    Crypto exchange Crypto exchange      False
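
As an alternative to a regular expression, each keyword can be tested as a plain substring and the results combined with a logical AND, which avoids escaping altogether. This is only a sketch, not the method from the answer: it reuses three of the untruncated titles shown above, and the names `lowered` and `match_all` are illustrative.

```python
import pandas as pd

keywords_to_track = ["crypto exchange", "loses", "$"]

df = pd.DataFrame({
    "title": [
        "What Is a Decentralized Crypto Exchange?",
        "Crypto Exchange Binance Loses $21M in Hack",
        "Crypto exchange Crypto exchange",
    ]
})

# Lowercase once so the comparison is case-insensitive.
lowered = df["title"].str.lower()

# Start from all True and AND in one substring test per keyword.
# regex=False makes "$" a literal character, so no escaping is needed.
match_all = pd.Series(True, index=df.index)
for keyword in keywords_to_track:
    match_all &= lowered.str.contains(keyword, regex=False)

df["match_all"] = match_all
print(df["match_all"].tolist())
# [False, True, False]
```

This makes one pass over the column per keyword, which is usually fine for a short keyword list, and it does not depend on counting distinct regex hits.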
