Querying a DataFrame column for rows that contain an exact word

     category         synonyms_text  \
130  Fishing          seafarm, seafood, shellfish, sportfish
141  Refrigeration    coldstorage, foodlocker, freeze, fridge, ice, refrigeration
183  Food Service     cook, fastfood, foodserve, foodservice, foodtruck, mealprep
200  Restaurant       expresso, food, galley, gastropub, grill, java, kitchen
377  fastfood         carryout, fastfood, takeout
379  Animal Supplies  feed, fodder, grain, hay, petfood
613  store            convenience, food, grocer, grocery, market

[['bar', 'bistro', 'breakfast', 'buffet', 'cabaret', 'cafe', 'cantina', 'cappuccino', 'chai', 'coffee', 'commissary', 'cuisine', 'deli', 'dhaba', 'dine', 'diner', 'dining', 'eat', 'eater', 'eats', 'edible', 'espresso', 'expresso', 'food', 'galley', 'gastropub', 'grill', 'java', 'kitchen', 'latte', 'lounge', 'pizza', 'pizzeria', 'pub', 'publichouse', 'restaurant', 'roast', 'sandwich', 'snack', 'snax', 'socialhouse', 'steak', 'sub', 'sushi', 'takeout', 'taphouse', 'taverna', 'tea', 'tiffin', 'trattoria', 'treat', 'treatery'], ['convenience', 'food', 'grocer', 'grocery', 'market', 'mart', 'shop', 'store', 'variety']]

1 Answer

User

#1 · Posted 2024-09-29 20:23:55

Example DataFrame:

import pandas as pd

df = pd.DataFrame({'category':['Fishing','Refrigeration','store'],
                   'synonyms_text':['seafood','foodlocker','food']})

print(df)
        category synonyms_text
0        Fishing       seafood
1  Refrigeration    foodlocker
2          store          food # <-- we want only the rows with the exact word "food"

There are three ways to do this:

str.match
str.contains
str.extract (not very useful here)

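# 1: str.match (anchored at the start of the string)
df['synonyms_text'].str.match(r'\bfood\b')

# 2: str.contains (searches anywhere in the string)
df['synonyms_text'].str.contains(r'\bfood\b')

# 3: str.extract (expand=False returns a Series instead of a DataFrame)
df['synonyms_text'].str.extract(r'(\bfood\b)', expand=False).eq('food')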

Output

0    False
1    False
2     True
Name: synonyms_text, dtype: bool

Finally, we use the boolean Series to filter the DataFrame with .loc:

m = df['synonyms_text'].str.match(r'\bfood\b')
df.loc[m]

Output

  category synonyms_text
2    store          food
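The sample above holds a single word per cell, so str.match and str.contains give the same result. The data in the question holds comma-separated lists, and there the two differ: str.match only matches at the start of each string, while str.contains searches anywhere in it. The following is a minimal sketch of that difference, assuming a small hypothetical frame (df_multi) modelled on three rows from the question:

```python
import pandas as pd

# Hypothetical sample modelled on rows shown in the question
df_multi = pd.DataFrame({
    'category': ['Fishing', 'Restaurant', 'store'],
    'synonyms_text': [
        'seafarm, seafood, shellfish, sportfish',
        'expresso, food, galley, gastropub, grill, java, kitchen',
        'convenience, food, grocer, grocery, market',
    ],
})

pattern = r'\bfood\b'

# str.match behaves like re.match: the pattern must match at position 0
match_mask = df_multi['synonyms_text'].str.match(pattern)

# str.contains behaves like re.search: the pattern may match anywhere
contains_mask = df_multi['synonyms_text'].str.contains(pattern)

# Keep only the rows whose synonym list contains the standalone word "food"
result = df_multi.loc[contains_mask]

print(match_mask.tolist())          # [False, False, False]
print(contains_mask.tolist())       # [False, True, True]
print(result['category'].tolist())  # ['Restaurant', 'store']
```

For comma-separated cells like these, str.contains with word boundaries is probably the safer choice; "seafood" is still rejected because there is no word boundary before "food".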

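Bonus:

To match case-insensitively, put the inline flag (?i) at the start of the pattern (Python 3.11 and later raise an error if it appears anywhere else):

For example:

df['synonyms_text'].str.match(r'(?i)\bfood\b')

This matches: food, Food, FOOD, fOoD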
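As an alternative to the inline flag, both str.contains and str.match accept case and flags parameters. A short sketch, assuming a hypothetical Series s of test strings:

```python
import re
import pandas as pd

# Hypothetical test strings; the last one must not match
s = pd.Series(['food', 'Food', 'FOOD', 'fOoD', 'seafood'])

# case=False makes the regex match case-insensitively
mask_case = s.str.contains(r'\bfood\b', case=False)

# flags=re.IGNORECASE has the same effect and also works with str.match
mask_flags = s.str.match(r'\bfood\b', flags=re.IGNORECASE)

print(mask_case.tolist())   # [True, True, True, True, False]
print(mask_flags.tolist())  # [True, True, True, True, False]
```

Passing the flag as a parameter keeps the pattern itself unchanged, which can be easier to read when the same pattern is reused elsewhere.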
