Python: search a DataFrame for words from a list, and track the words found and their frequency

     a                       b               c  d
0  123       'Blah Blah Steel'         'STEEL'  1
1  789  'Blah Blah Steel Gold'  'STEEL','GOLD'  2
2  789        'Blah Blah Gold'          'GOLD'  1
3  790        'Blah Blah blah'

2 Answers

User

Answer #1 · edited 2024-09-24 04:30:25

You can use Series.str.findall(), the pandas counterpart of re.findall(), instead of extract() to do what you need:

import re

search_list = ['STEEL','IRON','GOLD','SILVER']

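# Build one alternation pattern from the list and collect every match per row (case-insensitive)
df['c'] = df['b'].str.findall('({0})'.format('|'.join(search_list)), flags=re.IGNORECASE)
# Each cell of c is a list of matches, so its length is the number of words found
df['d'] = df['c'].str.len()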

The output looks like this:
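A minimal, self-contained sketch of this approach follows. The construction of `df` is an assumption modelled on the question's sample rows, and the `re.escape` call, the `\b` word boundaries and the upper-casing step are additions that are not part of the answer above.

```python
import re
import pandas as pd

# Sample data modelled on the question (assumed, not from the original post)
df = pd.DataFrame({
    'a': [123, 789, 789, 790],
    'b': ['Blah Blah Steel', 'Blah Blah Steel Gold',
          'Blah Blah Gold', 'Blah Blah blah'],
})

search_list = ['STEEL', 'IRON', 'GOLD', 'SILVER']

# re.escape guards against regex metacharacters in the search words;
# \b keeps 'GOLD' from matching inside a longer word such as 'GOLDEN'.
pattern = r'\b(?:' + '|'.join(map(re.escape, search_list)) + r')\b'

matches = df['b'].str.findall(pattern, flags=re.IGNORECASE)

# findall returns the text as it appears in b ('Steel'), so upper-case each match
df['c'] = matches.apply(lambda words: [w.upper() for w in words])
df['d'] = df['c'].str.len()

print(df[['a', 'c', 'd']])
```

Because findall keeps every match, a word that appears twice in one row is counted twice in d.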

User

Answer #2 · edited 2024-09-24 04:30:25

search_list = ['STEEL', 'IRON', 'GOLD', 'SILVER']

# Turn column b into lists of upper-case words (split on whitespace)
df['b'] = df['b'].str.upper().str.split()

# Each row of b is now a list, and search_list is a list too.
# Converting both to sets lets you use set intersection to find
# the words they have in common. sorted() fixes the order of the matches.
found = df['b'].apply(lambda words: sorted(set(words) & set(search_list)))

# c: the matches joined by commas; d: the number of distinct words found
df = df.assign(c=found.str.join(','), d=found.str.len())



                       b           c      d
0        [BLAH, BLAH, STEEL]       STEEL  1
1  [BLAH, BLAH, STEEL, GOLD]  GOLD,STEEL  2
2         [BLAH, BLAH, GOLD]        GOLD  1
3         [BLAH, BLAH, BLAH]              0
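Set intersection records each word at most once per row, so d above is the number of distinct words found. If repeated occurrences should count towards the frequency, something along these lines with `collections.Counter` might work. This is a sketch only: the sample rows and the helper name `count_hits` are assumptions, not part of the answer above.

```python
from collections import Counter
import pandas as pd

# Hypothetical sample data with a repeated search word in the first row
df = pd.DataFrame({'b': ['Blah Steel steel', 'Blah Steel Gold', 'Blah blah']})

search_list = ['STEEL', 'IRON', 'GOLD', 'SILVER']
wanted = set(search_list)

def count_hits(text):
    """Count every occurrence of a search word in text, case-insensitively."""
    return Counter(w for w in text.upper().split() if w in wanted)

counts = [count_hits(text) for text in df['b']]

# c: distinct words found per row; d: total occurrences per row
df['c'] = [','.join(sorted(c)) for c in counts]
df['d'] = [sum(c.values()) for c in counts]

# Frequency of each search word across the whole column
total = sum(counts, Counter())

print(df)
print(total)
```

Keeping the per-row Counter objects around makes it easy to report both the per-row totals and the overall frequency of each word.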
