<p>If you have a very large word list and a large DataFrame to search, this solution is faster in terms of runtime:</p>
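<p>I suspect this is because it relies on a dictionary. Building it takes O(N) time in the number of words in the string, and each lookup takes O(1) time on average. Regular-expression searching is slower by comparison.</p>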
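<pre><code>import pandas as pd
from collections import Counter


def occurrence_counter(target_string, search_list):
    # Build a word-to-count mapping once per string: O(N) in the number of words.
    # Note: str.split() splits on whitespace only, so punctuation stays attached
    # (e.g. 'afternoon.' is a different token from 'afternoon').
    word_counts = dict(Counter(target_string.split()))
    count = 0
    # Each membership test and lookup is O(1) on average.
    for key in search_list:
        if key in word_counts:
            count += word_counts[key]
    return count


data = {'speaker': ['Adam', 'Ben', 'Clair'],
        'speech': ['Thank you very much and good afternoon.',
                   'Let me clarify that because I want to make sure we have got everything right',
                   'By now you should have some good rest']}
df = pd.DataFrame(data)
wordlist = ['much', 'good', 'right']
# Total occurrences of any word from wordlist in each speech
df['count'] = df['speech'].apply(lambda x: occurrence_counter(x, wordlist))
</code></pre>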
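<p>To check the speed claim on your own data, here is a minimal sketch that runs the dictionary-based counter next to a regex-based alternative and times both. The regex version (<code>regex_counter</code>, using one word-boundary alternation with pandas' <code>Series.str.count</code>) is my assumption of what a typical regex solution looks like, not code from the original question, and the 1,000x repetition of the sample rows is an arbitrary choice. No timings are claimed here; they depend on your machine and data.</p>

```python
import re
import timeit
from collections import Counter

import pandas as pd


def occurrence_counter(target_string, search_list):
    # Condensed form of the counter above: one O(N) Counter build per row,
    # then one O(1) average-case lookup per search word.
    # Counter returns 0 for missing keys, so no membership test is needed.
    counts = Counter(target_string.split())
    return sum(counts[word] for word in search_list)


def regex_counter(series, search_list):
    # Hypothetical regex-based alternative for comparison: a single
    # alternation pattern with word boundaries, counted per row.
    pattern = r"\b(?:" + "|".join(map(re.escape, search_list)) + r")\b"
    return series.str.count(pattern)


data = {'speaker': ['Adam', 'Ben', 'Clair'],
        'speech': ['Thank you very much and good afternoon.',
                   'Let me clarify that because I want to make sure we have got everything right',
                   'By now you should have some good rest']}
df = pd.DataFrame(data)
wordlist = ['much', 'good', 'right']

counter_result = df['speech'].apply(lambda x: occurrence_counter(x, wordlist))
regex_result = regex_counter(df['speech'], wordlist)
print(counter_result.tolist())  # [2, 1, 1]
print(regex_result.tolist())    # [2, 1, 1]

# Scale the toy data up (arbitrary factor) for a rough timing comparison.
big = pd.concat([df] * 1_000, ignore_index=True)
t_counter = timeit.timeit(
    lambda: big['speech'].apply(lambda x: occurrence_counter(x, wordlist)),
    number=3)
t_regex = timeit.timeit(lambda: regex_counter(big['speech'], wordlist), number=3)
print(f"Counter: {t_counter:.3f}s  regex: {t_regex:.3f}s")
```

<p>The two approaches agree on this sample, but not in general: the split-based counter treats <code>'afternoon.'</code> and <code>'afternoon'</code> as different tokens, while the <code>\b</code> regex matches the word regardless of adjacent punctuation. Decide which behaviour you want before comparing speeds.</p>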