How do I count specific words in a Pandas Series?

df['totalwords'] = df.review.str.split()
df['word_count'] = df.totalwords.apply(word_counter)

----------------------------------------------------------------------------
----> 1 df['word_count'] = df.totalwords.apply(word_counter)

c:\users\admin\appdata\local\programs\python\python36\lib\site-packages\pandas\core\series.py in apply(self, func, convert_dtype, args, **kwds)
   3192         else:
   3193             values = self.astype(object).values
-> 3194             mapped = lib.map_infer(values, f, convert=convert_dtype)
   3195
   3196         if len(mapped) and isinstance(mapped[0], Series):

pandas/_libs/src\inference.pyx in pandas._libs.lib.map_infer()

<ipython-input-51-cd11c5eb1f40> in word_counter(sent)
      2     a={}
      3     for word in selected_words:
----> 4         a[word] = sent.count(word)
      5     return a

AttributeError: 'float' object has no attribute 'count'

3 Answers

User

#1 · Edited 2024-07-08 08:21:04

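Calling list.count repeatedly in a loop will work when the values are lists, but it is inefficient. The complexity is O(m × n), where m is the number of selected values and n is the total number of values.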
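The traceback in the question suggests that some values reaching word_counter are floats rather than lists. A likely cause, though this is an assumption about the asker's data, is missing reviews: str.split() leaves missing values as NaN, and NaN is a float. A minimal sketch of a guarded version of the loop approach follows; the sample reviews are made up for illustration.

```python
import pandas as pd

# Hypothetical data: the contents of `review` are illustrative only.
df = pd.DataFrame({"review": ["great great product", None, "awesome and fantastic"]})
selected_words = ["awesome", "great", "fantastic"]

def word_counter(words):
    """Count each selected word in a list of tokens; non-list input yields zeros."""
    if not isinstance(words, list):
        # A missing review is not split into a list (it stays NaN, a float),
        # so it has no .count method. Report zero for every word instead.
        return {word: 0 for word in selected_words}
    return {word: words.count(word) for word in selected_words}

df["totalwords"] = df["review"].str.split()
df["word_count"] = df["totalwords"].apply(word_counter)
print(df["word_count"].tolist())
```

Dropping the missing rows first with df.dropna(subset=["review"]) would be another option if zero counts are not wanted for them.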

With Pandas, you can use optimized methods that keep the complexity at O(n). In this case, you can use value_counts followed by reindex:

res = df['A'].value_counts().reindex(selected_words)

print(res)

awesome      1
great        2
fantastic    2
Name: A, dtype: int64

Alternatively, as in @pyd's solution, filter first and then use value_counts. Both solutions have O(n) complexity.
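The two approaches differ in how they treat a selected word that never occurs. The sketch below uses a shortened version of the sample column and adds "superb" to the selection purely as an illustrative absent word; fill_value=0 is an optional argument to reindex that this answer did not use.

```python
import pandas as pd

df = pd.DataFrame({"A": ["awesome", "great", "fantastic", "amazing", "great", "fantastic"]})
selected_words = ["awesome", "great", "fantastic", "superb"]  # "superb" is absent on purpose

# Approach 1: count everything, then keep only the selected words.
# fill_value=0 reports absent words as 0 instead of NaN and keeps the dtype integer.
by_reindex = df["A"].value_counts().reindex(selected_words, fill_value=0)

# Approach 2: filter first, then count. Absent words simply do not appear.
by_filter = df.loc[df["A"].isin(selected_words), "A"].value_counts()

print(by_reindex.to_dict())
print(by_filter.to_dict())
```

The reindex version also returns the counts in the order of selected_words, while the filtered version is sorted by count.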

User

#2 · Edited 2024-07-08 08:21:04

Suppose your DataFrame looks like this:

import pandas as pd

df=pd.DataFrame({'A': ['awesome', 'great', 'fantastic', 'amazing', 'love', 'horrible', 'bad', 'terrible', 'awful', 'wow', 'hate','great', 'fantastic', 'amazing', 'love', 'horrible']})
print(df)
    A
0   awesome
1   great
2   fantastic
3   amazing
4   love
5   horrible
6   bad
7   terrible
8   awful
9   wow
10  hate
11  great
12  fantastic
13  amazing
14  love
15  horrible

selected_words=['awesome','great','fantastic']

df.loc[df['A'].isin(selected_words),'A'].value_counts()
[out]
great        2
fantastic    2
awesome      1
Name: A, dtype: int64

User

#3 · Edited 2024-07-08 08:21:04

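From your question, it looks like you want a dict of counts. @pyd has already posted a good solution for counting, but its result is not a dict. If you need a dictionary as output, see the code below, which is essentially an extension of pyd's solution.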

df=pd.DataFrame({'A': ['awesome', 'great', 'fantastic', 'amazing', 'love', 'horrible', 'bad', 'terrible', 'awful', 'wow', 'hate','great', 'fantastic', 'amazing', 'love', 'horrible']})

def get_count_dict(data, selected_words):
    # Keep only the selected words in column 'A', then count occurrences.
    counts = data.loc[data['A'].isin(selected_words), 'A'].value_counts()

    # value_counts() returns a Series indexed by word; convert it to a plain dict.
    return counts.to_dict()

selected_words=['awesome','great','fantastic']

get_count_dict(df, selected_words)

Output : {'fantastic': 2, 'great': 2, 'awesome': 1}
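The answers here count values in a column that holds one word per row, whereas the question starts from a column of full review texts. One way to bridge the two, offered as a sketch rather than as what any answerer proposed, is to split each review into tokens and flatten them with Series.explode before counting. The reviews below are made-up sample data.

```python
import pandas as pd

# Hypothetical reviews, including one missing value.
reviews = pd.Series(["great great product", None, "awesome and fantastic", "great value"])
selected_words = ["awesome", "great", "fantastic"]

# One row per token; a missing review stays a single NaN row,
# which value_counts() ignores by default.
tokens = reviews.str.split().explode()

counts = tokens.value_counts().reindex(selected_words, fill_value=0)

# Convert to a plain dict of Python ints.
count_dict = {word: int(n) for word, n in counts.items()}
print(count_dict)  # {'awesome': 1, 'great': 3, 'fantastic': 1}
```

This gives totals across all reviews. Per-review counts would still need a row-wise function like the word_counter in the question.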
