I have an existing DataFrame (coffee_directions_df) that looks like this:
coffee_directions_df
Utterance                          Frequency
Directions to Starbucks            1045
Directions to Tullys               1034
Give me directions to Tullys       986
Directions to Seattles Best        875
Show me directions to Dunkin       812
Directions to Daily Dozen          789
Show me directions to Starbucks    754
Give me directions to Dunkin       612
Navigate me to Seattles Best       498
Display navigation to Starbucks    376
Direct me to Starbucks             201
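The DataFrame shows what people said (the utterance) and how often each utterance occurred.
For example, "Directions to Starbucks" was said 1045 times.
I'm trying to work out how to replace similar words in the coffee_directions_df.Utterance column, such as "Starbucks", "Tullys" and "Seattles Best", with a single string such as "Coffee". I've seen similar answers that suggest a dictionary like the ones below, but I haven't had any success with them.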
{'Utterance':['Starbucks','Tullys','Seattles Best'],
'Combi_Utterance':['Coffee','Coffee','Coffee']}
{'Utterance':['Dunkin','Daily Dozen'],
'Combi_Utterance':['Donut','Donut']}
{'Utterance':['Give me','Show me','Navigate me','Direct me'],
'Combi_Utterance':['V_me','V_me','V_me','V_me']}
The desired output is as follows:
coffee_directions_df
Utterance                          Frequency    Combi_Utterance
Directions to Starbucks            1045         Directions to Coffee
Directions to Tullys               1034         Directions to Coffee
Give me directions to Tullys       986          V_me to Coffee
Directions to Seattles Best        875          Directions to Coffee
Show me directions to Dunkin       812          V_me to Donut
Directions to Daily Dozen          789          Directions to Donut
Show me directions to Starbucks    754          V_me to Coffee
Give me directions to Dunkin       612          V_me to Donut
Navigate me to Seattles Best       498          V_me to Coffee
Display navigation to Starbucks    376          Display navigation to Coffee
Direct me to Starbucks             201          V_me to Coffee
Ultimately, I'd like to be able to use this code to produce the final output.
df = (df.set_index('Frequency')['Utterance']
.str.split(expand=True)
.stack()
.reset_index(name='Words')
.groupby('Words', as_index=False)['Frequency'].sum()
)
print(df)
        Words  Frequency
0  Directions       6907
1        V_me       3863
2       Donut       2213
3      Coffee       5769
4       Other        376
Thanks!!
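Here is one approach. Following on from your earlier question, I chose to use collections.Counter rather than pandas for the counting logic.
The required input is a mapping dictionary, rep_dict. We apply it to substrings of the strings in the df['Utterance'] series.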
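For illustration, here is a minimal sketch of that idea. It is not the answer's original code: the contents of rep_dict are assumed from the dictionaries in the question, and the names pattern, combine and word_counts are hypothetical. It replaces each mapped substring using one regular expression, then sums the frequencies per word with collections.Counter.

```python
import re
from collections import Counter

import pandas as pd

coffee_directions_df = pd.DataFrame({
    "Utterance": [
        "Directions to Starbucks", "Directions to Tullys",
        "Give me directions to Tullys", "Directions to Seattles Best",
        "Show me directions to Dunkin", "Directions to Daily Dozen",
        "Show me directions to Starbucks", "Give me directions to Dunkin",
        "Navigate me to Seattles Best", "Display navigation to Starbucks",
        "Direct me to Starbucks",
    ],
    "Frequency": [1045, 1034, 986, 875, 812, 789, 754, 612, 498, 376, 201],
})

# Assumed mapping (built from the question's dictionaries):
# original substring -> combined label.
rep_dict = {
    "Starbucks": "Coffee", "Tullys": "Coffee", "Seattles Best": "Coffee",
    "Dunkin": "Donut", "Daily Dozen": "Donut",
    "Give me": "V_me", "Show me": "V_me",
    "Navigate me": "V_me", "Direct me": "V_me",
}

# One alternation over all keys; longer keys are listed first so that a
# multi-word phrase is tried before any shorter key.
pattern = re.compile(
    "|".join(re.escape(key) for key in sorted(rep_dict, key=len, reverse=True))
)


def combine(utterance):
    """Replace every mapped substring in one utterance with its label."""
    return pattern.sub(lambda match: rep_dict[match.group(0)], utterance)


coffee_directions_df["Combi_Utterance"] = (
    coffee_directions_df["Utterance"].map(combine)
)

# Add each row's frequency to every word of its combined utterance.
word_counts = Counter()
for utterance, freq in zip(coffee_directions_df["Combi_Utterance"],
                           coffee_directions_df["Frequency"]):
    for word in utterance.split():
        word_counts[word] += int(freq)

print(coffee_directions_df)
print(word_counts.most_common())
```

With this mapping the word "directions" is left in place, so "Give me directions to Tullys" becomes "V_me directions to Coffee" rather than "V_me to Coffee"; adding longer keys such as "Give me directions" to rep_dict would be one way to get closer to the desired output. The totals for Coffee (5769), Donut (2213) and V_me (3863) agree with the expected table in the question, while "Directions" and "directions" are counted separately because the split is case-sensitive.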