Frequency of each word in a pandas DataFrame

   key  message                                          Final Category
0  1    I have not received my gifts which I ordered ok  voucher
1  2    hth her wells idyll McGill kooky bbc.co          noclass
2  3    test test test 1 test                            noclass
3  4    test                                             noclass
4  5    hello where is my reward points                  other
5  6    hi, can you get koovs coupons or vouchers here   options
6  7    Hi Hey when you people will include amazon an    options

3 Answers

User

#1 · edited 2024-09-28 03:12:48

There may be a more elegant way to do this, but here is a set of nested for loops:

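final_cat_list = df['Final Category'].unique()

# word_count[category][word] -> number of occurrences
word_count = {}
for f in final_cat_list:
    word_count[f] = {}
    # all messages that belong to category f
    message_list = list(df.loc[df['Final Category'] == f, 'message'])
    for m in message_list:
        # split() without arguments splits on any whitespace and drops empty strings
        word_list = m.split()
        for w in word_list:
            if w in word_count[f]:
                word_count[f][w] += 1
            else:
                word_count[f][w] = 1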
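
As a possible tidier variant of the loops above, the same per-category counts can be sketched with `collections.Counter`. To keep the sketch self-contained it works on plain `(message, category)` pairs copied from the question; the name `rows` is made up for illustration, and with a real DataFrame the pairs could come from `zip(df["message"], df["Final Category"])`.

```python
from collections import Counter, defaultdict

# Rows copied from the question as (message, category) pairs (illustrative name).
# With a DataFrame: rows = zip(df["message"], df["Final Category"])
rows = [
    ("I have not received my gifts which I ordered ok", "voucher"),
    ("hth her wells idyll McGill kooky bbc.co", "noclass"),
    ("test test test 1 test", "noclass"),
    ("test", "noclass"),
    ("hello where is my reward points", "other"),
    ("hi, can you get koovs coupons or vouchers here", "options"),
    ("Hi Hey when you people will include amazon an", "options"),
]

# One Counter per category; Counter.update adds the counts of the words it is given
word_count = defaultdict(Counter)
for message, category in rows:
    word_count[category].update(message.split())

print(word_count["noclass"]["test"])  # 5
print(word_count["voucher"]["I"])     # 2
```

The result has the same shape as the nested dictionary built by the loops: category first, then word, then count.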

User

#2 · edited 2024-09-28 03:12:48

import pandas as pd

# copy/paste data (you can skip this since you already have a dataframe)
data = {0 : {'key': 1 , 'message': "I have not received my gifts which I ordered ok",     'Final Category': 'voucher'},
        1 : {'key': 2 , 'message': "hth her wells idyll McGill kooky bbc.co",             'Final Category': 'noclass'},
        2 : {'key': 3 , 'message': "test test test 1 test",                               'Final Category': 'noclass'},
        3 : {'key': 4 , 'message': "test",                                                'Final Category': 'noclass'},
        4 : {'key': 5 , 'message': "hello where is my reward points",                     'Final Category': 'other'},
        5 : {'key': 6 , 'message': "hi, can you get koovs coupons or vouchers here",      'Final Category': 'options'},
        6 : {'key': 7 , 'message': "Hi Hey when you people will include amazon an",       'Final Category': 'options'}
        }

# make DataFrame (you already have one); orient='index' turns each inner dict into a row
df = pd.DataFrame.from_dict(data, orient='index')

# break up text into words, then concatenate the word lists per 'Final Category'
df['message'] = df['message'].str.split()
final_df = df.groupby('Final Category')[['message']].sum()

# make final dictionary: {category: {word: count}}
final_dict = {}
for label, text in zip(final_df.index, final_df['message']):
    final_dict[label] = {w: text.count(w) for w in set(text)}

User

#3 · edited 2024-09-28 03:12:48

This modifies the original df, so you may want to copy it first.

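from collections import Counter

# append a space so that words do not run together when the strings are summed
df["message"] = df["message"].apply(lambda message: message + " ")
# concatenate the messages of each category, then count the words
df.groupby("Final Category")["message"].sum().apply(lambda message: Counter(message.split()))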

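What this code does: first, it appends a space to the end of every message. More on that in a moment. It then groups by Final Category and sums the messages within each group. This is where the trailing space matters: without it, the last word of one message would stick to the first word of the next. (Summing strings is concatenation.)

The concatenated string is then split on whitespace to get the words, which are then counted.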
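
To make the point about the trailing space concrete, here is a small sketch in plain Python using the two shortest 'noclass' messages from the question. The variable names are made up for illustration. It shows what happens without a separator, and that joining with a space gives the intended counts without touching the messages themselves.

```python
from collections import Counter

# Two of the 'noclass' messages from the question
messages = ["test test test 1 test", "test"]

# Plain concatenation glues the last word of one message to the first word of the next
glued = "".join(messages)
print(glued.split())  # ['test', 'test', 'test', '1', 'testtest']

# Joining with a space keeps the words apart and leaves the messages unmodified
joined = " ".join(messages)
print(Counter(joined.split()))  # Counter({'test': 5, '1': 1})
```

In pandas, the same idea could presumably be written as `df.groupby("Final Category")["message"].agg(" ".join)`, which would avoid modifying `df` in the first place.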
