Replace similar strings in a column by category and map them to a new column in Python

coffee_directions_df

Utterance                        Frequency
Directions to Starbucks               1045
Directions to Tullys                  1034
Give me directions to Tullys           986
Directions to Seattles Best            875
Show me directions to Dunkin           812
Directions to Daily Dozen              789
Show me directions to Starbucks        754
Give me directions to Dunkin           612
Navigate me to Seattles Best           498
Display navigation to Starbucks        376
Direct me to Starbucks                 201

{'Utterance': ['Starbucks', 'Tullys', 'Seattles Best'],
 'Combi_Utterance': ['Coffee', 'Coffee', 'Coffee']}

{'Utterance': ['Dunkin', 'Daily Dozen'],
 'Combi_Utterance': ['Donut', 'Donut']}

{'Utterance': ['Give me', 'Show me', 'Navigate me', 'Direct me'],
 'Combi_Utterance': ['V_me', 'V_me', 'V_me', 'V_me']}
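One way to make these tables usable for substring replacement is to flatten them into a single dictionary. This is only a sketch: it assumes each label list is meant to have one entry per substring, and the names mapping_tables and flat_map are illustrative, not from the original post.

```python
# The three mapping tables from the question, written as plain dicts.
mapping_tables = [
    {'Utterance': ['Starbucks', 'Tullys', 'Seattles Best'],
     'Combi_Utterance': ['Coffee', 'Coffee', 'Coffee']},
    {'Utterance': ['Dunkin', 'Daily Dozen'],
     'Combi_Utterance': ['Donut', 'Donut']},
    {'Utterance': ['Give me', 'Show me', 'Navigate me', 'Direct me'],
     'Combi_Utterance': ['V_me', 'V_me', 'V_me', 'V_me']},
]

# Flatten into one {substring: label} dictionary.
# zip() pairs each substring with its label and stops at the shorter list.
flat_map = {}
for table in mapping_tables:
    flat_map.update(zip(table['Utterance'], table['Combi_Utterance']))

print(flat_map)
```

A flat dictionary of this shape is the kind of input that a substring replacement over the Utterance column can consume directly.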

coffee_directions_df

Utterance                        Frequency  Combi_Utterance
Directions to Starbucks               1045  Directions to Coffee
Directions to Tullys                  1034  Directions to Coffee
Give me directions to Tullys           986  V_me to Coffee
Directions to Seattles Best            875  Directions to Coffee
Show me directions to Dunkin           812  V_me to Donut
Directions to Daily Dozen              789  Directions to Donut
Show me directions to Starbucks        754  V_me to Coffee
Give me directions to Dunkin           612  V_me to Donut
Navigate me to Seattles Best           498  V_me to Coffee
Display navigation to Starbucks        376  Display navigation to Coffee
Direct me to Starbucks                 201  V_me to Coffee

df = (df.set_index('Frequency')['Utterance']
        .str.split(expand=True)
        .stack()
        .reset_index(name='Words')
        .groupby('Words', as_index=False)['Frequency'].sum()
     )

print (df)

        Words  Frequency
0  Directions       6907
1        V_me       3863
2       Donut       2213
3      Coffee       5769
4       Other        376

1 answer

User

#1 · Posted on 2024-10-01 02:33:27

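Here is one approach. Following on from your previous question, I have chosen collections.Counter rather than pandas for the counting logic.

The required input is a mapping dictionary, rep_dict. We apply it to substrings of the strings in the df['Utterance'] series.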
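To see why regex=True matters for substring mapping, here is a minimal sketch. The two-value series s and the names rep, whole_value and substring are made up for illustration.

```python
import pandas as pd

s = pd.Series(['Starbucks', 'Directions to Starbucks'])  # made-up sample
rep = {'Starbucks': 'Coffee'}

# Without regex=True, a key only matches a cell whose entire value equals it.
whole_value = s.replace(rep)

# With regex=True, each key is treated as a pattern and matched anywhere in the cell.
substring = s.replace(rep, regex=True)

print(whole_value.tolist())
print(substring.tolist())
```

Because the keys are interpreted as regular expressions, any key containing special characters such as "." or "(" would need escaping first, for example with re.escape.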

from collections import Counter
import pandas as pd

df = pd.DataFrame([['Directions to Starbucks', 1045],
                   ['Show me directions to Starbucks', 754],
                   ['Give me directions to Starbucks', 612],
                   ['Navigate me to Starbucks', 498],
                   ['Display navigation to Starbucks', 376],
                   ['Direct me to Starbucks', 201],
                   ['Navigate to Starbucks', 180]],
                  columns=['Utterance', 'Frequency'])

# define dictionary of mappings
rep_dict = {'Starbucks': 'Coffee', 'Tullys': 'Coffee', 'Seattles Best': 'Coffee'}

# apply substring mapping
df['Utterance'] = df['Utterance'].replace(rep_dict, regex=True).str.lower()

# previous logic below
c = Counter()

for row in df.itertuples():
    for i in row[1].split():
        c[i] += row[2]

res = pd.DataFrame.from_dict(c, orient='index')\
                  .rename(columns={0: 'Count'})\
                  .sort_values('Count', ascending=False)

def add_combinations(df, lst):
    for i in lst:
        words = '_'.join(i)
        df.loc[words] = df.loc[df.index.isin(i), 'Count'].sum()
    return df.sort_values('Count', ascending=False)

lst = [('give', 'show', 'navigate', 'direct')]

res = add_combinations(res, lst)

Result

                           Count
to                          3666
coffee                      3666
directions                  2411
give_show_navigate_direct   2245
me                          2065
show                         754
navigate                     678
give                         612
display                      376
navigation                   376
direct                       201
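If the goal is a new Combi_Utterance column rather than overwriting Utterance, the same dictionary replacement can be written to a separate column. The sketch below is one possible way to do it, not the method above: it uses a subset of the question's rows, a flat dictionary built from all three of the question's mapping tables, and a helper dictionary named patterns that is my own addition.

```python
import re
import pandas as pd

df = pd.DataFrame({
    'Utterance': ['Directions to Starbucks',
                  'Give me directions to Tullys',
                  'Show me directions to Dunkin',
                  'Navigate me to Seattles Best',
                  'Display navigation to Starbucks'],
    'Frequency': [1045, 986, 812, 498, 376],
})

# Flat substring -> category mapping taken from the question's tables.
rep_dict = {'Starbucks': 'Coffee', 'Tullys': 'Coffee', 'Seattles Best': 'Coffee',
            'Dunkin': 'Donut', 'Daily Dozen': 'Donut',
            'Give me': 'V_me', 'Show me': 'V_me',
            'Navigate me': 'V_me', 'Direct me': 'V_me'}

# Escape each key and wrap it in word boundaries so it is matched literally
# and only as a whole word or phrase.
patterns = {r'\b' + re.escape(k) + r'\b': v for k, v in rep_dict.items()}

# Write into a new column; the original 'Utterance' column is left untouched.
df['Combi_Utterance'] = df['Utterance'].replace(patterns, regex=True)

print(df)
```

Note that plain substring replacement keeps the remaining words, so "Give me directions to Tullys" becomes "V_me directions to Coffee" rather than the "V_me to Coffee" shown in the question's expected output. Dropping "directions" would need an extra rule that the question does not spell out.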
