Replace a column with the longest string, based on frequency counts in a DataFrame

Posted 2024-06-13 10:31:55


I have two columns: freq and newname. Within each freq value, I want to replace newname with the longest string in that group. The code I tried:

    k = df['Newname'].to_list()
    j = list(set(k))
    for row in df.iterrows():
        print(row)
        if row==j[0]:
            df.at[row.Index, 'Newname'] = df['Newname'].value_counts().argmax()
        elif row==j[1]:
            df.at[row.Index, 'Newname'] = df['Newname'].value_counts().argmax()

Input:

 newname  freq
 ASHOK    5
 aSHOK    5
 Ashok    5
 A        5
 Ask      5
 ajay     4
 Ajay     4
 A        4
 Aja      4

Expected output:

newname    freq
Ashok      5
Ashok      5
Ashok      5
Ashok      5
Ashok      5
Ajay       4
Ajay       4
Ajay       4
Ajay       4

2 answers

Hope this helps!

    # NOTE: this answer calls the name column 'name' (the question calls it 'newname').
    # Get name lengths
    df['name_len'] = df['name'].str.len()

    # Get the maxima over the whole frame
    max_freq = df['freq'].max()
    max_len = df['name_len'].max()

    # Keep the longest names, then those among them with the highest frequency
    filter1 = df[df['name_len'] == max_len].reset_index(drop=True)
    filter2 = filter1[filter1['freq'] == max_freq].reset_index(drop=True)

    # Get the target name (raises IndexError if no longest name has the top frequency)
    target_name = filter2['name'].iloc[0].capitalize()

    # Create new_name: the single value is broadcast to every row
    df['new_name'] = target_name
    df = df.drop(['name_len', 'name'], axis=1)

Note: when several names share the same frequency and length, you have to give a weight to either the frequency or the length of the name.
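One possible reading of the note above is that you pick which column wins when frequency and length disagree. The sketch below is only an illustration under that assumption: the sample data and the helper `pick_name` are hypothetical and not part of the answer.

```python
import pandas as pd

# Hypothetical data: the longest name does not have the highest frequency
df = pd.DataFrame({
    "name": ["ASHOK", "ajay", "Aja", "A"],
    "freq": [4, 5, 5, 4],
})

def pick_name(frame, priority):
    """Return the capitalized name ranked first under the given column priority.

    `priority` lists the sort columns, most important first.
    """
    ranked = frame.assign(name_len=frame["name"].str.len()).sort_values(
        priority, ascending=False
    )
    return ranked["name"].iloc[0].capitalize()

# Frequency outranks length
by_freq = pick_name(df, ["freq", "name_len"])
# Length outranks frequency
by_len = pick_name(df, ["name_len", "freq"])
print(by_freq, by_len)
```

Listing `freq` first favors the most frequent name, while listing `name_len` first favors the longest one, so the order of the list plays the role of the weighting.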


    import pandas as pd

    # NOTE: as above, the name column is assumed to be called 'name'.
    parts = []
    for freq, group in df.groupby('freq'):
        # Work on a copy so the assignments below do not touch a slice of df
        group = group.copy()

        # Get the length of the names and the max length within this freq group
        group['name_len'] = group['name'].str.len()
        max_len = group['name_len'].max()

        # Keep only the longest names of the group
        filter1 = group[group['name_len'] == max_len]

        # Get the target name: the first longest name, capitalized
        target_name = filter1['name'].iloc[0].capitalize()

        # Create new_name for every row of the group
        group['new_name'] = target_name
        parts.append(group.drop(['name_len', 'name'], axis=1))

    # Reassemble the groups in the original row order
    df2 = pd.concat(parts).sort_index()[['new_name', 'freq']]
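For comparison, the same per-group idea can probably be written without an explicit loop. This is a minimal sketch, not the answerer's method: it rebuilds the question's sample data, uses the question's column name `newname`, and the helper `longest_capitalized` is a name made up for illustration. Ties in length go to the first name in the group.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "newname": ["ASHOK", "aSHOK", "Ashok", "A", "Ask", "ajay", "Ajay", "A", "Aja"],
    "freq": [5, 5, 5, 5, 5, 4, 4, 4, 4],
})

def longest_capitalized(names: pd.Series) -> str:
    """Return the first longest name in the group, capitalized."""
    # idxmax gives the index label of the first maximum length
    return names.loc[names.str.len().idxmax()].capitalize()

# transform broadcasts the single value returned per group to each of its rows
df["newname"] = df.groupby("freq")["newname"].transform(longest_capitalized)
print(df)
```

On the question's input this should reproduce the expected output, with the row order and the freq column left as they were.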
