pandas: how to group by similar and dissimilar partial column values while defining the group names

   Who                                Amount
0  DE BORTOLI WINES DIXONS CREEK      -29.54
1  DE BORTOLI WINES RE DIXONS CREEK   -20.50
2  DE BORTOLI WINES P/L DIXONS CREEK  -22.50
3  DE BORTOLI WINES PTY L BILBUL      -91.00
4  Ezard@Levantine Hill Coldstream    -31.30
5  Ezard@LevantineHill Coldstream     -21.10
6  RATHBONE WINE GROUP PORT MELBOURN  -20.20
7  YERING STATION YARRA GLEN          -17.05
8  YERING STATION YARRA GREEN         -31.00

columns: Index(['Who', 'Amount'], dtype='object')

1 answer

User

#1 · posted 2024-10-03 11:15:15

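One approach is to create a dictionary whose keys are lookup patterns, such as DE BORTOLI, and whose values are the unified names, such as DE BORTOLI WINES DIXONS CREEK. You can then store this unified name in a temporary column, so that the information in column Who is not lost, and group by the new column: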

# Map regex patterns to unified group names.
# The surrounding .* lets the key match anywhere in the string, so the whole value is replaced.
transl = {
    ".*DE BORTOLI.*": "DE BORTOLI WINES          DIXONS CREEK",
    ".*Ezard@.*": "Ezard@Levantine Hill      Coldstream",
    ".*RATHBONE.*": "RATHBONE AND YERING",
    ".*YERING.*": "RATHBONE AND YERING",
}
# Create a helper column with the unified name used for grouping.
df["strat"] = df["Who"].replace(to_replace=transl, regex=True)
# Group by the unified name and sum only the numeric Amount column.
group_df = df.groupby("strat", as_index=False)["Amount"].sum()
print(group_df)

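Sample output for the data above:

                                    strat   Amount
0  DE BORTOLI WINES          DIXONS CREEK  -163.54
1    Ezard@Levantine Hill      Coldstream   -52.40
2                     RATHBONE AND YERING   -68.25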
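For anyone who wants to try the approach end to end, here is a minimal self-contained sketch. It assumes the question's data can be rebuilt as below (whitespace inside the names is simplified), and the shortened group names in `transl` are illustrative choices, not the ones from the answer.

```python
import pandas as pd

# Data copied from the question; spacing inside the names is simplified.
df = pd.DataFrame({
    "Who": [
        "DE BORTOLI WINES DIXONS CREEK",
        "DE BORTOLI WINES RE DIXONS CREEK",
        "DE BORTOLI WINES P/L DIXONS CREEK",
        "DE BORTOLI WINES PTY L BILBUL",
        "Ezard@Levantine Hill Coldstream",
        "Ezard@LevantineHill Coldstream",
        "RATHBONE WINE GROUP PORT MELBOURN",
        "YERING STATION YARRA GLEN",
        "YERING STATION YARRA GREEN",
    ],
    "Amount": [-29.54, -20.50, -22.50, -91.00, -31.30,
               -21.10, -20.20, -17.05, -31.00],
})

# Regex pattern -> unified group name (names here are illustrative assumptions).
transl = {
    r".*DE BORTOLI.*": "DE BORTOLI WINES",
    r".*Ezard@.*": "Ezard@Levantine Hill",
    r".*RATHBONE.*": "RATHBONE AND YERING",
    r".*YERING.*": "RATHBONE AND YERING",
}

# Helper column holding the unified name; the original Who column is kept.
df["strat"] = df["Who"].replace(to_replace=transl, regex=True)

# One row per unified name, with the Amount values summed.
totals = df.groupby("strat", as_index=False)["Amount"].sum()
print(totals)
```

Note that `replace` leaves any value that matches none of the patterns unchanged, so such a name simply ends up as its own group.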
