如何在python中连接具有相同列值的行？

data = pd.DataFrame({'Tweet_ID': ['1', '2', '3', '4', '5', '6', '7', '8', '9'], 'Reply_to_tweet_ID': [None, '1', '1', '3', None, '1', '4', '2', '4'], 'Max Speed': [380., 370., 24., 26., 584., 48., 8., 123., None]})

Tweet_ID Reply_to_tweet_ID Max Speed 0 1 None 380 1 2 1 370 2 3 1 24 3 4 3 26 4 5 None 584 5 6 1 48 6 7 4 8 7 8 2 123 8 9 4 None

Tweet_ID1 Reply_to_tweet_ID1 Max Speed1 Tweet_ID2 Reply_to_tweet_ID2 Max Speed2 Tweet_ID3 Reply_to_tweet_ID3 Max Speed3 Tweet_ID4 Reply_to_tweet_ID4 Max Speed4 1 None 380 2 1 370 3 1 24 6 1 48 2 1 370 8 2 123 3 1 24 4 3 26 4 3 26 7 4 8 9 4 None

df = data.set_index(['Reply_to_tweet_ID', data.groupby('Reply_to_tweet_ID')\ .cumcount().add(1)])[['Tweet_ID','Max Speed']]\ .unstack().reset_index() df.columns = ["{}{}".format(a, b) for a, b in df.columns] df = df[df.Reply_to_tweet_ID != 'None']

Reply_to_tweet_ID Tweet_ID1 Tweet_ID2 Tweet_ID3 Max Speed1 Max Speed2 Max Speed3 0 1 2 3 6 370 24 48 1 2 8 NaN NaN 123 NaN NaN 2 3 4 NaN NaN 26 NaN NaN 3 4 7 9 NaN 8 None NaN

1条回答

网友

1楼 · 发布于 2024-06-28 19:27:32

IIUC，您可以通过使用merge的“自联接”来实现这一点，然后重塑数据帧，展平多索引列标题并merge返回原始数据帧：

#Create merged dataframe, data_m, to join Reply_to_tweet_ID to Tweet_ID
data_m = data[['Tweet_ID']].merge(data[['Reply_to_tweet_ID','Max Speed','Tweet_ID']], 
                                               left_on='Tweet_ID', 
                                               right_on='Reply_to_tweet_ID',
                                              suffixes=('','_y'))

#Use `set_index` with `groupby` and `cumcount` then `unstack` to 
#reshape long to wide for dataframe, data_u
data_u = data_m.set_index(['Tweet_ID', data_m.groupby('Tweet_ID').cumcount()+1]).unstack()
data_u = data_u.sort_index(axis=1, level=1)

#Flatten multiindex column header using list comprenhension    
data_u.columns = [f'{i}{j}' for i, j in data_u.columns]

#merge dataframe, data_u to the orginal dataframe, data
print(data.merge(data_u, on='Tweet_ID'))

输出：

  Tweet_ID Reply_to_tweet_ID  Max Speed  Max Speed1 Reply_to_tweet_ID1 Tweet_ID_y1  Max Speed2 Reply_to_tweet_ID2 Tweet_ID_y2  Max Speed3 Reply_to_tweet_ID3 Tweet_ID_y3
0        1              None      380.0       370.0                  1           2        24.0                  1           3        48.0                  1           6
1        2                 1      370.0       123.0                  2           8         NaN                NaN         NaN         NaN                NaN         NaN
2        3                 1       24.0        26.0                  3           4         NaN                NaN         NaN         NaN                NaN         NaN
3        4                 3       26.0         8.0                  4           7         NaN                  4           9         NaN                NaN         NaN

相关问题更多 >

编程相关推荐

热门问题

热门文章