Merging same-named DataFrame columns and comma-separated data

Posted 2024-06-28 11:20:06

The DataFrame looks like this.

Let's call it df1:

teamname  player.1  player.2  player.3
xyz        abc        nan       def
gh1        nan        hgf       jnr
oed        jeo        nan       nan

The output should look like this.

Let's call it df2:

teamname player
xyz       abc
          def
gh1       hgf
          jnr
oed       jeo

2 answers
import pandas as pd

# Names of the player columns
player_cols = [col for col in df1.columns if 'player' in col.lower()]

df_parts = []  # List to store the per-column mini-DataFrames
for col in player_cols:
    df_auxiliary = df1[['teamname', col]]
    df_auxiliary = df_auxiliary.rename(columns={col: 'Players'})
    df_auxiliary = df_auxiliary.dropna()
    df_parts.append(df_auxiliary)

df2 = pd.concat(df_parts)  # Create the final DataFrame
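
Note that concatenating column by column lists all of player.1 first, then player.2, and so on, so rows for the same team are not adjacent. A minimal sketch of one way to regroup them, assuming df1 is built as in the question: each part keeps its original row label, so a stable sort on the index restores the team order of df1. The output column name `player` and the final `reset_index` are my own choices here, not part of the answer above.

```python
import numpy as np
import pandas as pd

# Sample frame from the question (rebuilt here so the sketch is self-contained)
df1 = pd.DataFrame({
    "teamname": ["xyz", "gh1", "oed"],
    "player.1": ["abc", np.nan, "jeo"],
    "player.2": [np.nan, "hgf", np.nan],
    "player.3": ["def", "jnr", np.nan],
})

player_cols = [c for c in df1.columns if c.startswith("player")]

# One two-column frame per player column, with missing players dropped
parts = [
    df1[["teamname", col]].rename(columns={col: "player"}).dropna()
    for col in player_cols
]

stacked = pd.concat(parts)

# Each part kept df1's row labels, so a stable sort on the index puts the
# rows of one team next to each other, in player-column order
df2 = stacked.sort_index(kind="mergesort").reset_index(drop=True)
print(df2)
```

mergesort is used because it is a stable algorithm, which is what keeps player.1 ahead of player.3 within a team.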

Or as a "one-liner":

# sep='.' is needed because the columns are named player.1, player.2, ...
df2 = pd.wide_to_long(df1, stubnames='player', i=['teamname'], j='player_num', sep='.')
df2 = df2.dropna()
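
For reference, here is a self-contained sketch of the wide_to_long route on the sample data. It assumes the columns are literally named `player.1`, `player.2`, `player.3`; with that naming, the stub `player` only matches if `sep="."` is passed, since the default separator is an empty string. The names `long_df` and `result` are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample frame from the question
df1 = pd.DataFrame({
    "teamname": ["xyz", "gh1", "oed"],
    "player.1": ["abc", np.nan, "jeo"],
    "player.2": [np.nan, "hgf", np.nan],
    "player.3": ["def", "jnr", np.nan],
})

# Columns must look like <stub><sep><number>, here "player" + "." + digits
long_df = pd.wide_to_long(
    df1, stubnames="player", i="teamname", j="player_num", sep="."
)

# The result is indexed by (teamname, player_num); drop the empty slots
# and sort the index so that each team's players sit together
long_df = long_df.dropna().sort_index()

# Back to ordinary columns if the MultiIndex is not wanted
result = long_df.reset_index()
print(result)
```

Sorting the index orders the teams alphabetically (gh1, oed, xyz), not in the order they appear in df1.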

I would go with melt(), which is very versatile:

  teamname player.1 player.2 player.3
0      xyz      abc      NaN      def
1      gh1      NaN      hgf      jnr
2      oed      jeo      NaN      NaN

With the DataFrame above as df, the call

df.melt(id_vars=['teamname'], value_name='player').dropna().drop('variable', axis=1).sort_values(['teamname'], ascending=False).set_index('teamname')

produces

         player
teamname       
xyz         abc
xyz         def
oed         jeo
gh1         hgf
gh1         jnr

The calls after melt drop the NaN rows, drop the variable column that we don't need, and sort the DataFrame. Finally, teamname is set as the index.
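
The same chain can be unrolled into named steps, which makes it easier to see what each call contributes. This is only a sketch on the sample data; the intermediate names are mine, and `kind="mergesort"` is an addition (the original one-liner uses the default sort), chosen so that players of the same team are guaranteed to keep their column order.

```python
import numpy as np
import pandas as pd

# Sample frame from the question
df = pd.DataFrame({
    "teamname": ["xyz", "gh1", "oed"],
    "player.1": ["abc", np.nan, "jeo"],
    "player.2": [np.nan, "hgf", np.nan],
    "player.3": ["def", "jnr", np.nan],
})

# 1. Wide to long: one row per (team, player column) pair, 3 x 3 = 9 rows.
#    The old column names land in a column called "variable" by default.
melted = df.melt(id_vars=["teamname"], value_name="player")

# 2. Drop the rows where a team has no player in that slot
non_null = melted.dropna(subset=["player"])

# 3. Drop the helper column holding "player.1", "player.2", ...
trimmed = non_null.drop(columns="variable")

# 4. Group the teams together; a stable sort keeps the within-team order
ordered = trimmed.sort_values("teamname", ascending=False, kind="mergesort")

# 5. Use the team name as the index
df2 = ordered.set_index("teamname")
print(df2)
```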
