How do I split a single column of a pandas DataFrame into multiple columns?

Posted 2024-09-22 22:39:30


I am new to pandas in Python. I have a DataFrame like this:

import pandas as pd

df = pd.DataFrame({'Name': ['football', 'ramesh','suresh','pankaj','cricket','rakesh','mohit','mahesh'],
               'age': ['25', '22','21','32','37','26','24','30']})
print(df)

       Name age
0  football  25
1    ramesh  22
2    suresh  21
3    pankaj  32
4   cricket  37
5    rakesh  26
6     mohit  24
7    mahesh  30

The 'Name' column contains both sport names and the names of the people who play them. I want to split it into two separate columns.

Expected output: one row per person, with the columns sports_name, sport_person_name and age.

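If I group by the 'Name' column I do not get the expected output, which is not surprising, because the 'Name' column contains no duplicates. What do I need to use to get the expected output?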

Edit: if you do not want to hard-code the sport names

import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['football', 'ramesh','suresh','pankaj','cricket','rakesh','mohit','mahesh'],
           'age': ['', '22','21','32','','26','24','30']})

# turn empty strings into NaN (exact match, so no regex is needed)
df = df.replace('', np.nan)

# rows with a NaN in any column are the rows that hold a sport name
nan_rows = df[df.isnull().any(axis=1)]
sports = nan_rows['Name'].tolist()

df['sports_name'] = df['Name'].where(df['Name'].isin(sports)).ffill()
d = {'Name':'sport_person_name'}
df = df[df['sports_name'] != df['Name']].reset_index(drop=True).rename(columns=d)
df = df[['sports_name','sport_person_name','age']]
print(df)

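I simply look for the rows that contain NaN in any column other than 'Name'; those rows must hold sport names. I build a list of sport names from them and then use the solution below to create the sports_name and sport_person_name columns.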


2 answers

You can use:

#define list of sports
sports = ['football','cricket']
#create NaNs if no sport in Name, forward filling NaNs
df['sports_name'] = df['Name'].where(df['Name'].isin(sports)).ffill()
#remove same values in columns sports_name and Name, rename column
d = {'Name':'sport_person_name'}
df = df[df['sports_name'] != df['Name']].reset_index(drop=True).rename(columns=d)
#change order of columns
df = df[['sports_name','sport_person_name','age']]
print (df)
  sports_name sport_person_name age
0    football            ramesh  22
1    football            suresh  21
2    football            pankaj  32
3     cricket            rakesh  26
4     cricket             mohit  24
5     cricket            mahesh  30

A similar solution puts the new column in first position right away, so no reordering is needed.
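One way to skip the reordering step is `DataFrame.insert`, which places a new column at a given position. The snippet below is a minimal sketch under the assumption that this is the tool meant here; it is an illustration, not necessarily the answerer's exact code.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['football', 'ramesh', 'suresh', 'pankaj',
             'cricket', 'rakesh', 'mohit', 'mahesh'],
    'age': ['25', '22', '21', '32', '37', '26', '24', '30'],
})

sports = ['football', 'cricket']

# Keep only the sport names, leave NaN elsewhere, then forward fill
sport_col = df['Name'].where(df['Name'].isin(sports)).ffill()

# Insert the new column at position 0 (this modifies df in place)
df.insert(0, 'sports_name', sport_col)

# Drop the rows that hold the sport names themselves and rename 'Name'
df = (df[df['sports_name'] != df['Name']]
      .reset_index(drop=True)
      .rename(columns={'Name': 'sport_person_name'}))
print(df)
```

Because `insert` already puts `sports_name` first, the final column selection from the previous snippet is not needed.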

If each sport value should appear only once, add limit=1 to ffill and replace the remaining NaNs with empty strings:

sports = ['football','cricket']
df['sports_name'] = df['Name'].where(df['Name'].isin(sports)).ffill(limit=1).fillna('')
d = {'Name':'sport_person_name'}
df = df[df['sports_name'] != df['Name']].reset_index(drop=True).rename(columns=d)
df = df[['sports_name','sport_person_name','age']]
print (df)
  sports_name sport_person_name age
0    football            ramesh  22
1                        suresh  21
2                        pankaj  32
3     cricket            rakesh  26
4                         mohit  24
5                        mahesh  30

The output you want is really a dictionary rather than a DataFrame. The dictionary would look like this:

{'Sport' : {'Player' : age,'Player2' : age}}
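A nested dictionary of this shape could be built with a single pass over the rows. This is only a sketch: it assumes the list of sport names (`sports`) is known in advance and that each sport row comes before its players. The variable names `result` and `current` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['football', 'ramesh', 'suresh', 'pankaj',
             'cricket', 'rakesh', 'mohit', 'mahesh'],
    'age': ['25', '22', '21', '32', '37', '26', '24', '30'],
})

sports = ['football', 'cricket']  # assumed to be known in advance

result = {}
current = None  # the sport that the following player rows belong to
for name, age in zip(df['Name'], df['age']):
    if name in sports:
        # a sport row starts a new group; its own age value is ignored
        current = name
        result[current] = {}
    elif current is not None:
        result[current][name] = age

print(result)
```

The ages stay strings here because the 'age' column holds strings in the question's data.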

If you really want a DataFrame, that works as long as the sport name always appears before its players.
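A plain loop is one way to do this when every sport name precedes its players. The sketch below relies on that ordering and on a known list of sport names; the names `rows`, `current` and `out` are illustrative and not taken from the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['football', 'ramesh', 'suresh', 'pankaj',
             'cricket', 'rakesh', 'mohit', 'mahesh'],
    'age': ['25', '22', '21', '32', '37', '26', '24', '30'],
})

sports = ['football', 'cricket']  # assumed to be known in advance

rows = []
current = None  # most recent sport name seen so far
for name, age in zip(df['Name'], df['age']):
    if name in sports:
        current = name  # remember the sport, but emit no row for it
        continue
    rows.append({'sports_name': current,
                 'sport_person_name': name,
                 'age': age})

out = pd.DataFrame(rows, columns=['sports_name', 'sport_person_name', 'age'])
print(out)
```

For larger frames the vectorised `where(...).ffill()` approach from the first answer is likely the better fit; the loop mainly makes the logic explicit.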

It should look like this:

sports_name sport_person_name age
football    ramesh            22
football    suresh            21
football    pankaj            32
cricket     rakesh            26
cricket     mohit             24
cricket     mahesh            30
