Creating multiple columns in a pandas DataFrame in a single update

Posted 2024-09-20 03:41:54


I have a DataFrame like this:

import numpy as np
import pandas as pd

df = pd.DataFrame({'Group': ['Fruit', 'Vegetable', 'Fruit', 'Vegetable', 'Fruit', 'Vegetable', 'Vegetable'],
                   'NId': ['Banana', 'Onion', 'Grapes', 'Potato', 'Apple', np.nan, np.nan],
                   'BName': [np.nan, 'GTwo', np.nan, 'GSix', np.nan, 'GOne', 'GNine'],
                   'BId': [np.nan, '5252', np.nan, '5678', np.nan, '5125', '5923']})
# Cast BId to strings so it can be concatenated with a prefix later
df['BId'] = df['BId'].astype(str)
df = df[['Group', 'NId', 'BName', 'BId']]

The DataFrame looks like this:

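       Group     NId  BName   BId
0      Fruit  Banana    NaN   nan
1  Vegetable   Onion   GTwo  5252
2      Fruit  Grapes    NaN   nan
3  Vegetable  Potato   GSix  5678
4      Fruit   Apple    NaN   nan
5  Vegetable     NaN   GOne  5125
6  Vegetable     NaN  GNine  5923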

I then create the new columns as follows:

df.loc[df['NId'].notna(), 'Cat'] = df[df['NId'].notna()].apply(lambda x: 'NId', axis=1)
df.loc[df['NId'].isna(), 'Cat'] = df[df['NId'].isna()].apply(lambda x: 'GId', axis=1)

df.loc[df['NId'].notna(), 'Id'] = df[df['NId'].notna()].apply(lambda x: str(x['NId']), axis=1)
df.loc[df['NId'].isna(), 'Id'] = df[df['NId'].isna()].apply(lambda x: x['BName'], axis=1)

df.loc[df['NId'].notna(), 'IdQ'] = df[df['NId'].notna()].apply(lambda x: 'NId:' + str(x['NId']), axis=1)
df.loc[df['NId'].isna(), 'IdQ'] = df[df['NId'].isna()].apply(lambda x: 'BId:' + x['BId'], axis=1)

This produces the following DataFrame:

       Group     NId  BName   BId  Cat      Id         IdQ
0      Fruit  Banana    NaN   nan  NId  Banana  NId:Banana
1  Vegetable   Onion   GTwo  5252  NId   Onion   NId:Onion
2      Fruit  Grapes    NaN   nan  NId  Grapes  NId:Grapes
3  Vegetable  Potato   GSix  5678  NId  Potato  NId:Potato
4      Fruit   Apple    NaN   nan  NId   Apple   NId:Apple
5  Vegetable     NaN   GOne  5125  GId    GOne    BId:5125
6  Vegetable     NaN  GNine  5923  GId   GNine    BId:5923

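I would like to know whether these operations can be combined, or whether there is a better way to achieve this. Basically, Id is NId when NId is present, otherwise BName. Cat is 'NId' when the value comes from NId, otherwise 'GId'. IdQ is either 'NId:' + NId or 'BId:' + BId, following the logic coded above.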


Tags: lambda, df, np, nan, loc, apply, fruit, axis
2 Answers

Use numpy.where:

mask = df['NId'].notna()
df['Cat'] = np.where(mask, 'NId', 'GId')
df['Id'] = np.where(mask, df['NId'].astype(str), df['BName'])
df['IdQ'] = np.where(mask, 'NId:' + df['NId'].astype(str), 'BId:' + df['BId'])
print(df)
       Group     NId  BName   BId  Cat      Id         IdQ
0      Fruit  Banana    NaN   nan  NId  Banana  NId:Banana
1  Vegetable   Onion   GTwo  5252  NId   Onion   NId:Onion
2      Fruit  Grapes    NaN   nan  NId  Grapes  NId:Grapes
3  Vegetable  Potato   GSix  5678  NId  Potato  NId:Potato
4      Fruit   Apple    NaN   nan  NId   Apple   NId:Apple
5  Vegetable     NaN   GOne  5125  GId    GOne    BId:5125
6  Vegetable     NaN  GNine  5923  GId   GNine    BId:5923
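
For readers who want to try this without the full setup, here is a minimal self-contained sketch of the same numpy.where idea. The helper name add_id_columns and the three-row sample frame are assumptions made up for illustration; they are not part of the question or the answer.

```python
import numpy as np
import pandas as pd


def add_id_columns(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `frame` with Cat, Id and IdQ added (hypothetical helper)."""
    out = frame.copy()
    # One boolean mask, computed once and reused for all three columns
    mask = out['NId'].notna()
    out['Cat'] = np.where(mask, 'NId', 'GId')
    out['Id'] = np.where(mask, out['NId'].astype(str), out['BName'])
    out['IdQ'] = np.where(mask,
                          'NId:' + out['NId'].astype(str),
                          'BId:' + out['BId'].astype(str))
    return out


# Small made-up sample with the same column layout as the question
sample = pd.DataFrame({
    'Group': ['Fruit', 'Vegetable', 'Vegetable'],
    'NId': ['Banana', 'Onion', np.nan],
    'BName': [np.nan, 'GTwo', 'GOne'],
    'BId': [np.nan, '5252', '5125'],
})
result = add_id_columns(sample)
print(result[['Cat', 'Id', 'IdQ']])
```

Because the mask is built from NId only, rows without an NId always fall back to BName and BId, which matches the logic described in the question.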

You can use pandas' assign function to assign multiple columns at once:

# DataFrame.append was removed in pandas 2.0, so the two halves are combined with pd.concat
df1 = df[df['NId'].notna()].assign(Cat='NId', Id=lambda x: x['NId'], IdQ=lambda x: 'NId:' + x['NId'])
df2 = df[df['NId'].isna()].assign(Cat='GId', Id=lambda x: x['BName'], IdQ=lambda x: 'BId:' + x['BId'])
pd.concat([df1, df2])

       Group     NId  BName   BId  Cat      Id         IdQ
0      Fruit  Banana    NaN   nan  NId  Banana  NId:Banana
1  Vegetable   Onion   GTwo  5252  NId   Onion   NId:Onion
2      Fruit  Grapes    NaN   nan  NId  Grapes  NId:Grapes
3  Vegetable  Potato   GSix  5678  NId  Potato  NId:Potato
4      Fruit   Apple    NaN   nan  NId   Apple   NId:Apple
5  Vegetable     NaN   GOne  5125  GId    GOne    BId:5125
6  Vegetable     NaN  GNine  5923  GId   GNine    BId:5923
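
Splitting the frame and gluing it back together is not strictly needed. As a possible alternative, the sketch below builds all three columns in one assign call. The sample frame and the name combined are illustrative assumptions, and using fillna for the Id column is a substitution of mine rather than something either answer proposes.

```python
import numpy as np
import pandas as pd

# Small made-up sample with the same column layout as the question
sample = pd.DataFrame({
    'Group': ['Fruit', 'Vegetable', 'Vegetable'],
    'NId': ['Banana', 'Onion', np.nan],
    'BName': [np.nan, 'GTwo', 'GOne'],
    'BId': [np.nan, '5252', '5125'],
})

combined = sample.assign(
    # Label each row by the source its Id comes from
    Cat=lambda d: np.where(d['NId'].notna(), 'NId', 'GId'),
    # Take NId where present, otherwise fall back to BName
    Id=lambda d: d['NId'].fillna(d['BName']),
    # Prefix the chosen identifier with the name of its source column
    IdQ=lambda d: np.where(d['NId'].notna(),
                           'NId:' + d['NId'].astype(str),
                           'BId:' + d['BId'].astype(str)),
)
print(combined)
```

assign returns a new DataFrame and leaves sample untouched, so the result has to be bound to a name (or reassigned to the original variable) to be kept.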
