Pandas: comparing two DataFrames and replacing values

Posted 2024-10-05 13:21:07


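I have two pandas DataFrames, shown below. The "No" column is the common key. Based on "No", I want to replace the values in the "Total" column of the first DataFrame.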

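The conditions are: if "No" matches, take the "Marks1" value from data frame 2 and write it into "Total". If "Marks1" is empty, take the "Marks2" value and write that into "Total" instead. If both Marks1 and Marks2 are empty, set "Total" to empty. The final result should end up in data frame 1. Both DataFrames have several hundred thousand records.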

Data frame1
No|Total
1234|11
2515|21
3412|32
4854|
7732|53

Data frame2
No|Marks1|Marks2
1234|99|23
2515|98|31
3412||20
4854||98
7732||

Result :
No|Total
1234|99
2515|98
3412|20
4854|98
7732|

2 answers

Use Series.fillna to replace the missing values in Marks1 with Marks2, then Series.map to look the result up by No:

df = df2.set_index('No')

df1['Total'] = df1['No'].map(df['Marks1'].fillna(df['Marks2']))
print (df1)
     No  Total
0  1234   99.0
1  2515   98.0
2  3412   20.0
3  4854   98.0
4  7732    NaN
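
For reference, the approach above can be reproduced end to end with the sample data from the question. This is only a minimal sketch: the frame construction and the name `lookup` are assumptions added for illustration, empty cells are modeled as NaN, and No is assumed to be unique in df2.

```python
import numpy as np
import pandas as pd

# Sample data from the question; empty cells are represented as NaN
df1 = pd.DataFrame({"No": [1234, 2515, 3412, 4854, 7732],
                    "Total": [11, 21, 32, np.nan, 53]})
df2 = pd.DataFrame({"No": [1234, 2515, 3412, 4854, 7732],
                    "Marks1": [99, 98, np.nan, np.nan, np.nan],
                    "Marks2": [23, 31, 20, 98, np.nan]})

# Index df2 by No so it can serve as a lookup table (assumes No is unique)
indexed = df2.set_index("No")

# Marks1 where present, otherwise Marks2; NaN only if both are missing
lookup = indexed["Marks1"].fillna(indexed["Marks2"])

# Look up every No of df1; a No that is absent from df2 yields NaN
df1["Total"] = df1["No"].map(lookup)
print(df1)
```

Because `map` performs a hash-based lookup on the index, this should scale reasonably to frames with several hundred thousand rows, although that has not been measured here.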

If No may contain duplicated values in df2, use:

print (df2)
     No  Marks1  Marks2
0  1234    99.0    23.0 <- duplicated No
1  1234    98.0    31.0 <- duplicated No
2  3412     NaN    20.0
3  4854     NaN    98.0
4  7732     NaN     NaN

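#current pandas versions (sum(level=...) was removed in pandas 2.0)
#min_count=1 keeps NaN for groups where every value is missing
df = df2.groupby('No').sum(min_count=1)
#older pandas versions (before 2.0)
#df = df2.set_index('No').sum(level=0, min_count=1)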
print (df)
      Marks1  Marks2
No                  
1234   197.0    54.0 <- unique No, values are summed per No
3412     NaN    20.0
4854     NaN    98.0
7732     NaN     NaN

df1['Total'] = df1['No'].map(df['Marks1'].fillna(df['Marks2']))
print (df1)
     No  Total
0  1234  197.0
1  2515    NaN
2  3412   20.0
3  4854   98.0
4  7732    NaN
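
Summing is only one way to collapse duplicated No values. If the first record per No should win instead, something like the following sketch could be used. The "first record wins" rule is an assumption for illustration, not something the question specifies, and the name `first` is hypothetical.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"No": [1234, 2515, 3412, 4854, 7732],
                    "Total": [11, 21, 32, np.nan, 53]})
# df2 with a duplicated No (1234), as in the example above
df2 = pd.DataFrame({"No": [1234, 1234, 3412, 4854, 7732],
                    "Marks1": [99, 98, np.nan, np.nan, np.nan],
                    "Marks2": [23, 31, 20, 98, np.nan]})

# Keep only the first row for each No (assumption: the first record wins)
first = df2.drop_duplicates("No").set_index("No")

# Same lookup as before: Marks1, falling back to Marks2
df1["Total"] = df1["No"].map(first["Marks1"].fillna(first["Marks2"]))
print(df1)
```

With this data, 1234 gets 99.0 from its first row rather than the sum 197.0, and 2515 stays NaN because it does not occur in this df2.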

If df1 and df2 have the same index values and the No values match row for row, use:

df1['Total'] = df2['Marks1'].fillna(df2['Marks2'])
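
The direct assignment relies on both frames being aligned, so it may be worth verifying that precondition first. A minimal sketch, in which the check and the fallback to the lookup by No are additions for illustration and the name `aligned` is hypothetical:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"No": [1234, 2515, 3412, 4854, 7732],
                    "Total": [11, 21, 32, np.nan, 53]})
df2 = pd.DataFrame({"No": [1234, 2515, 3412, 4854, 7732],
                    "Marks1": [99, 98, np.nan, np.nan, np.nan],
                    "Marks2": [23, 31, 20, 98, np.nan]})

# True only if both frames share the same index and the same No in every row
aligned = df1.index.equals(df2.index) and df1["No"].equals(df2["No"])

if aligned:
    # Safe to assign directly: rows line up one to one
    df1["Total"] = df2["Marks1"].fillna(df2["Marks2"])
else:
    # Otherwise fall back to the lookup by No (assumes No is unique in df2)
    indexed = df2.set_index("No")
    df1["Total"] = df1["No"].map(indexed["Marks1"].fillna(indexed["Marks2"]))
print(df1)
```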

You can use np.select here:

import numpy as np

# rows are matched by position, so df1 and df2 must be in the same row order
m = df2['Marks1'].notna()
m1 = df2['Marks1'].isna() & df2['Marks2'].notna()
condlist = [m, m1]
choice = [df2['Marks1'], df2['Marks2']]
df1['Total'] = np.select(condlist, choice, np.nan)

     No  Total
0  1234   99.0
1  2515   98.0
2  3412   20.0
3  4854   98.0
4  7732    NaN
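
Both answers produce a float column (99.0 rather than 99) because NaN forces the float64 dtype. If integers as in the expected result are wanted, one option is the nullable integer dtype of pandas. The sketch below assumes all marks are whole numbers and that the rows of both frames are in the same order; the conversion step is an addition, not part of either answer.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"No": [1234, 2515, 3412, 4854, 7732],
                    "Total": [11, 21, 32, np.nan, 53]})
df2 = pd.DataFrame({"No": [1234, 2515, 3412, 4854, 7732],
                    "Marks1": [99, 98, np.nan, np.nan, np.nan],
                    "Marks2": [23, 31, 20, 98, np.nan]})

# First condition that holds wins: Marks1 if present, else Marks2 if present
condlist = [df2["Marks1"].notna(), df2["Marks2"].notna()]
choice = [df2["Marks1"], df2["Marks2"]]
df1["Total"] = np.select(condlist, choice, np.nan)

# Convert float64 to nullable Int64; missing values become <NA>
# (assumes every non-missing mark is a whole number)
df1["Total"] = df1["Total"].astype("Int64")
print(df1)
```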
