Comparing two date columns that contain text

Posted 2024-10-01 00:34:01


I have a DataFrame like this:

datecol1              datecol2
2005-02-22          EmployeeNotFound
2010-02-21          2010-02-22
EmployeeNotFound    EmployeeNotFound
EmployeeNotFound    2010-02-22

Both columns have dtype object.
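For anyone who wants to reproduce the problem, here is a minimal sketch that rebuilds the sample frame from the table above. Passing `dtype=object` is an assumption made here so that the columns match the dtype described in the question regardless of the pandas version.

```python
import pandas as pd

# Sample data copied from the question; both columns hold plain strings.
df = pd.DataFrame(
    {
        "datecol1": ["2005-02-22", "2010-02-21", "EmployeeNotFound", "EmployeeNotFound"],
        "datecol2": ["EmployeeNotFound", "2010-02-22", "EmployeeNotFound", "2010-02-22"],
    },
    dtype=object,  # assumption: force object dtype, as stated in the question
)

print(df.dtypes)
```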

I want to compare the two columns and get the latest date in each row.

So the expected result is:

    datecol1            datecol2                  ExpectedResult
    2005-02-22          EmployeeNotFound          2005-02-22
    2010-02-21          2010-02-22                2010-02-22
    EmployeeNotFound    EmployeeNotFound          EmployeeNotFound
    EmployeeNotFound    2010-02-22                2010-02-22

The dtype of ExpectedResult should again be object.


2 Answers

Convert the columns to datetimes, take the max along axis 1, and finally convert back to strings and replace NaT:

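import pandas as pd

cols = ['datecol1', 'datecol2']
# Parse both columns; non-date text such as 'EmployeeNotFound' becomes NaT
df[cols] = df[cols].apply(pd.to_datetime, errors='coerce')
# Row-wise maximum; NaT is skipped unless the whole row is NaT
df['ExpectedResult'] = df[cols].max(axis=1)
# Convert back to strings and restore the placeholder
df = df.astype(str).replace('NaT', 'EmployeeNotFound')
# Alternative solution
# df = df.astype(str).mask(df.isnull(), 'EmployeeNotFound')
print(df)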
           datecol1          datecol2    ExpectedResult
0        2005-02-22  EmployeeNotFound        2005-02-22
1        2010-02-21        2010-02-22        2010-02-22
2  EmployeeNotFound  EmployeeNotFound  EmployeeNotFound
3  EmployeeNotFound        2010-02-22        2010-02-22
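The same steps can also be wrapped in a small function that leaves the original columns untouched. This is only a sketch: `rowwise_max_date` and its `placeholder` parameter are hypothetical names, not part of the answer, and the explicit `format="%Y-%m-%d"` assumes every date in the data is written that way.

```python
import pandas as pd


def rowwise_max_date(df, cols, placeholder="EmployeeNotFound"):
    """Return the latest date per row across cols, as strings (hypothetical helper)."""
    # Anything that is not a YYYY-MM-DD date, such as the placeholder, becomes NaT.
    parsed = df[cols].apply(pd.to_datetime, format="%Y-%m-%d", errors="coerce")
    # max skips NaT by default; a row with no valid date stays NaT.
    latest = parsed.max(axis=1)
    # Format valid dates back to text; NaT turns into a missing value, filled here.
    return latest.dt.strftime("%Y-%m-%d").fillna(placeholder)


# Sample data from the question.
df = pd.DataFrame(
    {
        "datecol1": ["2005-02-22", "2010-02-21", "EmployeeNotFound", "EmployeeNotFound"],
        "datecol2": ["EmployeeNotFound", "2010-02-22", "EmployeeNotFound", "2010-02-22"],
    },
    dtype=object,
)
df["ExpectedResult"] = rowwise_max_date(df, ["datecol1", "datecol2"])
print(df)
```

Working on a parsed copy means datecol1 and datecol2 keep their original strings, so no second pass is needed to restore the placeholder in them.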

You can also use numpy, since numpy functions tend to be faster:

import numpy as np
cond = df['datecol1'] != 'EmployeeNotFound'
df['ExpectedResult'] = np.where(cond, df['datecol1'], df['datecol2'])

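This first takes every valid value from datecol1; the remaining rows are then filled from datecol2. Note that when both columns hold a date, datecol1 is taken without comparing, so row 1 gives 2010-02-21 rather than the expected 2010-02-22.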
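For rows where both columns hold a date, the two values still have to be compared. Here is a minimal sketch of one way to do that with `np.where`, assuming every date is an ISO YYYY-MM-DD string, so that string order matches date order. The names `placeholder` and `take_d2` are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame(
    {
        "datecol1": ["2005-02-22", "2010-02-21", "EmployeeNotFound", "EmployeeNotFound"],
        "datecol2": ["EmployeeNotFound", "2010-02-22", "EmployeeNotFound", "2010-02-22"],
    },
    dtype=object,
)

placeholder = "EmployeeNotFound"
d1, d2 = df["datecol1"], df["datecol2"]
valid1 = d1 != placeholder
valid2 = d2 != placeholder

# Take datecol2 when datecol1 has no date, or when datecol2 holds a later date.
# Assumption: ISO-formatted date strings sort in chronological order.
take_d2 = ~valid1 | (valid2 & (d2 > d1))
df["ExpectedResult"] = np.where(take_d2, d2, d1)
print(df)
```

Because nothing is parsed, this only works while the date format stays strictly YYYY-MM-DD; for other formats, converting with pd.to_datetime first is the safer route.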
