Look up values in a column based on a condition

Published 2024-06-25 17:53:11


I have two DataFrames:

df1:

   foo
0    2
1   11
2   18
3    6
4   14
5   12
6    8
7   13
8    7
9    5

df2:

    bar date
0   2   06-01-2020
1   5   06-01-2020
2   7   06-01-2020
3   8   06-01-2020
4   3   06-01-2020

df1['result'] = df1.foo.isin(df2.bar)

If a value in df1's "foo" appears in df2's "bar", I want to look up the corresponding date in df2. So I tried the following:

df1['date'] = df2['date'].loc[df1.foo.isin(df2.bar)]

But it only produces a date for a single row.

Output:

    foo result  date
0   2   True    06-01-2020
1   11  False   NaN
2   18  False   NaN
3   6   False   NaN
4   14  False   NaN
5   12  False   NaN
6   8   True    NaN
7   13  False   NaN
8   7   True    NaN
9   5   True    NaN

If foo is not in bar, the row should get today's date instead, like this:

Expected output:

   foo result   date
0    2   True   06-01-2020
1   11  False   24-08-2020
2   18  False   24-08-2020
3    6  False   24-08-2020
4   14  False   24-08-2020
5   12  False   24-08-2020
6    8   True   06-01-2020
7   13  False   24-08-2020
8    7   True   06-01-2020
9    5   True   06-01-2020

3 answers

You can use a DataFrame merge in pandas, as follows:

import pandas as pd
from datetime import datetime

df1 = pd.DataFrame({'foo': [2, 11, 18, 6, 14, 12, 8, 13, 7, 5]})
df2 = pd.DataFrame({'bar': [2, 5, 7, 8, 3], 'date': ['06-01-2020'] * 5})
# Left join: rows of df1 with no match in df2 get NaN in 'bar' and 'date'
df3 = df1.merge(df2, how='left', left_on='foo', right_on='bar')
df3['result'] = True
# For unmatched rows, set result to False and date to today's date (DD-MM-YYYY)
df3.loc[df3['bar'].isna(), ['result', 'date']] = [False, datetime.now().strftime('%d-%m-%Y')]
# 'bar' was only needed to detect the matches
df3 = df3.drop(columns='bar')
print(df3)
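
As a variation on the merge approach, here is a minimal sketch that lets pandas report which rows matched through the `indicator=True` option of `merge`, instead of checking `bar` for NaN. The names `default_date`, `merged` and `out` are illustrative, and the default date is hard-coded to the value from the question so the result is reproducible; in practice it would be today's date.

```python
import pandas as pd

df1 = pd.DataFrame({'foo': [2, 11, 18, 6, 14, 12, 8, 13, 7, 5]})
df2 = pd.DataFrame({'bar': [2, 5, 7, 8, 3], 'date': ['06-01-2020'] * 5})

# Assumption: a fixed default taken from the question's expected output.
default_date = '24-08-2020'

# indicator=True adds a '_merge' column: 'both' for matched rows,
# 'left_only' for rows of df1 with no partner in df2.
merged = df1.merge(df2, how='left', left_on='foo', right_on='bar', indicator=True)
merged['result'] = merged['_merge'] == 'both'
merged['date'] = merged['date'].fillna(default_date)

# Keep only the columns the question asks for.
out = merged[['foo', 'result', 'date']]
print(out)
```

Note that a left merge repeats a row of df1 once per matching row in df2, so this assumes the values in `bar` are unique.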

Using merge

import datetime
import pandas as pd

# Sample data
df1 = pd.DataFrame({'foo': [2, 11, 18, 6, 14, 12, 8, 13, 7, 5]})

df2 = pd.DataFrame({'bar': [2, 5, 7, 8, 3],
                    'date': [datetime.date(2020, 1, 6)] * 5})

# Left join on foo == bar, then keep only the required columns
df = df1.merge(df2, how='left', left_on='foo', right_on='bar')[['foo', 'date']]
# result is True wherever the join found a date
df['result'] = df['date'].notna()
# Finally, replace every missing date with the default you want
df['date'] = df['date'].fillna(datetime.date(2020, 8, 24))
print(df)

Output:

    foo date        result
0   2   2020-01-06  True
1   11  2020-08-24  False
2   18  2020-08-24  False
3   6   2020-08-24  False
4   14  2020-08-24  False
5   12  2020-08-24  False
6   8   2020-01-06  True
7   13  2020-08-24  False
8   7   2020-01-06  True
9   5   2020-01-06  True

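Use Series.map with a Series built from the df2 values to add the dates, then replace the missing values with the current date.

Solution for dates stored as strings in DD-MM-YYYY format: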

df1['result'] = df1.foo.isin(df2.bar)

now = pd.Timestamp('now').strftime('%d-%m-%Y')
df1['date'] = df1['foo'].map(df2.set_index('bar')['date']).fillna(now)
print (df1)
   foo  result        date
0    2    True  06-01-2020
1   11   False  24-08-2020
2   18   False  24-08-2020
3    6   False  24-08-2020
4   14   False  24-08-2020
5   12   False  24-08-2020
6    8    True  06-01-2020
7   13   False  24-08-2020
8    7    True  06-01-2020
9    5    True  06-01-2020

If the dates are datetimes rather than strings:

# Today's date as a Timestamp at midnight
now = pd.Timestamp('now').floor('D')
df1['date'] = df1['foo'].map(df2.set_index('bar')['date']).fillna(now)
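
For reference, a self-contained sketch of the datetime variant might look like the following. It assumes the `date` column in df2 starts out as DD-MM-YYYY strings, as in the question, and converts it first; `lookup` and `today` are illustrative names.

```python
import pandas as pd

df1 = pd.DataFrame({'foo': [2, 11, 18, 6, 14, 12, 8, 13, 7, 5]})
df2 = pd.DataFrame({'bar': [2, 5, 7, 8, 3], 'date': ['06-01-2020'] * 5})

# Assumption: dates arrive as DD-MM-YYYY strings, so parse them explicitly.
df2['date'] = pd.to_datetime(df2['date'], format='%d-%m-%Y')

# Today's date with the time part set to midnight.
today = pd.Timestamp.now().normalize()

# Series indexed by 'bar', so each 'foo' value can be looked up directly.
lookup = df2.set_index('bar')['date']

df1['result'] = df1['foo'].isin(df2['bar'])
# Values of 'foo' missing from the lookup become NaT, then get today's date.
df1['date'] = df1['foo'].map(lookup).fillna(today)
print(df1)
```

Mapping through a Series requires the values in `bar` to be unique; with duplicates, drop them first (for example with `drop_duplicates('bar')`) or use the merge approach instead.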
