How do I insert values into a DataFrame based on multiple conditions? A logic problem

Posted 2024-10-02 06:37:27


I have two DataFrames:

df1, with the columns "state", "date", and "number"


df2, with the columns "state" and "specificDate" (each state has exactly one specificDate and appears only once)


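In the end I want a dataset with the columns "state", "specificDate", and "number". In addition, I want to add 14 days to each specific date and get the numbers for those dates as well.

I tried this: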

df = df1.merge(df2, left_on='state', right_on='state')

df['newcolumn'] = np.where((df.state == df.state)& (df.date == df.specificDate), df.numbers)
df['newcolumn'] = np.where((df.state == df.state)& (df.date == df.specificDate+datetime.timedelta(days=14)), df.numbers)

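But I get an error: ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

When I add all(), I still get the same error.

I suspect my logic is wrong. How else can I insert these values into the dataset?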


3 Answers

I think you should use df2 as the left side of the join. You can use pd.DateOffset to add the 14 days.

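# create a dataset with the specific date and the specific date + 14 days
# (df2's date column is named 'specificDate'; the offset is added directly,
# since calling a DateOffset as a function is not supported in recent pandas)
df2_14 = (df2.set_index('state')['specificDate'] + pd.DateOffset(days=14)).reset_index()
df = pd.concat([df2, df2_14], ignore_index=True)

# now join the values from df1; unmatched rows get NaN
df = df.join(df1.set_index(['state', 'date']),
             how='left',
             on=['state', 'specificDate'])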
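
To make the join above concrete, here is a minimal runnable sketch. The sample values and the names `shifted`, `targets`, and `result` are assumptions for illustration only; the column names follow the question. It simplifies the first step by adding the offset to the whole column, so it is a variant of the approach rather than the exact code above.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df1 = pd.DataFrame({
    "state": ["Alabama"] * 5,
    "date": pd.to_datetime(["2020-03-12", "2020-03-13", "2020-03-14",
                            "2020-03-27", "2020-03-28"]),
    "number": [0, 5, 7, 9, 3],
})
df2 = pd.DataFrame({
    "state": ["Alabama", "Alaska"],
    "specificDate": pd.to_datetime(["2020-03-13", "2020-03-11"]),
})

# One row per state for specificDate, plus one row for specificDate + 14 days.
shifted = df2.assign(specificDate=df2["specificDate"] + pd.DateOffset(days=14))
targets = pd.concat([df2, shifted], ignore_index=True)

# Left join on (state, date): rows without a match in df1 keep NaN in "number".
result = targets.join(df1.set_index(["state", "date"]),
                      on=["state", "specificDate"], how="left")
print(result)
```

Because this is a left join from df2, states that have no rows in df1 (Alaska in this sample) still appear in the result, with a missing "number".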

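You can declare an empty DataFrame and fill it with the filtered data.

To filter the data, iterate over all rows of df2 and, for each one, build a mask that selects the rows of df1 with the same state whose date lies between specificDate and specificDate + 14 days.

I created two small DataFrames, df1 and df2, with a few values and tested the procedure above.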

import pandas as pd
import datetime


data1 = {
    "state":["Alabama","Alabama","Alabama"],
    "date":["3/12/20", "3/13/20", "3/14/20"],
    "number":[0,5,7]
}

data2 = {
    "state": ["Alabama", "Alaska"],
    "specificDate": ["03.13.2020", "03.11.2020"]
}

df1 = pd.DataFrame(data1)
df1['date'] = pd.to_datetime(df1['date'])
df2 = pd.DataFrame(data2)
df2['specificDate'] = pd.to_datetime(df2['specificDate'])

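final_df = pd.DataFrame()
frames = []

for index, row in df2.iterrows():
    begin_date = row["specificDate"]
    end_date = begin_date + datetime.timedelta(days=14)
    # rows of df1 for this state with a date in [begin_date, end_date]
    mask = (df1['date'] >= begin_date) & (df1['date'] <= end_date) & (df1['state'] == row['state'])
    filtered_data = df1.loc[mask]
    if not filtered_data.empty:
        frames.append(filtered_data)

# DataFrame.append was removed in pandas 2.0, so concatenate once at the end
if frames:
    final_df = pd.concat(frames, ignore_index=True)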

print(final_df)

Output:

     state       date  number
0  Alabama 2020-03-13       5
1  Alabama 2020-03-14       7

Updated answer

To show only the rows of df1 for the specific date and for the specific date + 14 days, update the mask in the snippet above.

import pandas as pd
import datetime


data1 = {
    "state":["Alabama","Alabama","Alabama","Alabama","Alabama"],
    "date":["3/12/20", "3/13/20", "3/14/20", "3/27/20", "3/28/20"],
    "number":[0,5,7,9,3]
}

data2 = {
    "state": ["Alabama", "Alaska"],
    "specificDate": ["03.13.2020", "03.11.2020"]
}

df1 = pd.DataFrame(data1)
df1['date'] = pd.to_datetime(df1['date'])
df2 = pd.DataFrame(data2)
df2['specificDate'] = pd.to_datetime(df2['specificDate'])

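final_df = pd.DataFrame()
frames = []

for index, row in df2.iterrows():
    first_date = row["specificDate"]
    last_date = first_date + datetime.timedelta(days=14)
    # rows of df1 for this state dated exactly first_date or last_date
    mask = ((df1['date'] == first_date) | (df1['date'] == last_date)) & (df1['state'] == row['state'])
    filtered_data = df1.loc[mask]
    if not filtered_data.empty:
        frames.append(filtered_data)

# DataFrame.append was removed in pandas 2.0, so concatenate once at the end
if frames:
    final_df = pd.concat(frames, ignore_index=True)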

print(final_df)

Output:

     state       date  number
0  Alabama 2020-03-13       5
1  Alabama 2020-03-27       9
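
The same selection can also be done without a Python loop. The following is an alternative sketch, not the code from this answer: it merges df2 onto df1 and filters on the day difference. The names `merged` and `days_after` are hypothetical, and it assumes the dates carry no time-of-day component.

```python
import pandas as pd

# Same sample data as in the snippet above.
df1 = pd.DataFrame({
    "state": ["Alabama"] * 5,
    "date": pd.to_datetime(["2020-03-12", "2020-03-13", "2020-03-14",
                            "2020-03-27", "2020-03-28"]),
    "number": [0, 5, 7, 9, 3],
})
df2 = pd.DataFrame({
    "state": ["Alabama", "Alaska"],
    "specificDate": pd.to_datetime(["2020-03-13", "2020-03-11"]),
})

# Attach each state's specificDate to its rows in df1. The inner merge drops
# states that have no rows in df1 (Alaska in this sample).
merged = df1.merge(df2, on="state", how="inner")

# Keep rows dated exactly specificDate or specificDate + 14 days.
days_after = (merged["date"] - merged["specificDate"]).dt.days
final_df = (merged.loc[days_after.isin([0, 14]), ["state", "date", "number"]]
                  .reset_index(drop=True))
print(final_df)
```

On this sample it selects the same two Alabama rows as the loop. The merge-based form tends to scale better when df2 has many states, since the filtering is done on whole columns.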

A slight variation on the first line of Eric's answer that makes it a bit simpler, since I don't see why he uses set_index and reset_index:

df2_14 = df2.assign(specificDate=df2['specificDate'] + pd.DateOffset(days=14))
