Python: Matching columns based on conditions

Published 2024-06-26 14:32:41


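I am trying to reconcile two policy datasets on status. Essentially, I am trying to answer: "Given matching policy numbers, find all policy numbers where the statuses do not match." Here is a sample of the data (comma-separated):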

policy1
PolicyNumber,Status,ExpirationDate
p0928999,Expired,01-02-2020
p092902,Cancelled,11-11-2020
p092902,Active, 10-02-2020
p089399, Active, 09-08-2020
p189128, Active, 12-20-2020
p77718, Active , 12-11-2020




policy2
PolicyNumber, Status, ExpirationDate
p0928999,Non-Renewal, 01-02-2020
p092902, Active , 10-02-2020
p089399,Non-Renewal, 09-08-2020
p889129, Cancelled, 02-01-2016
p77718, Renewed , 12-11-2020
p02902, Cancelled, 11-11-2020
p8383, Cancel Notice, 12-22-2020
p189128, Cancelled, 12-20-2020

A Non-Renewal status in policy2 can be equivalent to either an Expired or an Active status in policy1, depending on the expiration date:

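  • If the Non-Renewal in table policy2 expires after the current date (July 20, 2020), it should equate to an Active status in policy1

  • If the Non-Renewal in table policy2 expired before today (July 20, 2020), it should equate to an Expired status in policy1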

A Renewal, Renewed, or Non-Renew Requested status in policy2 should equate to an Active status in policy1

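The policy1 table may contain duplicates. If it does, I need to match the status in policy2 against the most recently expiring policy1 record for that PolicyNumber. Ultimately, I need to find every policy number whose statuses do not match under the conditions specified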
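A minimal sketch of that deduplication step, assuming "most recently expiring" means the row with the latest ExpirationDate (that reading is an assumption; if a different rule is intended, change the sort key). The small frame below only mirrors the duplicated p092902 rows from the sample:

```python
import pandas as pd

# Illustrative subset of policy1, including the duplicated p092902 rows.
policy1 = pd.DataFrame({
    "PolicyNumber": ["p092902", "p092902", "p089399"],
    "Status": ["Cancelled", "Active", "Active"],
    "ExpirationDate": ["11-11-2020", "10-02-2020", "09-08-2020"],
})

# Parse the MM-DD-YYYY strings so sorting is chronological, not alphabetical.
policy1["ExpirationDate"] = pd.to_datetime(
    policy1["ExpirationDate"], format="%m-%d-%Y"
)

# Sort oldest to newest, then keep the last (latest) row per PolicyNumber.
latest = (
    policy1.sort_values("ExpirationDate")
           .drop_duplicates("PolicyNumber", keep="last")
           .reset_index(drop=True)
)
```

Under this reading, p092902 resolves to its Cancelled row (11-11-2020), so it is worth checking that this is really the record the comparison should use.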

A Cancel Notice status in policy2 should equate to an Active status in policy1
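The equivalence rules above can be written down as a small function. This is only a sketch: the name normalize_status is hypothetical, the spelling "Non-Renewal" follows the sample data (the code attempt further down uses "Non-Renewed"), and treating a policy that expires exactly on the reference date as Expired is an assumption, since the rules do not cover that case.

```python
from datetime import date

# Reference date given in the question.
TODAY = date(2020, 7, 20)

# policy2 statuses that always count as Active in policy1.
ALWAYS_ACTIVE = {"Renewal", "Renewed", "Non-Renew Requested", "Cancel Notice"}


def normalize_status(status: str, expiration: date, today: date = TODAY) -> str:
    """Translate a policy2 status into the policy1 status it should equal."""
    status = status.strip()  # the sample data has stray spaces around values
    if status == "Non-Renewal":
        # Not yet expired -> Active; already expired (or expiring today) -> Expired.
        return "Active" if expiration > today else "Expired"
    if status in ALWAYS_ACTIVE:
        return "Active"
    # Active, Cancelled, and anything unrecognized pass through unchanged.
    return status
```

For example, normalize_status("Non-Renewal", date(2020, 9, 8)) gives "Active", while the same status with date(2020, 1, 2) gives "Expired".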

So, in this example, p189128 has a status of Active in policy1 but Cancelled in policy2. This would be the only mismatch. The output should be:

PolicyNumber
p189128

If a policy number does not appear in both datasets (e.g. p889129, p02902, p8383), it should be excluded from the matching process
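One way to drop policy numbers that appear in only one dataset is an inner merge on PolicyNumber. The sketch below uses a trimmed copy of the sample data (statuses only, whitespace already removed); the outer merge with indicator=True is optional and just lists what was excluded.

```python
import pandas as pd

policy1 = pd.DataFrame({
    "PolicyNumber": ["p0928999", "p089399", "p189128", "p77718"],
    "Status": ["Expired", "Active", "Active", "Active"],
})
policy2 = pd.DataFrame({
    "PolicyNumber": ["p0928999", "p089399", "p889129", "p77718",
                     "p02902", "p8383", "p189128"],
    "Status": ["Non-Renewal", "Non-Renewal", "Cancelled", "Renewed",
               "Cancelled", "Cancel Notice", "Cancelled"],
})

# how="inner" keeps only policy numbers present in both frames.
matched = policy1.merge(
    policy2, on="PolicyNumber", how="inner", suffixes=("_p1", "_p2")
)

# Optional audit: an outer merge with indicator=True adds a "_merge" column
# saying whether each row came from the left frame, the right frame, or both.
audit = policy1.merge(policy2, on="PolicyNumber", how="outer", indicator=True)
excluded = audit.loc[audit["_merge"] != "both", "PolicyNumber"].tolist()
```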

Here is what I have so far in Python:

import pandas as pd
cancel = pd.read_csv('policy1.csv')
policy = pd.read_csv('policy2.csv')

if (policy1["PolicyNumber"]==policy2["PolicyNumber"]):
    if (policy2["Status"]=="Non-Renewed"):
        if (pd.to_datetime(cancel["ExpirationDate"])>today()):
            cancel["Status"]="Active"
        else:
            cancel["Status"]="Expired"
    elif(policy2["Status"]=="Cancel Notice"):
        policy2["Status"]="Active"
    elif(policy2["Status"]=="Renewed"):
        policy2["Status"]="Active"
    elif(policy2["Status"]=="Renewal"):
        policy2["Status"]="Active"
    elif(policy2["Status"]=="Non-Renew Requested"):
        policy2["Status"]="Active"
    elif(policy2["Status"]=="Active"):
        policy2["Status"]="Active"
    elif(policy2["Status"]=="Cancelled"):
        policy2["Status"]="Cancelled"
    for i in policy2:
        if policy2["Status"] != policy1["Status"]:
            print(policy2["PolicyNumber"])
        else:
            pass
else:
    pass

When I run it, I get the following error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
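For context, this error appears whenever a whole pandas Series is used where Python expects a single True or False, as in an if statement. A comparison such as policy2["Status"] == "Active" produces one boolean per row, so there is no single answer to test. A small illustration on made-up data:

```python
import pandas as pd

status = pd.Series(["Active", "Cancelled", "Active"])

# Element-wise comparison: the result is a boolean Series, one value per row.
mask = status == "Active"

message = ""
try:
    if mask:  # ambiguous: which of the three booleans should decide?
        pass
except ValueError as exc:
    message = str(exc)

# Reduce explicitly when one overall answer is wanted...
any_active = mask.any()
all_active = mask.all()

# ...or use the mask to select rows instead of branching in Python.
active_rows = status[mask]
```

Row-by-row rules are usually expressed with masks (boolean indexing, np.select, Series.where) rather than with if/elif on whole columns.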

Edit: here is my attempt using np.select, based on the feedback I received:

conditions =[(policy2['Status'] == 'Active'), 
             (policy2['Status']=='Cancel Notice'),
            (policy2['Status'])=='Cancelled'), 
             (policy2['Status'])=='Renewed'),
            (policy2['Status'])=='Non-Renewed')& (policy2['ExpirationDate'])>today()), 
             (policy2['Status'])=='Non-Renewed')& (policy2['ExpirationDate'])<today()),
             (policy2['Status'])=='Renewal'),
            (policy2['Status'])=='Non-Renew Requested')]
choices = ['Active','Cancelled','Cancelled','Active','Active','Expired','Active','Active']
policy2['Status'] = np.select(conditions,choice,default='Active')
for index, row in policy2.iterrows():
    np.where(policy2['PolicyNumber']==policy1['PolicyNumber'], np.where(policy2['Status']==policy1['Status'],pass,print(policy2["PolicyNumber"]) pass)


1 answer
User
#1 · Posted 2024-06-26 14:32:41

You need to approach this the way you would in SQL: join the datasets together, then compute the result from the joined table

import numpy as np
import pandas as pd
cancel = pd.DataFrame([[1234,None], [1235, "Cancelled"], [1255, None],[1278,"Cancelled"],[1539,'Cancelled']], columns=['policyid', 'status'])

This is the cancel DataFrame:

policyid    status
0   1234    None
1   1235    Cancelled
2   1255    None
3   1278    Cancelled
4   1539    Cancelled

This is the policy DataFrame:

policy = pd.DataFrame([[1234, "non-renewed"], [22335, "active"], [1255, "non-renewed"]],
                       columns=['policyid', 'status'])

policyid    status
0   1234    non-renewed
1   22335   active
2   1255    non-renewed

Set the index of each DataFrame to policyid and join them together. Use a left join to keep all rows from the cancel DataFrame

cancel.set_index('policyid', inplace=True)
policy.set_index('policyid', inplace=True)
cancel = cancel.join(policy, rsuffix='_new', how='left')

Then fill the null values in cancel with the new status values from status_new (copied over from the policy DataFrame)

cancel['status'] = cancel['status'].fillna(cancel['status_new'])
cancel

    status  status_new
policyid        
1234    non-renewed non-renewed
1235    Cancelled   NaN
1255    non-renewed non-renewed
1278    Cancelled   NaN
1539    Cancelled   NaN

Now drop the status_new column

cancel.drop(columns=['status_new'], inplace=True)
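The same join-then-compute idea can be applied to the data from the question. The following is a sketch rather than a definitive solution. It assumes policy1 has already been reduced to one row per PolicyNumber (the duplicated p092902 rows are left out here), that "Non-Renewal" is the spelling used in the data, and that the policy2 expiration date decides between Active and Expired. The load helper and the column name Status_p2_norm are made up for illustration.

```python
import io

import numpy as np
import pandas as pd

TODAY = pd.Timestamp("2020-07-20")  # reference date stated in the question

policy1_csv = """PolicyNumber,Status,ExpirationDate
p0928999,Expired,01-02-2020
p089399, Active, 09-08-2020
p189128, Active, 12-20-2020
p77718, Active , 12-11-2020
"""
policy2_csv = """PolicyNumber, Status, ExpirationDate
p0928999,Non-Renewal, 01-02-2020
p089399,Non-Renewal, 09-08-2020
p889129, Cancelled, 02-01-2016
p77718, Renewed , 12-11-2020
p8383, Cancel Notice, 12-22-2020
p189128, Cancelled, 12-20-2020
"""


def load(text):
    # Read everything as text and strip stray spaces from headers and values.
    df = pd.read_csv(io.StringIO(text), skipinitialspace=True, dtype=str)
    df.columns = df.columns.str.strip()
    return df.apply(lambda col: col.str.strip())


p1, p2 = load(policy1_csv), load(policy2_csv)

# Inner join: policy numbers missing from either side drop out.
joined = p1.merge(p2, on="PolicyNumber", how="inner", suffixes=("_p1", "_p2"))

s2 = joined["Status_p2"]
expires = pd.to_datetime(joined["ExpirationDate_p2"], format="%m-%d-%Y")

# Translate policy2 statuses into their policy1 equivalents.
joined["Status_p2_norm"] = np.select(
    [
        (s2 == "Non-Renewal") & (expires > TODAY),
        (s2 == "Non-Renewal") & (expires <= TODAY),
        s2.isin(["Renewal", "Renewed", "Non-Renew Requested", "Cancel Notice"]),
    ],
    ["Active", "Expired", "Active"],
    default=s2,  # Active and Cancelled pass through unchanged
)

mismatched = joined.loc[
    joined["Status_p1"] != joined["Status_p2_norm"], "PolicyNumber"
].tolist()
```

On this reduced sample, mismatched contains only p189128, which is the output the question asks for.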
