Replacing row iteration with boolean masks in a DataFrame

Published 2024-10-02 06:24:57


I have the following df:

days    days_1    days_2    period    percent_1   percent_2    amount
3       5         4         1         0.2         0.1         100
2       1         3         4         0.3         0.1         500
9       8         10        6         0.4         0.2         600
10      7         8         11        0.5         0.3         700
10      5         6         7         0.7         0.4         800 

The following logic is applied to each row of df:

for each row in df:
    if days < days_1:
        amount_missed = 0
        days_missed = 0
    elif days_1 < days < days_2:
        missed_percent = percent_1 - percent_2
        amount_missed = amount * (missed_percent / 100)
        days_missed = days - days_1    
    elif days_2 < days < period or days > period:    
        missed_percent = percent_2
        amount_missed = amount * (missed_percent / 100)
        days_missed = days - days_2
    else:
        amount_missed = 0
        days_missed = 0 
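
For reference, the pseudocode above can be turned into a runnable row-wise baseline. The following is a minimal sketch, assuming the sample data shown in the table; the helper name `missed_for_row` is hypothetical and only used for illustration.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    'days': [3, 2, 9, 10, 10],
    'days_1': [5, 1, 8, 7, 5],
    'days_2': [4, 3, 10, 8, 6],
    'period': [1, 4, 6, 11, 7],
    'percent_1': [0.2, 0.3, 0.4, 0.5, 0.7],
    'percent_2': [0.1, 0.1, 0.2, 0.3, 0.4],
    'amount': [100, 500, 600, 700, 800],
})


def missed_for_row(row):
    """Hypothetical helper: return (amount_missed, days_missed) for one row."""
    days = row['days']
    if days < row['days_1']:
        return 0.0, 0
    if row['days_1'] < days < row['days_2']:
        missed_percent = row['percent_1'] - row['percent_2']
        return row['amount'] * (missed_percent / 100), days - row['days_1']
    if row['days_2'] < days < row['period'] or days > row['period']:
        return row['amount'] * (row['percent_2'] / 100), days - row['days_2']
    return 0.0, 0


# iterrows() yields each row as a Series; mixed int/float columns are upcast
# to float, so days_missed is cast back to int explicitly.
results = [missed_for_row(row) for _, row in df.iterrows()]
df['amount_missed'] = [r[0] for r in results]
df['days_missed'] = [int(r[1]) for r in results]

print(df[['amount_missed', 'days_missed']])
```

This loop is slow on large frames, but it is a convenient reference against which any vectorized version can be compared.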

I tried to convert the logic above using boolean masks and np.where, as follows:

cond1 = df['days_2'] < df['days']
cond2 = df['days'] < df['period']
cond3 = df['days'] > df['period']
cond4 = df['days'] >= df['days_1']
cond5 = df['days'] < df['days_2']
cond6 = df['days'] > df['days_1']

mask = ((cond1 & cond2) | cond3) & cond4
mask2 = cond5 & cond6

df['amount_missed'] = np.where(mask, df['amount'] * df['percent_2'] / 100, 0.0)
df['amount_missed'] = np.where(mask2, df['amount'] * (df['percent_1'] - df['percent_2']) / 100, 0.0)

df['days_missed'] = np.where(mask, df['days'] - df['days_2'], 0)
df['days_missed'] = np.where(mask2, df['days'] -df['days_1'], 0)

However, the result of the code above differs from the result of row iteration, which should be:

{
 'amount_missed': {0: 0.0, 1: 1.0, 2: 1.2, 3: 2.1, 4: 3.2},
 'days_missed': {0: 0, 1: 1, 2: 1, 3: 2, 4: 4}
 }  

The boolean mask code produces the following result instead:

{
 'amount_missed': {0: 0.0, 1: 0.9999999999999999, 2: 1.2, 3: 0.0, 4: 0.0},
 'days_missed': {0: 0, 1: 1, 2: 1, 3: 0, 4: 0}
 }

I would like to know how to fix it, or whether there is another way to replace iterating over the rows of df.


2 answers

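The root cause of the bug is that each assignment overwrites the column with a fresh np.where() instead of cascading the where() expressions, so only the result of the last mask survives. Better than cascading where() expressions, though, is np.select():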

c0 = df.days < df.days_1
c1 = (df.days_1 < df.days) & (df.days < df.days_2)
c2 = ((df.days_2 < df.days) & (df.days < df.period)) | (df.days > df.period)

df['days_missed'] = np.select([c0, c1, c2], [0, df.days - df.days_1, df.days - df.days_2])
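
The same three conditions can drive `amount_missed` as well. The following is a minimal sketch, assuming the sample data from the question; the `conditions` list is just an illustrative name. `np.select` picks the first condition that is true for each row, which mirrors the if/elif order of the row-wise logic, and falls back to `default` when none match.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'days': [3, 2, 9, 10, 10],
    'days_1': [5, 1, 8, 7, 5],
    'days_2': [4, 3, 10, 8, 6],
    'period': [1, 4, 6, 11, 7],
    'percent_1': [0.2, 0.3, 0.4, 0.5, 0.7],
    'percent_2': [0.1, 0.1, 0.2, 0.3, 0.4],
    'amount': [100, 500, 600, 700, 800],
})

# One boolean Series per branch, in the same order as the if/elif chain.
c0 = df['days'] < df['days_1']
c1 = (df['days_1'] < df['days']) & (df['days'] < df['days_2'])
c2 = ((df['days_2'] < df['days']) & (df['days'] < df['period'])) | (df['days'] > df['period'])
conditions = [c0, c1, c2]

df['amount_missed'] = np.select(
    conditions,
    [0.0,
     df['amount'] * (df['percent_1'] - df['percent_2']) / 100,
     df['amount'] * df['percent_2'] / 100],
    default=0.0,  # corresponds to the final else branch
)
df['days_missed'] = np.select(
    conditions,
    [0, df['days'] - df['days_1'], df['days'] - df['days_2']],
    default=0,
)

print(df[['amount_missed', 'days_missed']])
```

Because the first matching condition wins, a row that satisfies both `c1` and `c2` (row 2 here, where days > period) still takes the `c1` branch, just as the elif chain would.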

Code used to generate the original dataframe (from the original, unedited question):

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'days': [3, 2, 9, 10, 10],
    'days_1': [5, 1, 8, 7, 5],
    'days_2': [4, 3, 10, 8, 6],
    'period': [1, 4, 6, 11, 7],
    'percent_1': [0.2, 0.3, 0.4, 0.5, 0.7],
    'percent_2': [0.1, 0.1, 0.2, 0.3, 0.4],
    'amount': [100, 500, 600, 700, 800]
}, columns=['days', 'days_1', 'days_2', 'period', 'percent_1', 'percent_2', 'amount'])

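The following code gives the result required in the original question (it has not been updated for the simplified case that was created after a request in the comments):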

df['amount_missed'] = np.where((df['days_1'] < df['days']) & (df['days'] < df['days_2']),
                               df['amount'] * (df['percent_1'] - df['percent_2']) / 100,
                               np.where((df['days_2'] < df['days']) & (df['days'] < df['period']),
                                        df['amount'] * (df['percent_2']) / 100,
                                        0.0))

df['days_missed'] = np.where((df['days_1'] < df['days']) & (df['days'] < df['days_2']),
                             df['days'] - df['days_1'],
                             np.where((df['days_2'] < df['days']) & (df['days'] < df['period']),
                                      df['days'] - df['days_2'],
                                      0))

Output:

   days  days_1  days_2  period  percent_1  percent_2  amount  amount_missed  \
0     3       5       4       1        0.2        0.1     100            0.0   
1     2       1       3       4        0.3        0.1     500            1.0   
2     9       8      10       6        0.4        0.2     600            1.2   
3    10       7       8      11        0.5        0.3     700            2.1   
4    10       5       6       7        0.7        0.4     800            0.0   

   days_missed  
0            0  
1            1  
2            1  
3            2  
4            0  

Edit:

The same answer using np.select():

m1 = (df['days_1'] < df['days']) & (df['days'] < df['days_2'])
s1 = df['amount'] * (df['percent_1'] - df['percent_2']) / 100
s11 = df['days'] - df['days_1']

m2 = (df['days_2'] < df['days']) & (df['days'] < df['period'])
s2 = df['amount'] * (df['percent_2']) / 100
s22 = df['days'] - df['days_2']

df['amount_missed'] = np.select([m1, m2], [s1, s2], default=0)
df['days_missed'] =   np.select([m1, m2], [s11, s22], default=0)
