Optimizing the following part of Python code for decay variables

time_col = 'mnth'
tactic = ['overall_details','speaker_total','overall_samples_eu','copay_redemption_count','voucher_redemption_count','dtc']
tactic_decay_dict = dict.fromkeys(tactic,(60,70))
uniq = len(df_pre_decay[time_col].unique())
## Loops for variables and decay rate
for a in tactic_decay_dict:
    for b in tactic_decay_dict[a]:
        xyz = a+'_s'+str(b)
        ## Loops for iterating over each row in the dataset
        for i in range(len(df_pre_decay)):
            df_pre_decay[xyz] = np.where((i%uniq)!=0, (df_pre_decay[xyz].iloc[i-1])*b/100+ (df_pre_decay[a].iloc[i])*(100-b)/100, df_pre_decay[a].iloc[i])

ID  mnth    overall_details  speaker_total  overall_samples_eu  copay_redemption_count  voucher_redemption_count  dtc
1   201701  3                1              10                  9                       3                         6
1   201702  6                1              0                   7                       7                         10
1   201703  10               8              7                   8                       9                         10
1   201704  3                9              3                   0                       1                         1
1   201705  9                0              8                   9                       6                         4
1   201706  8                3              2                   10                      8                         9
1   201707  3                10             3                   0                       5                         6
1   201708  2                10             3                   9                       6                         2
1   201709  1                3              7                   10                      8                         0
1   201710  3                8              2                   8                       0                         10
1   201711  6                7              4                   8                       5                         6
1   201712  3                8              2                   9                       4                         10
2   201701  7                4              7                   4                       10                        2
2   201702  10               0              2                   2                       10                        5
2   201703  10               6              4                   10                      5                         3
2   201704  4                3              6                   4                       0                         8
2   201705  7                8              9                   10                      6                         10
2   201706  8                0              2                   7                       1                         8
2   201707  10               2              8                   1                       9                         4
2   201708  10               6              7                   0                       3                         5
2   201709  10               10             3                   8                       9                         0
2   201710  2                0              3                   5                       5                         8
2   201711  1                8              0                   7                       3                         4
2   201712  8                5              1                   0                       7                         9
3   201701  2                2              7                   7                       1                         2
3   201702  2                8              10                  9                       6                         9
3   201703  10               5              8                   5                       9                         4
3   201704  6                1              2                   4                       6                         2
3   201705  6                9              4                   4                       3                         0
3   201706  5                1              6                   4                       1                         7
3   201707  0                7              6                   9                       5                         6
3   201708  10               3              2                   0                       4                         5
3   201709  5                8              6                   4                       10                        4
3   201710  8                3              10                  6                       7                         0
3   201711  7                5              6                   3                       1                         10
3   201712  3                9              8                   4                       10                        0

1 Answer

User

#1 · Posted on 2024-09-23 08:15:23

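I don't think your code will work as intended, because on every pass of the loop you are effectively setting the entire column df_pre_decay[xyz] to a single value. You need either to loop over each row of the dataframe (for i in range(len(df_pre_decay))) or to treat the column as a vector (as np.where and other numpy functions do), but you are mixing the two. The vectorized approach is usually much faster.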
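To see the problem in isolation, here is a minimal sketch on a made-up three-row frame (the names df and out are just for illustration, not from the question). Because the condition passed to np.where is a plain boolean rather than an array, the result is a single zero-dimensional value, and assigning it to a column overwrites every row with it.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame, only to illustrate the behaviour
df = pd.DataFrame({"a": [3, 6, 10]})
i = 1

# The condition is a scalar bool, so np.where returns one value, not a column
value = np.where(i % 3 != 0, df["a"].iloc[i] * 0.5, df["a"].iloc[i])
print(np.ndim(value))  # number of dimensions of the result

# Assigning that single value broadcasts it over the whole column
df["out"] = value
print(df)
```

Every row of out ends up holding the same number, which is why the last loop iteration decides the content of the whole column.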

For a non-vectorized version, set column xyz equal to column a, then loop over the rows, setting the cumulative value where needed.

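for a in tactic_decay_dict:
    for b in tactic_decay_dict[a]:
        xyz = a+'_s'+str(b)
        # Start from a float copy of the raw column so decayed values are not truncated
        df_pre_decay[xyz] = df_pre_decay[a].astype(float)
        col = df_pre_decay.columns.get_loc(xyz)
        ## Loops for iterating over each row in the dataset
        for i in range(len(df_pre_decay)):
            if i % uniq != 0:
                # Index the frame directly; df[xyz].iloc[i] = ... is chained assignment
                df_pre_decay.iloc[i, col] = (df_pre_decay.iloc[i-1, col] * b/100
                     + df_pre_decay[a].iloc[i] * (100 - b)/100)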

Or another version (I'm not sure which would be faster):

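for a in tactic_decay_dict:
    for b in tactic_decay_dict[a]:
        xyz = a+'_s'+str(b)
        column = []
        for i, x in enumerate(df_pre_decay[a]):
            if i % uniq == 0:
                # first month of a new ID: restart from the raw value
                current = x
            else:
                # previous decayed value weighted by b, new raw value by (100-b)
                current = current * b/100 + x * (100-b)/100
            column.append(current)
        df_pre_decay[xyz] = column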

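To vectorize, split the column into chunks and apply a cumulative decay function to each chunk; the code below builds that function with np.frompyfunc and calls its accumulate method.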

for a in tactic_decay_dict:
    for b in tactic_decay_dict[a]:
        xyz = a+'_s'+str(b)
        # u is the running decayed value, v is the next raw value
        decay_func = np.frompyfunc(lambda u, v: u * b / 100.0 + v * (100-b) / 100.0, 2, 1)
        values = df_pre_decay[a].to_numpy()
        decayed = np.array([])
        for top in range(0, len(df_pre_decay), uniq):
            chunk = values[top:top+uniq]
            # np.object and np.float were removed in NumPy 1.24; use the builtins
            decayed = np.concatenate((decayed,
                                  decay_func.accumulate(chunk, dtype=object).astype(float)))
        df_pre_decay[xyz] = decayed
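As a quick sanity check of the accumulate idea, here is a small sketch on the first three overall_details values for ID 1 from the sample data, with b = 60. It compares the np.frompyfunc result with the same recurrence written as an explicit loop. The variable names vectorized and looped are only for illustration.

```python
import numpy as np

b = 60                          # decay rate in percent, as in the question
values = np.array([3, 6, 10])   # first three overall_details values for ID 1

# u is the running decayed value, v is the next raw value
decay_func = np.frompyfunc(lambda u, v: u * b / 100.0 + v * (100 - b) / 100.0, 2, 1)
vectorized = decay_func.accumulate(values, dtype=object).astype(float)

# Reference: the same recurrence as a plain Python loop
looped = [float(values[0])]
for x in values[1:]:
    looped.append(looped[-1] * b / 100.0 + x * (100 - b) / 100.0)

print(vectorized)
print(np.allclose(vectorized, looped))
```

By hand, the second value is 3 * 0.6 + 6 * 0.4 = 4.2 and the third is 4.2 * 0.6 + 10 * 0.4 = 6.52, up to floating-point rounding.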

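Another approach is to insert blank rows with null values between the different IDs. A single cumulative function can then be applied to the whole column: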

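# insert blank rows in the data (assumes the default 0..n-1 integer index)
df_pre_decay.index = df_pre_decay.index + df_pre_decay.index // uniq
# reindex returns a new frame, so the result has to be assigned back
df_pre_decay = df_pre_decay.reindex(index=range(len(df_pre_decay) + len(df_pre_decay) // uniq))

def get_decay_func(b):
    def inner(u, v):
        # a null on either side passes v through, which restarts the accumulation
        if pd.isnull(u) or pd.isnull(v):
            return v
        else:
            return u * b/100.0 + v * (100-b)/100.0
    return inner

for a in tactic_decay_dict:
    for b in tactic_decay_dict[a]:
        # wrap the Python function as a ufunc so that .accumulate is available
        decay = np.frompyfunc(get_decay_func(b), 2, 1).accumulate
        xyz = a+'_s'+str(b)
        df_pre_decay[xyz] = decay(df_pre_decay[a].to_numpy(), dtype=object).astype(float)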
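One more option worth trying, as a sketch rather than a drop-in replacement: the recurrence used here (previous decayed value times b/100 plus new value times (100-b)/100, restarting at the first row of each ID) has the same form as pandas' exponentially weighted mean with adjust=False and alpha = (100-b)/100. The example below assumes that each ID's rows are sorted by mnth, and it groups by the ID column instead of relying on i % uniq. The small frame is a cut-down stand-in for df_pre_decay using values from the sample data.

```python
import numpy as np
import pandas as pd

# Cut-down stand-in for df_pre_decay: two IDs, three months each
df = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 2],
    "mnth": [201701, 201702, 201703] * 2,
    "overall_details": [3, 6, 10, 7, 10, 10],
})

b = 60                      # decay rate in percent
alpha = (100 - b) / 100.0   # weight on the new observation

# adjust=False gives y[0] = x[0] and y[t] = (1 - alpha) * y[t-1] + alpha * x[t]
df["overall_details_s60"] = (
    df.groupby("ID")["overall_details"]
      .transform(lambda s: s.ewm(alpha=alpha, adjust=False).mean())
)
print(df)
```

Grouping by ID also keeps working if some IDs have fewer months than others, which the i % uniq test does not handle.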
