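<p>@Denver has already explained why your comparison on the NA column did not give the result you expected.</p>
<p>However, I would do the comparison differently than he did. Here is a small snippet that should help you understand:</p>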
<pre class="lang-py prettyprint-override"><code># a series of bools, indicating for which index our condition is true
na_gt_1_series = df["NA"] > 1
print(na_gt_1_series)
# creating a new column based on the values of the NA column
df["na_gt_1"] = na_gt_1_series
print(df)
</code></pre>
<hr/>
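<p>Now, since the conditions here are quite complex, I think it is simpler to use pandas' <code>apply</code> function, which applies a function along one axis of the DataFrame.</p>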
<pre class="lang-py prettyprint-override"><code># A1, A2, B, C, D and E are assumed to be defined elsewhere (e.g. in the question).
def get_row_df5(row):
    df5 = 0
    if row["NA"] > 1:
        if row["MULT"] == 1:
            if row["NOB"] == 1:
                df5 = -A1 * row["NOB"]
        else:
            df5 = -A2 * row["NOB"] - B * (row["NOA"] - row["NOB"])
    elif row["NA"] == 1:
        if row["MULT"] == 1:
            if row["EX"] == 0 and row["NOB"] == 4 and row["CHARGE"] == 0:
                df5 = -A1 * row["NOB"]
            elif row["NOB"] != 1 or (row["NOB"] == 1 and row["EX"] != 0):
                df5 = -C * row["NOB"]
            elif row["NOB"] == 1 and row["EX"] == 0:
                df5 = -E * row["NOB"]
        else:
            df5 = -C * row["NOB"] - D * (row["NOA"] - row["NOB"])
    return df5

# axis=1 passes each row to the function as a Series
df5_res = df.apply(func=get_row_df5, axis=1)
</code></pre>
<hr/>
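<p>Unfortunately, this simplicity comes at a price. On a 120,000-row DataFrame generated by replicating the sample data, the <code>apply</code> solution takes ~4 s, while the solution below takes ~40 ms (100 times faster).</p>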
<pre class="lang-py prettyprint-override"><code>import numpy as np
import pandas as pd

def get_df5_broad(df_in):
    na_gt_1 = df_in["NA"] > 1
    na_eq_1 = df_in["NA"] == 1
    mult_eq_1 = df_in["MULT"] == 1
    mult_ne_1 = ~mult_eq_1
    # reuse df_in's index so the boolean masks and assigned values align
    res_series = pd.Series(np.zeros(shape=df_in.shape[0]), index=df_in.index)
    res_series.loc[na_gt_1 & mult_eq_1 & (df_in["NOB"] == 1)] = -A1 * df_in["NOB"]
    res_series.loc[na_gt_1 & mult_ne_1] = -A2 * df_in["NOB"] - B * (df_in["NOA"] - df_in["NOB"])
    res_series.loc[na_eq_1 & mult_eq_1 & ((df_in["NOB"] != 1) | ((df_in["NOB"] == 1) & (df_in["EX"] != 0)))] = -C * df_in["NOB"]
    # assigned after the broader NOB != 1 rule so that it wins, as in the if/elif chain
    res_series.loc[na_eq_1 & mult_eq_1 & (df_in["EX"] == 0) & (df_in["NOB"] == 4) & (df_in["CHARGE"] == 0)] = -A1 * df_in["NOB"]
    res_series.loc[na_eq_1 & mult_eq_1 & (df_in["NOB"] == 1) & (df_in["EX"] == 0)] = -E * df_in["NOB"]
    res_series.loc[na_eq_1 & mult_ne_1] = -C * df_in["NOB"] - D * (df_in["NOA"] - df_in["NOB"])
    return res_series
</code></pre>
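<p>As a possible variation (not part of the original answer), <code>np.select</code> can express the same masks while keeping the "first match wins" behaviour of an if/elif chain, because it takes the value of the first condition that is true. The sketch below is only an illustration: the coefficient values, the name <code>get_df5_select</code> and the toy rows are made up, and the branch logic is assumed to be the same as in the row-wise function.</p>

```python
import numpy as np
import pandas as pd

# Hypothetical coefficients; the real A1, A2, B, C, D, E come from the question.
A1, A2, B, C, D, E = 1.0, 2.0, 3.0, 4.0, 5.0, 6.0


def get_df5_select(df_in):
    """Vectorized branches; np.select uses the first condition that is true."""
    na_gt_1 = df_in["NA"] > 1
    na_eq_1 = df_in["NA"] == 1
    mult_eq_1 = df_in["MULT"] == 1
    nob, noa, ex = df_in["NOB"], df_in["NOA"], df_in["EX"]

    # Conditions are listed in the same order as the if/elif chain.
    conditions = [
        na_gt_1 & mult_eq_1 & (nob == 1),
        na_gt_1 & ~mult_eq_1,
        na_eq_1 & mult_eq_1 & (ex == 0) & (nob == 4) & (df_in["CHARGE"] == 0),
        na_eq_1 & mult_eq_1 & ((nob != 1) | ((nob == 1) & (ex != 0))),
        na_eq_1 & mult_eq_1 & (nob == 1) & (ex == 0),
        na_eq_1 & ~mult_eq_1,
    ]
    choices = [
        -A1 * nob,
        -A2 * nob - B * (noa - nob),
        -A1 * nob,
        -C * nob,
        -E * nob,
        -C * nob - D * (noa - nob),
    ]
    # Rows that match no condition keep the default of 0.
    return pd.Series(np.select(conditions, choices, default=0.0), index=df_in.index)


# Made-up rows: one per branch, plus a final row (NA == 0) that matches nothing.
toy = pd.DataFrame({
    "NA":     [2, 2, 1, 1, 1, 1, 0],
    "MULT":   [1, 2, 1, 1, 1, 2, 1],
    "NOB":    [1, 3, 4, 2, 1, 2, 1],
    "NOA":    [1, 5, 4, 2, 1, 3, 1],
    "EX":     [0, 0, 0, 0, 0, 0, 0],
    "CHARGE": [0, 0, 0, 0, 0, 0, 0],
})
result = get_df5_select(toy)
print(result.tolist())
```

<p>Because the order of the conditions decides which rule wins, overlapping masks do not overwrite each other the way successive <code>.loc</code> assignments can.</p>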
<hr/>
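<p>Finally, the next approach offers the best of both worlds. Its design and simplicity are similar to the <code>apply</code> approach, yet it is only about 5 times slower than the high-performance version above.</p>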
<pre class="lang-py prettyprint-override"><code>def get_df5_tupe(tupe):
    # tupe is a namedtuple whose fields are the column names
    df5 = 0
    if tupe.NA > 1:
        if tupe.MULT == 1:
            if tupe.NOB == 1:
                df5 = -A1 * tupe.NOB
        else:
            df5 = -A2 * tupe.NOB - B * (tupe.NOA - tupe.NOB)
    elif tupe.NA == 1:
        if tupe.MULT == 1:
            if tupe.EX == 0 and tupe.NOB == 4 and tupe.CHARGE == 0:
                df5 = -A1 * tupe.NOB
            elif tupe.NOB != 1 or (tupe.NOB == 1 and tupe.EX != 0):
                df5 = -C * tupe.NOB
            elif tupe.NOB == 1 and tupe.EX == 0:
                df5 = -E * tupe.NOB
        else:
            df5 = -C * tupe.NOB - D * (tupe.NOA - tupe.NOB)
    return df5

def get_df5_iter(df_in):
    # itertuples yields lightweight namedtuples instead of building a Series per row
    return pd.Series(
        [get_df5_tupe(curr) for curr in df_in.itertuples(index=False)],
        index=df_in.index,
    )
</code></pre>
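<p>To check that a row-wise and a vectorized implementation agree, and to time them yourself, something along these lines may help. It is a minimal sketch with a deliberately simplified rule: <code>row_rule</code>, <code>vectorized_rule</code> and the random toy data are hypothetical stand-ins, not the functions above, and the timings will vary by machine.</p>

```python
import timeit

import numpy as np
import pandas as pd


def row_rule(row):
    # Toy stand-in for a row-wise function such as get_row_df5.
    return -2.0 * row["NOB"] if row["NA"] > 1 else 0.0


def vectorized_rule(df_in):
    # Toy stand-in for a mask-based function such as get_df5_broad.
    res = pd.Series(0.0, index=df_in.index)
    mask = df_in["NA"] > 1
    res.loc[mask] = -2.0 * df_in.loc[mask, "NOB"]
    return res


# Made-up data, only to have something to compare and time.
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "NA": rng.integers(0, 4, size=2_000),
    "NOB": rng.integers(1, 5, size=2_000),
})

by_apply = toy.apply(row_rule, axis=1)
by_mask = vectorized_rule(toy)

# Raises AssertionError if the two implementations disagree on any row.
pd.testing.assert_series_equal(by_apply, by_mask, check_names=False)

t_apply = timeit.timeit(lambda: toy.apply(row_rule, axis=1), number=3)
t_mask = timeit.timeit(lambda: vectorized_rule(toy), number=3)
print(f"apply: {t_apply:.4f}s, vectorized: {t_mask:.4f}s")
```

<p>A check like this only shows that two implementations match each other; it cannot tell whether the underlying rules are the intended ones.</p>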
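<p>Note: these approaches do not always return the correct answer, because the logic of the operations in the question is ambiguous. I will edit my solution once the correct boolean expressions are available.</p>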