Answers to: Comparing the values of a single DataFrame column in Python (converting Perl code to Python)


I am trying to convert some Perl code to Python in order to do simple multiplications of columns by a few constants. I created a DataFrame with several columns of float values. Below is the component .csv file.
<pre><code>      NA  MULT   NOA   NOB  CHARGE   EX
0    8.0   1.0  24.0  24.0     0.0  1.0
1    8.0   1.0  24.0  24.0     0.0  1.0
2    8.0   1.0   6.0   6.0     0.0  1.0
3   20.0   1.0  18.0  18.0     0.0  1.0
4   23.0   1.0  21.0  21.0     0.0  1.0
5   26.0   1.0  24.0  24.0     0.0  1.0
6   11.0   1.0  13.0  13.0     0.0  0.0
7   16.0   1.0  19.0  19.0     1.0  0.0
8    1.0   1.0   4.0   4.0    -1.0  0.0
9   17.0   1.0  23.0  23.0     0.0  0.0
10   1.0   1.0   4.0   4.0     0.0  0.0
11   1.0   1.0   4.0   4.0     0.0  0.0
</code></pre>
The initial parameters are:
<pre><code>$A1 = 9.3692400791;
$A2 = 9.4492960287;
$B = 3.8320915550;
$C = 9.5936653352;
$D = 1.8739215238;
$E = 2.4908584058;
</code></pre>
The expected output is a single column (d5):
<pre><code>Df5
-0.2249
-0.2249
-0.0562
-0.1686
-0.1968
-0.2249
-0.1218
-0.1780
-0.0384
-0.2155
-0.0375
-0.0375
</code></pre>
After parsing the file into a DataFrame with the following commands:
<pre><code>pd.set_option('display.precision', 8)
df = pd.read_csv("unscaled_components_delimit.csv", delimiter=",", header=0)
</code></pre>
I want to check several conditions, as in the script below:
<pre><code>if (df['NA'] > 1).any():
    print(True)
elif (df['NA'] == 1).any():
    print(False)
</code></pre>
However, the code above prints only a single value, True, even though the NA column contains several values equal to 1.0, which means it never reaches the second elif. I used the function any(); perhaps I should be using another function that I am not aware of. Can anyone suggest a solution?

The goal is to compare every element of the NA column with the number 1 (greater than or equal to), and then perform some operations that depend on further conditions on the other columns.

Any help or suggestions would be much appreciated.

For clarity, the code below is the final code I need, with all the required conditions:
<pre><code>if (df['NA'] > 1).any():
    if (df['MULT'] == 1).any():
        if ((df['NOB'] != 1).any() or (df['NOB'] == 1).any()):
            d5 = -A1*df['NOB']
        elif ((df['NOB'] == 1).any()):
            d5 = -E*df['NOB']
    else:
        d5 = -A2*df['NOB'] - B*(df['NOA']-df['NOB'])
elif (df['NA'] == 1).any():
    if (df['MULT'] == 1).any():
        if ((df['EX'] == 0).any() and (df['NOB'] == 4).any() and (df['CHARGE'] == 0).any()):
            d5 = -A1*df['NOB']
        elif ((df['NOB'] != 1).any() or ((df['NOB'] == 1).any() and (df['EX'] != 0).any())):
            d5 = -C*df['NOB']
        elif ((df['NOB'] == 1).any() and (df['EX'] == 0).any()):
            d5 = -E*df['NOB']
    else:
        d5 = -C*df['NOB'] - D*(df['NOA']-df['NOB'])
</code></pre>
The original Perl code is as follows ($nh is not needed; hlc in Perl corresponds to d5 in Python):
<pre><code>if ($na > 1) {
    if ($mult == 1) {
        if (($nob != 1) || (($nob == 1) && ($nh != 0))) {
            $hlc = -$A1 * $nob;
        } elsif (($nob == 1) && ($nh == 0)) {
            $hlc = -$E * $nob;
        }
    } else {
        $hlc = -$A2 * $nob - $B * ($noa - $nob);
    }
}
### HLC for atomic species ###
elsif ($na == 1) {
    if ($mult == 1) {
        if (($ex == 0) && ($nob == 4) && ($charge == 0)) {
            $hlc = -$A1 * $nob;
        } elsif (($nob != 1) || (($nob == 1) && ($ex != 0))) {
            $hlc = -$C * $nob;
        } elsif (($nob == 1) && ($ex == 0)) {
            $hlc = -$E * $nob;
        }
    } else {
        $hlc = -$C * $nob - $D * ($noa - $nob);
    }
}
</code></pre>
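As a minimal illustration of the behavior described above: the comparison itself is element-wise, but `.any()` reduces the whole column to a single boolean, so an `if`/`elif` built on it can only ever take one branch for the entire frame. The four-row frame below is made up for the example and is not the CSV from the question.

```python
import pandas as pd

# Hypothetical four-row frame, only for illustration.
df = pd.DataFrame({"NA": [8.0, 1.0, 17.0, 1.0]})

# Element-wise comparisons: one boolean per row.
gt_1 = df["NA"] > 1
eq_1 = df["NA"] == 1

# .any() collapses each boolean Series to a single value for the whole column.
any_gt_1 = bool(gt_1.any())
any_eq_1 = bool(eq_1.any())

print(gt_1.tolist())
print(any_gt_1, any_eq_1)
```

Because both reductions are true here, the `if` branch is always taken and the `elif` is never evaluated, regardless of how many rows equal 1.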


1 answer

Anonymous, 1 day ago

@Denver has already explained why your comparison on the NA column does not give the result you expect.

However, I would do the comparison differently than he did. Here is a small snippet to help you understand:
<pre class="lang-py prettyprint-override"><code># A Series of bools indicating, for each index, whether the condition is true
na_gt_1_series = df["NA"] > 1
print(na_gt_1_series)

# Create a new column based on the values of the NA column
df["na_gt_1"] = na_gt_1_series
print(df)
</code></pre>
<hr/>
Now, since the conditions here are quite complex, I think it is simpler to use pandas' apply function, which applies a function along one axis of the DataFrame.
<pre class="lang-py prettyprint-override"><code>def get_row_df5(row):
    # Rows that match no branch keep the default value 0
    df5 = 0
    if row["NA"] > 1:
        if row["MULT"] == 1:
            if row["NOB"] == 1:
                df5 = -A1 * row["NOB"]
        else:
            df5 = -A2 * row["NOB"] - B * (row["NOA"] - row["NOB"])
    elif row["NA"] == 1:
        if row["MULT"] == 1:
            if row["EX"] == 0 and row["NOB"] == 4 and row["CHARGE"] == 0:
                df5 = -A1 * row["NOB"]
            elif row["NOB"] != 1 or (row["NOB"] == 1 and row["EX"] != 0):
                df5 = -C * row["NOB"]
            elif row["NOB"] == 1 and row["EX"] == 0:
                df5 = -E * row["NOB"]
        else:
            df5 = -C * row["NOB"] - D * (row["NOA"] - row["NOB"])
    return df5

# axis=1 passes one row at a time to the function
df5_res = df.apply(func=get_row_df5, axis=1)
</code></pre>
<hr/>
Unfortunately, this simplicity comes at a price. For a 120,000-row DataFrame generated by replicating the sample data, the apply solution takes ~4 s, while the solution below takes ~40 ms (100 times faster).
<pre class="lang-py prettyprint-override"><code>import numpy as np
import pandas as pd

def get_df5_broad(df_in):
    na_gt_1 = df_in["NA"] > 1
    na_eq_1 = df_in["NA"] == 1
    mult_eq_1 = df_in["MULT"] == 1
    mult_ne_1 = ~mult_eq_1
    # First atomic branch; later masks exclude it so that it is not
    # overwritten, mirroring the if/elif order of the apply version
    atom_first = na_eq_1 & mult_eq_1 & (df_in["EX"] == 0) & (df_in["NOB"] == 4) & (df_in["CHARGE"] == 0)

    # Start from zeros, sharing the input index so assignments align
    res_series = pd.Series(np.zeros(shape=df_in.shape[0]), index=df_in.index)
    res_series.loc[na_gt_1 & mult_eq_1 & (df_in["NOB"] == 1)] = -A1 * df_in["NOB"]
    res_series.loc[na_gt_1 & mult_ne_1] = -A2 * df_in["NOB"] - B * (df_in["NOA"] - df_in["NOB"])
    res_series.loc[atom_first] = -A1 * df_in["NOB"]
    res_series.loc[na_eq_1 & mult_eq_1 & ~atom_first & ((df_in["NOB"] != 1) | ((df_in["NOB"] == 1) & (df_in["EX"] != 0)))] = -C * df_in["NOB"]
    res_series.loc[na_eq_1 & mult_eq_1 & (df_in["NOB"] == 1) & (df_in["EX"] == 0)] = -E * df_in["NOB"]
    res_series.loc[na_eq_1 & mult_ne_1] = -C * df_in["NOB"] - D * (df_in["NOA"] - df_in["NOB"])
    return res_series
</code></pre>
<hr/>
Finally, the next approach offers the best of both worlds. Its design and simplicity are similar to the apply approach, but it is only about 5 times slower than the previous high-performance version.
<pre class="lang-py prettyprint-override"><code>def get_df5_tupe(tupe):
    # tupe is one namedtuple row produced by DataFrame.itertuples
    df5 = 0
    if tupe.NA > 1:
        if tupe.MULT == 1:
            if tupe.NOB == 1:
                df5 = -A1 * tupe.NOB
        else:
            df5 = -A2 * tupe.NOB - B * (tupe.NOA - tupe.NOB)
    elif tupe.NA == 1:
        if tupe.MULT == 1:
            if tupe.EX == 0 and tupe.NOB == 4 and tupe.CHARGE == 0:
                df5 = -A1 * tupe.NOB
            elif tupe.NOB != 1 or (tupe.NOB == 1 and tupe.EX != 0):
                df5 = -C * tupe.NOB
            elif tupe.NOB == 1 and tupe.EX == 0:
                df5 = -E * tupe.NOB
        else:
            df5 = -C * tupe.NOB - D * (tupe.NOA - tupe.NOB)
    return df5

def get_df5_iter(df_in):
    return pd.Series((get_df5_tupe(curr) for curr in df_in.itertuples(index=False)))
</code></pre>
Note: these approaches do not always return the correct answer, because the logic in the question is ambiguous. I will edit my solution once the correct boolean expressions are available.
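As a possible way to resolve the ambiguity noted above, here is a sketch that follows the branch order of the Perl code using `numpy.select`, which picks the first matching condition per row, just like an if/elsif chain. It rests on two assumptions that are not confirmed by the question: `$nh` is treated as 0 (the asker only says it is "not needed"), and the expected `Df5` column appears to be these values divided by 1000, which is inferred from the numbers rather than stated. The name `get_df5_select` is made up for this example.

```python
import numpy as np
import pandas as pd

A1, A2, B = 9.3692400791, 9.4492960287, 3.8320915550
C, D, E = 9.5936653352, 1.8739215238, 2.4908584058

def get_df5_select(df_in):
    """Vectorized sketch of the Perl branches (assumes $nh == 0)."""
    na, mult = df_in["NA"], df_in["MULT"]
    noa, nob = df_in["NOA"], df_in["NOB"]
    ex, charge = df_in["EX"], df_in["CHARGE"]
    na_gt_1, na_eq_1, mult_eq_1 = na > 1, na == 1, mult == 1

    # np.select takes the FIRST true condition per row, so the order
    # below reproduces the if/elsif/else nesting of the Perl code.
    conditions = [
        na_gt_1 & mult_eq_1 & (nob != 1),
        na_gt_1 & mult_eq_1,   # remaining rows have nob == 1 (nh assumed 0)
        na_gt_1,               # remaining rows have mult != 1
        na_eq_1 & mult_eq_1 & (ex == 0) & (nob == 4) & (charge == 0),
        na_eq_1 & mult_eq_1 & ((nob != 1) | (ex != 0)),
        na_eq_1 & mult_eq_1,   # remaining rows have nob == 1 and ex == 0
        na_eq_1,               # remaining rows have mult != 1
    ]
    choices = [
        -A1 * nob,
        -E * nob,
        -A2 * nob - B * (noa - nob),
        -A1 * nob,
        -C * nob,
        -E * nob,
        -C * nob - D * (noa - nob),
    ]
    values = np.select(conditions, choices, default=0.0)
    return pd.Series(values, index=df_in.index)

# Rows 0, 8 and 10 of the sample data from the question.
sample = pd.DataFrame(
    [[8.0, 1.0, 24.0, 24.0, 0.0, 1.0],
     [1.0, 1.0, 4.0, 4.0, -1.0, 0.0],
     [1.0, 1.0, 4.0, 4.0, 0.0, 0.0]],
    columns=["NA", "MULT", "NOA", "NOB", "CHARGE", "EX"],
)
# Division by 1000 is an inferred scaling, see the note above.
scaled = (get_df5_select(sample) / 1000).round(4)
print(scaled.tolist())
```

Under those assumptions, the three sample rows reproduce the corresponding entries of the expected column in the question (-0.2249, -0.0384, -0.0375).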
