How do I do a nested list comprehension with a break?



I have a large DataFrame of distances that I want to classify.

<pre><code>import pandas as pd

df_norms = pd.DataFrame([[0, 100, 200, 4000000]], columns=['mode', 'min', 'medium', 'max'])
df_afst = pd.DataFrame([[0, 50, -1], [0, 150, -1], [0, 0, -1], [0, 250, -1]], columns=['train', 'station', 'bbh'])
</code></pre>

The norms DataFrame says that, for mode 0, every distance &lt;= 100 is classified as 0, the next band (&lt;= 200) is 1, and the last band, a catch-all &lt;= a very large number, is 2.

This is easy to do with a <code>for</code> loop. For example:

<pre><code>for i in [0]:  # 1 element list just for the example
    bbh_id = i + 2
    mode = df_afst.iloc[0, i]
    for iy, y in enumerate(df_afst[df_afst.columns[i+1]].values):
        for ix, x in enumerate(df_norms.iloc[mode]):
            if x > y:
                # first threshold above the distance: store its class and stop
                df_afst.loc[iy, df_afst.columns[bbh_id]] = ix - 1
                break
</code></pre>

Before:

<pre><code>   train  station  bbh
0      0       50   -1
1      0      150   -1
2      0        0   -1
3      0      250   -1
</code></pre>

After:

<pre><code>   train  station  bbh
0      0       50    0
1      0      150    1
2      0        0    0
3      0      250    2
</code></pre>

I would like to do this in a list comprehension but don't know how: the <code>break</code> makes it hard. All I have managed is:

<pre><code>for i in [0]:
    bbh_id = i + 2
    mode = df_afst.iloc[0, i]
    r = [ix - 1
         for iy, y in enumerate(df_afst[df_afst.columns[i+1]].values)
         for ix, x in enumerate(df_norms.iloc[mode])
         if x > y]
# results in: [0, 1, 2, 1, 2, 0, 1, 2, 2]
</code></pre>

As you can see, the result is correct once it is split into one group per distance:

<pre><code>[0, 1, 2 | 1, 2 | 0, 1, 2 | 2]
</code></pre>

I only need the first element of each sublist, and I don't know how to get it. I can't emulate the <code>break</code>. I tried <code>min</code>, <code>any</code> and <code>next</code>, but I just can't get it right. Does anyone have an idea?

Update

@chepner rightly pointed out that my example was inconsistent. Sorry about that. @Thierry Lathuille correctly noted that a list comprehension is not always the right tool. He is quite right, but since I don't know when comprehensions are the right tool, I wanted to know how it would work in this case.

The two answers I received to this question were very instructive. I had never heard of pandas cut, and it had never occurred to me to use where.

Out of curiosity, I ran a small benchmark:

<pre><code>import time

import numpy as np
import pandas as pd

print('\n*** pd.cut')
cpu = time.time()
cuts = df_norms.iloc[0].tolist()
bbh3 = pd.cut(df_afst['station'], cuts, labels=False, include_lowest=True)
df_afst['bbh'] = bbh3
print('CPU {:.4f} seconds'.format(time.time() - cpu))

print('\n*** Using numpy and its functions')
cpu = time.time()
bbh2 = [np.min(np.argwhere(np.less(td, df_norms.values.ravel())) - 1)
        for td in df_afst.station.values]
df_afst['bbh'] = bbh2
print('CPU {:.4f} seconds'.format(time.time() - cpu))

print('\n*** Simple loop')
cpu = time.time()
for i in [0]:
    bbh_id = i + 2
    mode = df_afst.iloc[0, i]
    for iy, y in enumerate(df_afst[df_afst.columns[i+1]].values):
        for ix, x in enumerate(df_norms.iloc[mode]):
            if x > y:
                df_afst.loc[iy, df_afst.columns[bbh_id]] = ix - 1
                break
print('CPU {:.4f} seconds'.format(time.time() - cpu))

print('\n*** Wrong approach')
cpu = time.time()
for i in [0]:
    bbh_id = i + 2
    mode = df_afst.iloc[0, i]
    r = [ix - 1
         for iy, y in enumerate(df_afst[df_afst.columns[i+1]].values)
         for ix, x in enumerate(df_norms.iloc[mode])
         if x > y]
print('CPU {:.4f} seconds'.format(time.time() - cpu))
</code></pre>

I scaled the dataset up from the 4 rows in the example to 2,000,000 rows, which is close to my real dataset of 10,000,000. The results are interesting:

<pre><code>*** pd.cut
CPU 0.0131 seconds

*** Using numpy and its functions
CPU 29.4257 seconds

*** Simple loop
CPU 214.5378 seconds

*** Wrong approach
CPU 103.5768 seconds
</code></pre>

The speed-up from pandas cut is incredible. I double-checked the results, and they do look correct.

Both answers are correct and insightful. I decided to mark @carlos melus's answer as accepted because it came closest to the list comprehension I asked for.
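One way to emulate the break inside a comprehension is to turn the inner loop into a generator expression and take only its first item with `next()`. The sketch below is a minimal illustration on plain lists holding the example's values, and is not necessarily the code from the accepted answer. The helper name `first_class` and its `default=-1` fallback (returned when no threshold exceeds the distance, mirroring the initial `bbh` value) are assumptions made for this example.

```python
from bisect import bisect_right

# Values copied from the example: row 0 of df_norms and the 'station' column.
norms = [0, 100, 200, 4000000]
stations = [50, 150, 0, 250]


def first_class(y, thresholds, default=-1):
    """Return ix - 1 for the first threshold x with x > y.

    next() consumes only the first item of the generator and then stops,
    which plays the role of `break` in the explicit loop.
    """
    return next((ix - 1 for ix, x in enumerate(thresholds) if x > y), default)


# One comprehension over the distances; the "break" lives inside first_class.
bbh = [first_class(y, norms) for y in stations]
print(bbh)  # [0, 1, 0, 2]

# If the thresholds are sorted, a binary search finds the same index:
# bisect_right returns the position of the first threshold strictly
# greater than y. Unlike first_class, it has no -1 fallback for a
# distance at or above the last threshold.
bbh_bisect = [bisect_right(norms, y) - 1 for y in stations]
print(bbh_bisect)  # [0, 1, 0, 2]
```

On a DataFrame, the resulting list could be assigned back with something like `df_afst['bbh'] = bbh`. For millions of rows, a vectorised call such as the `pd.cut` used in the benchmark is likely to remain much faster than any per-row comprehension.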
