How do I write a nested list comprehension with a break?

import pandas as pd

df_norms = pd.DataFrame([[0, 100, 200, 4000000]],
                        columns=['mode', 'min', 'medium', 'max'])
df_afst = pd.DataFrame([[0, 50, -1], [0, 150, -1], [0, 0, -1], [0, 250, -1]],
                       columns=['train', 'station', 'bbh'])

for i in [0]:  # 1 element list just for the example
    bbh_id = i + 2
    mode = df_afst.iloc[0, i]
    for iy, y in enumerate(df_afst[df_afst.columns[i+1]].values):
        for ix, x in enumerate(df_norms.iloc[mode]):
            if x > y:
                df_afst.loc[iy, df_afst.columns[bbh_id]] = ix - 1
                break

for i in [0]:
    bbh_id = i + 2
    mode = df_afst.iloc[0, i]
    r = [ix - 1
         for iy, y in enumerate(df_afst[df_afst.columns[i+1]].values)
         for ix, x in enumerate(df_norms.iloc[mode])
         if x > y]
    # results in: [0, 1, 2, 1, 2, 0, 1, 2, 2]

import time
import numpy as np

print('\n*** pd.cut')
cpu = time.time()
cuts = df_norms.iloc[0].tolist()
bbh3 = pd.cut(df_afst['station'], cuts, labels=False, include_lowest=True)
df_afst['bbh'] = bbh3
print('CPU {:.4f} seconds'.format(time.time() - cpu))

print('\n*** Using numpy and its functions')
cpu = time.time()
bbh2 = [np.min(np.argwhere(np.less(td, df_norms.values.ravel()))-1) for td in df_afst.station.values]
df_afst['bbh'] = bbh2
print('CPU {:.4f} seconds'.format(time.time() - cpu))

print('\n*** Simple loop')
cpu = time.time()
for i in [0]:
    bbh_id = i + 2
    mode = df_afst.iloc[0, i]
    for iy, y in enumerate(df_afst[df_afst.columns[i+1]].values):
        for ix, x in enumerate(df_norms.iloc[mode]):
            if x > y:
                df_afst.loc[iy, df_afst.columns[bbh_id]] = ix - 1
                break
print('CPU {:.4f} seconds'.format(time.time() - cpu))

print('\n*** Wrong approach')
cpu = time.time()
for i in [0]:
    bbh_id = i + 2
    mode = df_afst.iloc[0, i]
    # No break here, so every x > y match is collected, not only the first one
    r = [ix - 1
         for iy, y in enumerate(df_afst[df_afst.columns[i+1]].values)
         for ix, x in enumerate(df_norms.iloc[mode])
         if x > y]
print('CPU {:.4f} seconds'.format(time.time() - cpu))

2 Answers

User

#1 · edited 2024-10-06 12:32:48

You would benefit greatly from using pd.cut:

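Assuming you want to bin the values in df_afst['station'] (this is not entirely clear from the question, but I am guessing based on the example), you can do the following: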

cuts = df_norms.iloc[0].tolist()
bbh = pd.cut(df_afst['station'], cuts, labels=False, include_lowest=True)

Or, more directly:

bbh = pd.cut(df_afst['station'], [-1, 100, 200, float('inf')], labels=False)

Result:

>>> bbh
0    0
1    1
2    0
3    2
Name: station, dtype: int64

Of course, you can assign the result to a column.
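As a minimal sketch of that assignment, using the sample frames from the question: one detail worth checking is the bin edges. By default pd.cut closes each bin on the right, so a station value of exactly 100 lands in bin 0, whereas the loop in the question (which tests x > y) would put it in bin 1. Passing right=False is one way to match the loop; whether that is the behavior you want is an assumption on my part.

```python
import pandas as pd

df_norms = pd.DataFrame([[0, 100, 200, 4000000]],
                        columns=['mode', 'min', 'medium', 'max'])
df_afst = pd.DataFrame([[0, 50, -1], [0, 150, -1], [0, 0, -1], [0, 250, -1]],
                       columns=['train', 'station', 'bbh'])

cuts = df_norms.iloc[0].tolist()

# right=False closes each bin on the left: [0, 100), [100, 200), [200, 4000000).
# This mirrors the `x > y` test in the original loop (assumed to be the intent).
df_afst['bbh'] = pd.cut(df_afst['station'], cuts, labels=False, right=False)

print(df_afst['bbh'].tolist())  # [0, 1, 0, 2]

# Edge case: a value sitting exactly on a boundary.
edge_default = pd.cut(pd.Series([100]), cuts, labels=False)              # bin 0
edge_left = pd.cut(pd.Series([100]), cuts, labels=False, right=False)    # bin 1
```

For the sample data both variants give the same bins; they differ only for values that fall exactly on a boundary.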

This will be orders of magnitude faster than a Python loop (whether explicit or written as a comprehension).
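For comparison, the break from the question's loop can be expressed inside a comprehension by calling the built-in next() on a generator expression, which stops at the first match. This is only a sketch on plain lists; the names stations and norms are illustrative, and the fallback of -1 (used when no threshold exceeds the value) is an assumption that mirrors the initial content of the bbh column.

```python
stations = [50, 150, 0, 250]        # df_afst['station'] from the question
norms = [0, 100, 200, 4000000]      # df_norms.iloc[0] from the question

bbh = [
    # next() pulls only the first item from the generator, which acts
    # like `break`; the second argument is the default if nothing matches.
    next((ix - 1 for ix, x in enumerate(norms) if x > y), -1)
    for y in stations
]

print(bbh)  # [0, 1, 0, 2]
```

This is still a Python-level loop over every row, so it answers the literal question but should not be expected to compete with pd.cut on large frames.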

User

#2 · edited 2024-10-06 12:32:48

You can vectorize the inner comparison with numpy:

[np.min(np.argwhere(np.less(td, df_norms.values.ravel()))-1) for td in df_afst.station.values]

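np.less compares each distance td in df_afst.station with all the values in df_norms and returns a boolean array that is True wherever td is less than the corresponding value in df_norms.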

For example, np.less(50, [0, 100, 200, 4000000]) returns: array([False, True, True, True])

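With np.argwhere we extract the indices of the True values in that array. For these thresholds the first possible match is at index 1, so we subtract 1 to make the result start from 0. Taking the minimum of those indices then gives the first position where the array is True, which is the value you are looking for.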
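The steps above can be traced for a single value. This sketch uses a plain array named norms in place of df_norms.values.ravel(); the variable names are illustrative only.

```python
import numpy as np

norms = np.array([0, 100, 200, 4000000])
td = 50

mask = np.less(td, norms)       # True where td < threshold
true_idx = np.argwhere(mask)    # indices of the True entries, shape (3, 1)
result = np.min(true_idx - 1)   # first True index, shifted to start at 0

print(mask.tolist())              # [False, True, True, True]
print(true_idx.ravel().tolist())  # [1, 2, 3]
print(int(result))                # 0
```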

You can run all of this inside a list comprehension, and the result will be: [0, 1, 0, 2]
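The comprehension still iterates over the rows in Python. If the thresholds are sorted in ascending order (an assumption, though it holds for the sample data), np.searchsorted can do the whole lookup in one call. This is a swapped-in alternative, not part of the answer above.

```python
import numpy as np

norms = np.array([0, 100, 200, 4000000])   # must be sorted ascending
stations = np.array([50, 150, 0, 250])

# side='right' returns, for each value, the first index i with norms[i] > value,
# which is the same position the loop breaks at; subtract 1 as before.
bbh = np.searchsorted(norms, stations, side='right') - 1

print(bbh.tolist())  # [0, 1, 0, 2]
```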
