Storing and reading multiple histograms in a CSV file

histogram1:
(-1.3747106810983318, 3.529160051186781]    0.012520
(3.529160051186781, 8.433030783471894]      0.013830
(8.433030783471894, 13.336901515757006]     0.016495
(13.336901515757006, 18.24077224804212]     0.007194
(18.24077224804212, 23.144642980327234]     0.041667
(23.144642980327234, 28.048513712612344]    0.000000
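The histogram above is a pandas Series indexed by right-closed intervals. As a minimal sketch, assuming the numbers on the right are the per-bin values and the interval bounds are the bin edges, it can be rebuilt with `pd.IntervalIndex.from_breaks` (the names `edges`, `heights`, `bin` and `random_variable` are illustrative):

```python
import pandas as pd

# Bin edges copied from the listing above (six right-closed bins)
edges = [-1.3747106810983318, 3.529160051186781, 8.433030783471894,
         13.336901515757006, 18.24077224804212, 23.144642980327234,
         28.048513712612344]
heights = [0.012520, 0.013830, 0.016495, 0.007194, 0.041667, 0.000000]

# closed='right' reproduces the "(left, right]" notation of the listing
bins = pd.IntervalIndex.from_breaks(edges, closed='right', name='bin')
histogram1 = pd.Series(heights, index=bins, name='random_variable')
print(histogram1)
```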

1 answer

User

Reply #1 · Posted 2024-10-06 15:26:09

You are asking about two different things here:

1. What is an efficient way to store multiple Series
2. How to find the bin of a float from an already-built IntervalIndex

The first part is easy. Before saving to CSV (or rather, to parquet), I would build one big frame with pandas.concat():

df = pd.concat(histograms, keys=hist_names, names=['hist_name', 'bin']).rename('random_variable').to_frame()
df.to_parquet('histograms.parquet')  # hypothetical file name; needs pyarrow or fastparquet

For more information, see .to_parquet(), this answer, and this benchmark.
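A self-contained sketch of the concatenation step, using two small made-up histograms; the names `hist1`/`hist2` and the data are hypothetical. It only builds the combined frame: writing it with `.to_parquet()` needs pyarrow or fastparquet installed, so that call is left as a comment.

```python
import pandas as pd

# Two hypothetical histograms, each a Series indexed by right-closed intervals
h1 = pd.Series([0.2, 0.5, 0.3],
               index=pd.IntervalIndex.from_breaks([0.0, 1.0, 2.0, 3.0]))
h2 = pd.Series([0.6, 0.4],
               index=pd.IntervalIndex.from_breaks([-1.0, 0.5, 2.0]))
histograms = [h1, h2]
hist_names = ['hist1', 'hist2']

# keys= adds an outer index level recording which histogram each row came from
df = (pd.concat(histograms, keys=hist_names, names=['hist_name', 'bin'])
        .rename('random_variable')
        .to_frame())
# df.to_parquet('histograms.parquet')  # requires pyarrow or fastparquet
print(df)
```

Because the histogram name lives in the index, histograms with different bin edges and different lengths fit in one frame without padding.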

Then, when reading it back, select a single histogram with

hist1 = df.loc[('hist1', slice(None)), 'random_variable'].droplevel('hist_name')

or

grouped = df.reset_index('hist_name').groupby('hist_name')
hist1 = grouped.get_group('hist1')['random_variable']  # keep only the values column
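Both selections can be checked against each other on an in-memory frame (hypothetical data again; the parquet round trip is skipped so the sketch runs without pyarrow). Either way, the result is a Series indexed only by the intervals, which is the shape the bin lookup in the second part expects.

```python
import pandas as pd

h1 = pd.Series([0.2, 0.5, 0.3],
               index=pd.IntervalIndex.from_breaks([0.0, 1.0, 2.0, 3.0]))
h2 = pd.Series([0.6, 0.4],
               index=pd.IntervalIndex.from_breaks([-1.0, 0.5, 2.0]))
df = (pd.concat([h1, h2], keys=['hist1', 'hist2'], names=['hist_name', 'bin'])
        .rename('random_variable')
        .to_frame())

# Option 1: label-based selection on the outer level, then drop that level
via_loc = df.loc[('hist1', slice(None)), 'random_variable'].droplevel('hist_name')

# Option 2: move hist_name into a column and group on it
grouped = df.reset_index('hist_name').groupby('hist_name')
via_group = grouped.get_group('hist1')['random_variable']

print(via_loc.equals(via_group))
```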

The second part has already been answered. In short, you need to flatten the IntervalIndex into its right bin edges:

bins = hist1.index.right

Then you can use numpy.digitize to find the bin for your value (or list of values):

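import numpy as np
# right=True matches the (left, right] intervals; values outside all bins need a separate range check
i = np.digitize(my_value, bins, right=True)
return_value = hist1.iloc[i]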
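A slightly more defensive version of the `np.digitize` lookup, as a sketch. With the default `right=False`, a value exactly on an upper edge lands in the next bin; `right=True` matches the `(left, right]` intervals. Values above the last edge would give an out-of-range position and values below the first edge would silently land in bin 0, so both return NaN here. The helper name `lookup_bin_value` is hypothetical, and it assumes contiguous bins with no gaps (as `from_breaks` produces).

```python
import numpy as np
import pandas as pd

def lookup_bin_value(hist: pd.Series, values) -> np.ndarray:
    """Return hist's value for the (left, right] bin containing each value; NaN if outside."""
    values = np.atleast_1d(np.asarray(values, dtype=float))
    rights = hist.index.right.to_numpy()   # upper edges, ascending
    lowest = hist.index.left[0]            # lower edge of the first bin
    # right=True: i satisfies rights[i-1] < x <= rights[i]
    i = np.digitize(values, rights, right=True)
    out = np.full(values.shape, np.nan)
    inside = (values > lowest) & (i < len(rights))
    out[inside] = hist.to_numpy()[i[inside]]
    return out

hist1 = pd.Series([0.2, 0.5, 0.3],
                  index=pd.IntervalIndex.from_breaks([0.0, 1.0, 2.0, 3.0]))
print(lookup_bin_value(hist1, [0.5, 1.0, 2.5, 3.5, -1.0]))
```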

Edit

Just found this answer about Indexing with an IntervalIndex; this also works:

return_value = hist1.loc[my_value]
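A minimal sketch of the `.loc` approach with made-up data. A scalar inside one of the non-overlapping intervals returns that bin's value. For several values at once, `IntervalIndex.get_indexer` returns the position of the containing bin, with -1 for values that fall in no bin, instead of the KeyError that `.loc` raises.

```python
import numpy as np
import pandas as pd

hist1 = pd.Series([0.2, 0.5, 0.3],
                  index=pd.IntervalIndex.from_breaks([0.0, 1.0, 2.0, 3.0]))

# Scalar lookup: 1.5 lies in (1.0, 2.0]
print(hist1.loc[1.5])

# Vectorised lookup: positions of the containing bins, -1 when none matches
positions = hist1.index.get_indexer([0.5, 2.0, 7.0])
values = np.where(positions >= 0, hist1.to_numpy()[positions], np.nan)
print(positions, values)
```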
