What is the best way to preserve the statistical properties of a DataFrame in Python 3?

Columns: FilterSystemO2Concentration (Percentage), ProcessChamberHumidityAbsolute (g/m3), ProcessChamberPressure (mbar)
0 0.156 1 29.5 28.4 29.6 28.4
2 0.149 1.3 29.567 28.9
3 0.149 1 29.567 28.9
4 0.148 1.6 29.6 29.4

1 Answer

User

#1 · Posted 2024-10-02 08:22:17

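Calling scipy.stats.rv_histogram(np.histogram(data)).isf(np.random.random(size=n)) draws n new samples at random from the distribution of the data, as approximated by its histogram. You can apply this to each column in turn:

Example: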

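import numpy as np
import pandas as pd
import scipy.stats as stats

# Example data: two uniformly distributed columns
df = pd.DataFrame({'x': np.random.random(100)*3, 'y': np.random.random(100) * 4 -2})
n = 5  # number of new samples to draw
# For each column, build a histogram distribution and draw n values by inverse transform sampling
new_values = pd.DataFrame({s: stats.rv_histogram(np.histogram(df[s])).isf(np.random.random(size=n)) for s in df.columns})
# DataFrame.append was removed in pandas 2.0, so concatenate with pd.concat instead
df = pd.concat([df.assign(data_type='original'), new_values.assign(data_type='oversampled')])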
df.tail(7)
>>          x         y    data_type
98  1.176073 -0.207858     original
99  0.734781 -0.223110     original
0   2.014739 -0.369475  oversampled
1   2.825933 -1.122614  oversampled
2   0.155204  1.421869  oversampled
3   1.072144 -1.834163  oversampled
4   1.251650  1.353681  oversampled
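
To check whether the per-column histogram sampling described above keeps the summary statistics close to the original, something like the following sketch may help. The helper name oversample_columns, the bins and seed parameters, and the sample sizes are assumptions made for illustration and are not part of the original answer.

```python
import numpy as np
import pandas as pd
from scipy import stats


def oversample_columns(df, n, bins=10, seed=None):
    """Draw n synthetic rows (hypothetical helper).

    Each numeric column is sampled independently from a histogram of its
    observed values, using inverse transform sampling via isf.
    """
    rng = np.random.default_rng(seed)
    samples = {}
    for col in df.columns:
        dist = stats.rv_histogram(np.histogram(df[col], bins=bins))
        samples[col] = dist.isf(rng.random(n))
    return pd.DataFrame(samples)


# Illustrative data only: two uniformly distributed columns
rng = np.random.default_rng(0)
original = pd.DataFrame({"x": rng.random(1000) * 3, "y": rng.random(1000) * 4 - 2})
synthetic = oversample_columns(original, n=5000, seed=1)

# Put the summary statistics of both frames side by side
summary = pd.concat(
    {"original": original.describe(), "synthetic": synthetic.describe()}, axis=1
)
print(summary.loc[["mean", "std", "min", "max"]])
```

Because every column is sampled on its own, this approach reproduces each column's marginal distribution but not the correlations between columns. If those relationships matter, resampling whole rows, for example with df.sample(n, replace=True), may be a better fit.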
