Iterating over the rows of a pandas DataFrame without a for loop

         date  size  price
0  2018-08-01   100    220
1  2018-08-01   110    245
2  2018-08-01   125    250
3  2018-08-02   110    210
4  2018-08-02   120    230
5  2018-08-02   150    260
6  2018-08-03   115    200

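2 answers

User

Answer 1 · edited 2024-10-02 04:27:24

I'm not sure what your expected output is, but if you want, for each date that has more than one transaction, the mean price of the rows whose sizes are closest together, you can do the following. If you're looking for something else, please provide the expected output: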

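import pandas as pd

df = pd.read_clipboard()

# find the diff on the size column within each date and backfill the NaN values
# (fillna(method='bfill') is deprecated in recent pandas; bfill() is equivalent)
df['diff'] = df.groupby('date')['size'].diff().bfill()

# group by date and use the lambda function to keep the rows with the min diff
df2 = df.groupby(['date']).apply(lambda x: x[x['diff'] == x['diff'].min()])

# find the mean of price; group on the 'date' index level that apply() added,
# since 'date' may also still be a column and would otherwise be ambiguous
df2.groupby(level='date')['price'].mean()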

date
2018-08-01    232.5
2018-08-02    220.0
Name: price, dtype: float64
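
For readers without the data on their clipboard, here is a minimal self-contained sketch of the same idea. It builds the sample frame inline and swaps the `apply` call for `transform('min')`, which is a substitution of mine rather than the answer's own code; the names `min_diff`, `closest` and `result` are only illustrative.

```python
import pandas as pd

# Sample data from the question, typed in by hand.
df = pd.DataFrame({
    "date": ["2018-08-01"] * 3 + ["2018-08-02"] * 3 + ["2018-08-03"],
    "size": [100, 110, 125, 110, 120, 150, 115],
    "price": [220, 245, 250, 210, 230, 260, 200],
})

# Gap to the previous size within the same date. The first row of each date
# has no predecessor, so it takes the gap of the row that follows it.
df["diff"] = df.groupby("date")["size"].diff().bfill()

# Smallest gap per date, broadcast back onto every row of that date.
min_diff = df.groupby("date")["diff"].transform("min")

# Keep the rows that belong to the closest pair, then average their prices.
# A date with a single row keeps a NaN diff and therefore drops out.
closest = df[df["diff"] == min_diff]
result = closest.groupby("date")["price"].mean()
print(result)
```

Because the comparison with `NaN` is always false, 2018-08-03 (a single transaction) does not appear in the result, which matches the output shown above.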

User

Answer 2 · edited 2024-10-02 04:27:24

I call the original DataFrame dfa. First, create in dfb the data that the merge will need later:

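import pandas as pd

k = 2  # should work for any number
# note: dfa['date'] must be a datetime column for the BDay shift below
dfb = dfa.copy()
dfb = dfb.sort_values(['date', 'size'])  # the sort is actually needed in dfa too
# rolling mean of price over k consecutive rows within each date
dfb['avg_price'] = dfb.groupby('date')['price'].rolling(k).mean().values
# rolling mean of size, used to look up the k nearest sizes in merge_asof
dfb['size'] = dfb.groupby('date')['size'].rolling(k).mean().values
# add one business day to shift all the dates
dfb['date'] = dfb['date'] + pd.tseries.offsets.BDay()
# drop(columns=...) replaces the positional axis argument removed in pandas 2.0
dfb = dfb.dropna().drop(columns='price')
dfb['size'] = dfb['size'].astype(int)  # needed for the merge_asof
print(dfb)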

        date   size  avg_price
1 2018-08-02    105      232.5
2 2018-08-02    117      247.5
4 2018-08-03    115      220.0
5 2018-08-03    135      245.0
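
Assigning with `.values` as above only lines up because dfb is already sorted by date and size, so the grouped result comes back in the same row order. As a hedged alternative (not part of the original answer), the sketch below keeps index alignment by dropping the date level from the grouped result instead; `rolled` is an illustrative name.

```python
import pandas as pd

# Sample data from the question, with date parsed as datetime.
dfa = pd.DataFrame({
    "date": pd.to_datetime(["2018-08-01"] * 3 + ["2018-08-02"] * 3 + ["2018-08-03"]),
    "size": [100, 110, 125, 110, 120, 150, 115],
    "price": [220, 245, 250, 210, 230, 260, 200],
})

k = 2
rolled = (
    dfa.sort_values(["date", "size"])
       .groupby("date")[["size", "price"]]
       .rolling(k)
       .mean()
       # The result is indexed by (date, original row label); dropping the
       # date level leaves the original labels so rows align by index.
       .reset_index(level=0, drop=True)
)
print(rolled)
```

The first row of each date is `NaN` because a window of `k = 2` rows is not yet full there.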

You can then use merge_asof, matching by date and on the nearest size (this method requires sort_values):

[code block not preserved in the source]
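
A minimal sketch of what such a merge_asof call could look like, assuming the dfa and dfb tables shown in this answer. This is a reconstruction from the surrounding description, not the answer's original code; the result is stored in a hypothetical `merged` variable rather than written back to dfa.

```python
import pandas as pd

# Original data (dfa) and the shifted rolling means (dfb), copied from the
# tables printed in this answer.
dfa = pd.DataFrame({
    "date": pd.to_datetime(["2018-08-01"] * 3 + ["2018-08-02"] * 3 + ["2018-08-03"]),
    "size": [100, 110, 125, 110, 120, 150, 115],
    "price": [220, 245, 250, 210, 230, 260, 200],
})
dfb = pd.DataFrame({
    "date": pd.to_datetime(["2018-08-02", "2018-08-02", "2018-08-03", "2018-08-03"]),
    "size": [105, 117, 115, 135],
    "avg_price": [232.5, 247.5, 220.0, 245.0],
})

# merge_asof requires both frames to be sorted on the `on` key.
merged = pd.merge_asof(
    dfa.sort_values("size"),
    dfb.sort_values("size"),
    on="size",             # match each row to the nearest size ...
    by="date",             # ... but only among rows with the same date
    direction="nearest",
)

# Restore the date/size ordering of the original frame.
merged = merged.sort_values(["date", "size"]).reset_index(drop=True)
print(merged)
```

Rows dated 2018-08-01 have no counterpart in dfb, so their avg_price stays `NaN`.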

The result is dfa:

        date  price  size  avg_price
0 2018-08-01    220   100        NaN
1 2018-08-01    245   110        NaN
2 2018-08-01    250   125        NaN
3 2018-08-02    210   110      232.5
4 2018-08-02    230   120      247.5
5 2018-08-02    260   150      247.5
6 2018-08-03    200   115      220.0
