Groupby and filter a dataset

Posted 2024-10-02 18:27:59


df  fruit   year price  vol  signifiance
0   apple   2010  1      5 
1   apple   2011  2      4   
2   apple   2012  3      3   
3   apple   2013  3      3   
4   apple   2014  3      3   
5   apple   2015  3      3   important
...
47   banana  2010  1      4

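If a fruit-year is marked important, I want to regress price on vol using the data from the 5 years before and the 5 years after that important fruit-year.

For example, for apple, where 2015 is important, regress price on vol using the rows from 2010 through 2020.

I tried: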

df = df.groupby('significance')
Y = df['price']
X = df['vol']
model = sm.OLS(Y,X)

1 answer

Answer 1 · posted 2024-10-02 18:27:59

I believe you need:

import statsmodels.api as sm

# iterate over each fruit's rows
for fruit, df1 in df.groupby('fruit'):
    # years flagged as important for this fruit
    years = df1.loc[df1['signifiance'].eq('important'), 'year']
    print(years)

    # for each important year, take the rows from 5 years before to 5 years after
    for year in years:
        data = df1[df1['year'].between(year - 5, year + 5)]
        print(data)

        # fit only if the window returned any rows
        if not data.empty:
            X = data['vol']
            Y = data['price']
            # note: sm.OLS adds no intercept unless X includes a constant column
            model = sm.OLS(Y, X)
            results = model.fit()
            print(results.summary())
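To check which rows end up in each regression, the window selection can be tried on its own. This is a minimal sketch on made-up data: the `toy` frame and the `windows` dict are hypothetical names that are not part of the question, and only the `between` filter is taken from the answer.

```python
import pandas as pd

# Hypothetical toy data: one fruit, 2008-2022, with 2015 flagged as important.
year_list = list(range(2008, 2023))
toy = pd.DataFrame({
    'fruit': ['apple'] * len(year_list),
    'year': year_list,
    'price': [float(p) for p in range(1, 16)],
    'vol': [float(v) for v in range(15, 0, -1)],
    'signifiance': ['important' if y == 2015 else None for y in year_list],
})

# Years flagged as important, selected the same way as in the answer.
years = toy.loc[toy['signifiance'].eq('important'), 'year']

# between() is inclusive on both ends by default, so each window holds
# up to 11 rows: the flagged year plus 5 years on either side.
windows = {
    int(year): toy[toy['year'].between(year - 5, year + 5)]
    for year in years
}

for year, window in windows.items():
    print(year, window['year'].min(), window['year'].max(), len(window))
```

If the important year sits near the start or end of the data, the window simply contains fewer rows, so it may be worth checking `len(window)` before trusting the fit.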

Edit:

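import statsmodels.api as sm


def f(df1):
    # work on a copy so the group passed in by groupby is not modified
    df1 = df1.copy()
    m1 = df1['signifiance'].eq('important')
    years = df1.loc[m1, 'year']
    print(years)

    # for each important year, take the rows from 5 years before to 5 years after
    for year in years:
        # keep only rows in the window where both vol and price are present
        mask = df1['year'].between(year - 5, year + 5) & df1['vol'].notna() & df1['price'].notna()
        data = df1[mask]
        # print(data)

        # fit only if the window returned any rows
        if not data.empty:
            X = data['vol']
            Y = data['price']
            model = sm.OLS(Y, X)
            results = model.fit()
            # print(results.params)
            # store the vol coefficient on this year's important row only
            df1.loc[m1 & df1['year'].eq(year), 'new'] = results.params.iat[0]
    return df1


# group_keys=False keeps the original row index in the result
df = df.groupby('fruit', group_keys=False).apply(f)
print(df)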
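One thing to be aware of: `sm.OLS(Y, X)` does not add an intercept by itself, so the code above fits a line through the origin. If an intercept is wanted, statsmodels provides `sm.add_constant(X)`; the constant then becomes the first parameter, so the slope would be read as `results.params['vol']` rather than `results.params.iat[0]`. The sketch below shows the difference between the two fits on made-up numbers. It uses `numpy.linalg.lstsq` as a stand-in for statsmodels so that it runs with NumPy alone, and all the variable names are hypothetical.

```python
import numpy as np

# Hypothetical data generated exactly from price = 2 + 0.5 * vol (no noise).
vol = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
price = 2.0 + 0.5 * vol

# Through-origin fit: the design matrix is just the vol column,
# which mirrors calling OLS with X = data['vol'] only.
X_no_const = vol.reshape(-1, 1)
slope_no_const = np.linalg.lstsq(X_no_const, price, rcond=None)[0][0]

# Fit with an intercept: put a column of ones in front of vol.
X_const = np.column_stack([np.ones_like(vol), vol])
intercept, slope = np.linalg.lstsq(X_const, price, rcond=None)[0]

print(slope_no_const)    # slope when the line is forced through the origin
print(intercept, slope)  # recovers the generating intercept and slope
```

On this data the through-origin slope is noticeably larger than 0.5, because the missing intercept is absorbed into the slope. Whether that is acceptable depends on whether price is expected to be zero when vol is zero.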
