有没有一种方法可以使用循环在Python中自动执行用户函数?

2024-09-28 19:24:48 发布

您现在位置:Python中文网/ 问答频道 /正文

我一直在尝试用Python实现下面的SAS代码,虽然我在将表转换为所需的代码时没有遇到任何问题,但我一直在通过每次更改参数手动执行该函数

SAS代码:

%LET SOURCEDATA =
PRACTICE.RURAL_SHOPS;

%MACRO DATAPROCESSOR(YEAR =, QUARTER =, COL1 =, COL2 =, KEYCOL =);
    PROC SQL;
    CREATE TABLE RURALSHOP&YEAR&QUARTER AS
        SELECT 
            &KEYCOL,
            SUM(SHOPNO) AS SHOPNO&YEAR&QUARTER,
            SUM(OPSHOP) AS OPSHOP&YEAR&QUARTER,
            SUM(CLSHOP) AS CLSHOP&YEAR&QUARTER
        FROM
            &SOURCEDATA
        WHERE 
            &COL1 = &YEAR AND &COL2 = &QUARTER
        GROUP BY 
            &KEYCOL;
    QUIT;
%MEND DATAPROCESSOR;

%MACRO REPEAT;
    %DO I = 2014% TO 2019;
    %DO J = 1% TO 4;
    %DATAPROCESSOR(YEAR = &I, QUARTER = &J, COL1 = YEAR, COL2 = QUARTER, KEYCOL = AREACODE);
    %END;
    %END;
%MEND;

%REPEAT;

有没有一种方法可以实现在Python中使用循环自动执行函数的部分?以下是我目前在Python中获得的内容:

sample_data = pd.DataFrame({"YEAR" : [2014, 2014, 2015, 2015, 2016, 2016, 2017, 2017, 2018, 2018, 2019, 2019], \
                         "QUARTER" : [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4], \
                         "AREACODE" : [100001, 100001, 100002, 100002, 100003, 100003, 100004, 100004, 100005, 100005, 100006, 100006], \
                         "SHOPNO" : [10, 12, 31, 5, 6, 9, 1, 3, 4, 0, 10, 2], \
                         "OPSHOP" : [1, 1, 3, 6, 2, 1, 0, 1, 0, 0, 2, 1], \
                         "CLSHOP" : [0, 0, 5, 1, 2, 0, 2, 1, 3, 5, 6, 0]})   
    
def createmaster(df, keycol, keycol1, keycol2, year, quarter):
        df = df[(df["{}".format(keycol1)] == year) & (df["{}".format(keycol2)] == quarter)].groupby("{}".format(keycol)).sum()
        df = df.rename(columns = {"SHOPNO" : "SHOPNO{}{}".format(year, quarter), 
                                  "OPSHOP" : "OPSHOPNO{}{}".format(year, quarter), 
                                  "CLSHOP" : "CLSHOP{}{}".format(year, quarter)})
        df = df.drop(columns = ["YEAR", "QUARTER"])
        return df

Tags: 代码formatdfasyearcol2col1quarter
1条回答
网友
1楼 · 发布于 2024-09-28 19:24:48

在SAS中,您应该在这里使用PROC MEANS语句运行CLASS,这将几乎不需要代码就可以生成所需的内容

这是通过一次传递数据,然后两次传递汇总数据来实现的,没有什么复杂的:只是一个汇总,一个快速传递来修复输出的名称,然后是转置。我在这里编写SAS只是为了说明Python也应该是怎样的。这使用了sashelp.prdsale,这是一个与您的数据集结构非常相似的数据集

proc means data=sashelp.prdsale nway;
  class country region year quarter;
  var actual predict;
  output out=prd_means sum(actual)= sum(predict)=;
run;

data prd_names;
  set prd_means;
  name = catx('_','actual',year,quarter);
  value = actual;
  output;
  name = catx('_','predict',year,quarter);
  value = predict;
  output;
  keep name value country region;
run;

proc transpose data=prd_names out=prd_vars;
  by country region;
  var value;
  id name;
run;

  

因此,在Python中,让我们用同样的方法来实现它——获取一个聚合摘要,然后根据需要对其进行重塑。我将使用saspy获取prdsale数据集;同样,您也可以在示例数据集上使用此代码

import pandas as pd

#Everything from here to ...
import saspy
sas = saspy.SASsession(cfgname='winiomIWA')
prdsale = sas.sasdata2dataframe(table='prdsale',libref='sashelp')
#...here is just getting the dataframe set up - use your own dataframe here.

sums = prdsale.groupby(['COUNTRY','REGION','YEAR','QUARTER']).agg ('sum').unstack(['YEAR','QUARTER'])
print(sums)

实际上,您所需要的只是最后一行(以及导入)的旁边:.groupby([... var list to summarize by, inc. year and quarter at the end]).agg('sum').unstack(['YEAR','QUARTER'])

这里,groupby告诉它在计算和时如何对数据进行分组,agg说的是对变量求和,unstack说的是在这两个变量之后对其进行重塑(意味着将这些变量的行转换为列)

我还怀疑pivot_table()也可以

pv = prdsale.pivot_table(index=['COUNTRY','REGION'],columns=['YEAR','QUARTER'],
                         values=['ACTUAL','PREDICT'],aggfunc='sum')

两者返回的速度基本相同,如果这在这里很重要,我不知道哪个更快。参见this question/answer了解另一个例子(尽管实际上没有进行任何聚合)

相关问题 更多 >