Creating a summary statistics table in Python

Posted 2024-06-27 02:33:09


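I am trying to recreate R's "summarySE()" function in Python, but I am having trouble getting it to work. The function builds a summary statistics table from a repeated-measures data frame. I keep getting an error that seems to be caused by the column names in my data frame, which are strings.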

The table used:

df = pd.DataFrame(columns=["id", "Position.Name", "Period", "Maximum.Velocity"], 
                  data = [[2, "WR", "Special team", 16.5],[2, "WR", "Special team", 15.2], [2, "WR", "Special team", 16.5], [2,"WR", "Special team", 15.2],  [3, "DB", "Special team" ,14.5],[3, "DB", "Special team", 10.6], [3, "DB", "Special team", 17.5],[3, "DB", "Special team", 13.5], [4, "OL", "Special team", 10.2], [4, "OL", "Special team", 11.3], [4, "OL", "Special team", 16.2], [2, "WR", "team", 13.5], [2, "WR", "team", 12.2], [2, "WR", "team", 15.5],[2, "WR", "team", 16.2],[3, "DB", "team", 13.5], [3, "DB", "team", 12.5], [3, "DB", "team", 11.5], [3,"DB","team", 16.5], [4, "OL","team", 9.2], [4, "OL", "team", 8.2], [4, "OL", "team", "11.2"]])
df["Maximum.Velocity"] = df["Maximum.Velocity"].astype("float")

The code used:

import pandas as pd
import scipy as sp
from scipy.stats import t
import numpy as np

#from: http://www.cookbook-r.com/Graphs/Plotting_means_and_error_bars_%28ggplot2%29/
## Gives count, mean, standard deviation, standard error of the mean, and confidence interval (default 95%).
##   data: a data frame.
##   measurevar: the name of a column that contains the variable to be summarized
##   groupvars: a vector containing names of columns that contain grouping variables
##   conf_interval: the percent range of the confidence interval (default is 95%)
def summarySE(data, measurevar, groupvars, conf_interval=0.95):
    def std(s):
        return np.std(s, ddof=1)
    def stde(s):
        return std(s) / np.sqrt(len(s))

    def ci(s):
        # Confidence interval multiplier for standard error
        # Calculate t-statistic for confidence interval: 
        # e.g., if conf_interval is .95, use .975 (above/below), and use df=N-1
        ciMult = t.ppf(conf_interval/2.0 + .5, len(s)-1)
        return stde(s)*ciMult
    def ciUp(s):
        return np.mean(s)+ci(s)
    def ciDown(s):
        return np.mean(s)-ci(s)
    
    data = data[groupvars+measurevar].groupby(groupvars).agg([len, np.mean, std, stde, ciUp, ciDown, ci])

    data.reset_index(inplace=True)


    data.columns = groupvars+ ['_'.join(col).strip() for col in data.columns.values[len(groupvars):]]

    return data

summary_table = summarySE(data = df, measurevar = ['Maximum.Velocity'], groupvars = ['Position.Name','Period'], conf_interval=0.95)

The traceback error I get:

  • indexer = self.columns.get_loc(key)
  • raise KeyError(key) from err
  • KeyError: 'Position.NameMaximum.Velocity'
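The key in this KeyError looks like the two column names joined together, which is what groupvars + measurevar produces when both arguments are plain strings rather than lists. This is only a guess at the cause, since the call shown above passes lists. The small sketch below reproduces the message on a two-row frame and shows one way to guard against it. The helper as_list and the frame are hypothetical and not part of the original code.

```python
import pandas as pd

def as_list(x):
    """Hypothetical helper: wrap a single column name in a list."""
    return [x] if isinstance(x, str) else list(x)

frame = pd.DataFrame({"Position.Name": ["WR", "DB"],
                      "Maximum.Velocity": [16.5, 14.5]})

groupvars = "Position.Name"       # plain strings, not lists
measurevar = "Maximum.Velocity"

# str + str glues the two names into one label that is not a column
bad_key = groupvars + measurevar
try:
    frame[bad_key]
    raised = False
except KeyError:
    raised = True

# list + list gives the intended multi-column selection
cols = as_list(groupvars) + as_list(measurevar)
subset = frame[cols]
```

If this is what is happening, passing both arguments as lists, or normalizing them inside the function as above, should make the column selection work.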

The desired output looks like this:

[desired output table not preserved]
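For comparison, here is a minimal sketch of one way such a table could be built with pandas named aggregation. It assumes the target layout follows R's summarySE, which reports N, the mean, sd, se and ci (the half-width of the confidence interval) per group. The function name summary_se, the column name "mean" and the reduced sample data are choices made for this illustration, and measurevar is taken as a single column name rather than a list.

```python
import numpy as np
import pandas as pd
from scipy.stats import t

def summary_se(data, measurevar, groupvars, conf_interval=0.95):
    """Sketch: count, mean, sd, se and CI half-width of one column per group."""
    # Accept either a single column name or a list of names for the groups
    groupvars = [groupvars] if isinstance(groupvars, str) else list(groupvars)
    out = (data.groupby(groupvars)[measurevar]
               .agg(N="count", mean="mean", sd="std")   # std uses ddof=1
               .reset_index())
    out["se"] = out["sd"] / np.sqrt(out["N"])
    # e.g. conf_interval=0.95 -> 0.975 quantile of t with N-1 degrees of freedom
    out["ci"] = out["se"] * t.ppf(conf_interval / 2.0 + 0.5, out["N"] - 1)
    return out

# Reduced sample taken from the data frame in the question
sample = pd.DataFrame(
    columns=["id", "Position.Name", "Period", "Maximum.Velocity"],
    data=[[2, "WR", "Special team", 16.5], [2, "WR", "Special team", 15.2],
          [2, "WR", "Special team", 16.5], [2, "WR", "Special team", 15.2],
          [3, "DB", "team", 13.5], [3, "DB", "team", 12.5],
          [3, "DB", "team", 11.5], [3, "DB", "team", 16.5]],
)
summary = summary_se(sample, "Maximum.Velocity", ["Position.Name", "Period"])
print(summary)
```

Using string aggregation names ("count", "mean", "std") instead of passing len and np.mean keeps the result columns flat, so no column renaming step is needed after reset_index.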
