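I am trying to recreate the summarySE() function from R in Python, but I am having trouble getting it to work. The function builds a table of summary statistics from a repeated-measures data frame. I keep getting errors that seem to be caused by the column names in my data frame (which are strings).
The table used: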
df = pd.DataFrame(
    columns=["id", "Position.Name", "Period", "Maximum.Velocity"],
    data=[
        [2, "WR", "Special team", 16.5], [2, "WR", "Special team", 15.2],
        [2, "WR", "Special team", 16.5], [2, "WR", "Special team", 15.2],
        [3, "DB", "Special team", 14.5], [3, "DB", "Special team", 10.6],
        [3, "DB", "Special team", 17.5], [3, "DB", "Special team", 13.5],
        [4, "OL", "Special team", 10.2], [4, "OL", "Special team", 11.3],
        [4, "OL", "Special team", 16.2],
        [2, "WR", "team", 13.5], [2, "WR", "team", 12.2],
        [2, "WR", "team", 15.5], [2, "WR", "team", 16.2],
        [3, "DB", "team", 13.5], [3, "DB", "team", 12.5],
        [3, "DB", "team", 11.5], [3, "DB", "team", 16.5],
        [4, "OL", "team", 9.2], [4, "OL", "team", 8.2],
        [4, "OL", "team", "11.2"],
    ],
)
# The last row stores its value as a string, so cast the whole column to float.
df["Maximum.Velocity"] = df["Maximum.Velocity"].astype("float")
The code used:
import pandas as pd
import scipy as sp
from scipy.stats import t
import numpy as np
#from: http://www.cookbook-r.com/Graphs/Plotting_means_and_error_bars_%28ggplot2%29/
## Gives count, mean, standard deviation, standard error of the mean, and confidence interval (default 95%).
## data: a data frame.
## measurevar: the name of a column that contains the variable to be summarized
## groupvars: a vector containing names of columns that contain grouping variables
## conf_interval: the percent range of the confidence interval (default is 95%)
def summarySE(data, measurevar, groupvars, conf_interval=0.95):
    def std(s):
        # Sample standard deviation (ddof=1), as in R's sd()
        return np.std(s, ddof=1)

    def stde(s):
        # Standard error of the mean
        return std(s) / np.sqrt(len(s))

    def ci(s):
        # Confidence interval multiplier for standard error
        # Calculate t-statistic for confidence interval:
        # e.g., if conf_interval is .95, use .975 (above/below), and use df=N-1
        ciMult = t.ppf(conf_interval/2.0 + .5, len(s)-1)
        return stde(s)*ciMult

    def ciUp(s):
        return np.mean(s)+ci(s)

    def ciDown(s):
        return np.mean(s)-ci(s)

    # measurevar and groupvars must both be lists for the concatenation to work
    data = data[groupvars+measurevar].groupby(groupvars).agg([len, np.mean, std, stde, ciUp, ciDown, ci])
    data.reset_index(inplace=True)
    # Flatten the two-level column index into names like "Maximum.Velocity_len"
    data.columns = groupvars+ ['_'.join(col).strip() for col in data.columns.values[len(groupvars):]]
    return data
summary_table = summarySE(data = df, measurevar = ['Maximum.Velocity'], groupvars = ['Position.Name','Period'], conf_interval=0.95)
The traceback I get:
The desired output looks like this:
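For comparison, the statistics that the summarySE code above computes (count, mean, standard deviation, standard error, and confidence-interval half-width) can also be sketched with pandas named aggregation instead of nested Python callables. This is only a minimal sketch under stated assumptions: the helper name summary_se_sketch, the output column names N, mean, sd, se and ci, and the small demo frame are all made up for illustration, and measurevar is assumed here to be a single column name (a string) rather than a list.

```python
import numpy as np
import pandas as pd
from scipy.stats import t


def summary_se_sketch(data, measurevar, groupvars, conf_interval=0.95):
    """Hypothetical helper: per-group N, mean, sd, se and CI half-width.

    Assumes measurevar is one column name (str) and groupvars is a list.
    """
    grouped = data.groupby(groupvars)[measurevar]
    # String aliases ("count", "mean", "std") use pandas' built-in reducers;
    # pandas' std defaults to ddof=1, matching np.std(s, ddof=1).
    out = grouped.agg(N="count", mean="mean", sd="std").reset_index()
    out["se"] = out["sd"] / np.sqrt(out["N"])
    # Two-sided t multiplier with N-1 degrees of freedom for each group.
    out["ci"] = out["se"] * t.ppf(conf_interval / 2.0 + 0.5, out["N"] - 1)
    return out


# Small made-up demo frame reusing a few values from the question's table.
demo = pd.DataFrame({
    "Position.Name": ["WR", "WR", "WR", "DB", "DB", "DB"],
    "Maximum.Velocity": [16.5, 15.2, 16.5, 14.5, 10.6, 17.5],
})
result = summary_se_sketch(demo, "Maximum.Velocity", ["Position.Name"])
print(result)
```

The upper and lower confidence bounds can then be derived as mean + ci and mean - ci, which avoids recomputing the interval for each bound.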