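I have a pandas.DataFrame of shape 780x6, in which three columns hold binary values ('treatment', 'married', 'nodegree')
and three hold floats (columns 3:6).
I want to run a Monte Carlo simulation on the three non-binary columns.
To do that, I first build the row index for every possible combination of the binary columns, to be used later in the MC simulation: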
index000 = X_MC[ ( X_MC['treatment'] == 0 ) & ( X_MC['married'] == 0 ) & (X_MC['nodegree'] == 0) ].index
index001 = X_MC[ ( X_MC['treatment'] == 0 ) & ( X_MC['married'] == 0 ) & (X_MC['nodegree'] == 1) ].index
index010 = X_MC[ ( X_MC['treatment'] == 0 ) & ( X_MC['married'] == 1 ) & (X_MC['nodegree'] == 0) ].index
index100 = X_MC[ ( X_MC['treatment'] == 1 ) & ( X_MC['married'] == 0 ) & (X_MC['nodegree'] == 0) ].index
index011 = X_MC[ ( X_MC['treatment'] == 0 ) & ( X_MC['married'] == 1 ) & (X_MC['nodegree'] == 1) ].index
index110 = X_MC[ ( X_MC['treatment'] == 1 ) & ( X_MC['married'] == 1 ) & (X_MC['nodegree'] == 0) ].index
index101 = X_MC[ ( X_MC['treatment'] == 1 ) & ( X_MC['married'] == 0 ) & (X_MC['nodegree'] == 1) ].index
index111 = X_MC[ ( X_MC['treatment'] == 1 ) & ( X_MC['married'] == 1 ) & (X_MC['nodegree'] == 1) ].index
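As a side note, the eight masks above could probably be replaced by a single groupby. The following is only a sketch on toy data: the frame built here is a stand-in for X_MC, and the float column names a, b, c are hypothetical.

```python
import itertools
import numpy as np
import pandas as pd

# Toy stand-in for X_MC (assumption): every 0/1 combination repeated 5 times,
# followed by three float columns with the hypothetical names a, b, c.
rng = np.random.default_rng(0)
combos = np.array(list(itertools.product([0, 1], repeat=3)))
X_MC = pd.DataFrame(np.tile(combos, (5, 1)),
                    columns=["treatment", "married", "nodegree"])
for name in ["a", "b", "c"]:
    X_MC[name] = rng.normal(size=len(X_MC))

binary_cols = ["treatment", "married", "nodegree"]

# GroupBy.indices maps each (treatment, married, nodegree) tuple to the
# integer row positions of that group, which is what .iloc expects.
positions = X_MC.groupby(binary_cols).indices

index000 = positions[(0, 0, 0)]
index111 = positions[(1, 1, 1)]
```

Because the dictionary holds integer positions rather than index labels, it can be passed to .iloc even when the DataFrame does not have a default RangeIndex.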
Next I compute the mean() and the covariance cov() of the three non-binary columns for each group:
mean000 = X_MC.iloc[ index000, 3 : 6 ].mean()
mean001 = X_MC.iloc[ index001, 3 : 6 ].mean()
mean010 = X_MC.iloc[ index010, 3 : 6 ].mean()
mean100 = X_MC.iloc[ index100, 3 : 6 ].mean()
mean011 = X_MC.iloc[ index011, 3 : 6 ].mean()
mean110 = X_MC.iloc[ index110, 3 : 6 ].mean()
mean101 = X_MC.iloc[ index101, 3 : 6 ].mean()
mean111 = X_MC.iloc[ index111, 3 : 6 ].mean()
cov000 = X_MC.iloc[ index000, 3 : 6 ].cov()
cov001 = X_MC.iloc[ index001, 3 : 6 ].cov()
cov010 = X_MC.iloc[ index010, 3 : 6 ].cov()
cov100 = X_MC.iloc[ index100, 3 : 6 ].cov()
cov011 = X_MC.iloc[ index011, 3 : 6 ].cov()
cov110 = X_MC.iloc[ index110, 3 : 6 ].cov()
cov101 = X_MC.iloc[ index101, 3 : 6 ].cov()
cov111 = X_MC.iloc[ index111, 3 : 6 ].cov()
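The per-group statistics could also be collected in a loop instead of sixteen separate variables. This is a minimal sketch on toy data; the frame, the float column names and the `stats` dictionary are assumptions made for illustration, not part of the original setup.

```python
import itertools
import numpy as np
import pandas as pd

# Toy stand-in for X_MC (assumption): 10 rows per 0/1 combination,
# three float columns with the hypothetical names a, b, c.
rng = np.random.default_rng(1)
combos = np.array(list(itertools.product([0, 1], repeat=3)))
X_MC = pd.DataFrame(np.tile(combos, (10, 1)),
                    columns=["treatment", "married", "nodegree"])
for name in ["a", "b", "c"]:
    X_MC[name] = rng.normal(size=len(X_MC))

binary_cols = ["treatment", "married", "nodegree"]
float_cols = list(X_MC.columns[3:6])

# One (mean vector, covariance matrix) pair per combination of the binary columns.
stats = {}
for key, group in X_MC.groupby(binary_cols):
    block = group[float_cols]
    stats[key] = (block.mean(), block.cov())

mean001, cov001 = stats[(0, 0, 1)]
```

Keying the results by the (treatment, married, nodegree) tuple makes it harder to pair a statistic with the wrong group by accident.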
The results of the previous step are then used to define the distributions for the MC simulation:
df_MC.iloc[ index000, 3 : 6 ] = np.random.multivariate_normal( mean000, cov000, len( index000 ) )
df_MC.iloc[ index001, 3 : 6 ] = np.random.multivariate_normal( mean001, cov001, len( index001 ) )
df_MC.iloc[ index010, 3 : 6 ] = np.random.multivariate_normal( mean010, cov010, len( index010 ) )
df_MC.iloc[ index100, 3 : 6 ] = np.random.multivariate_normal( mean100, cov100, len( index100 ) )
df_MC.iloc[ index011, 3 : 6 ] = np.random.multivariate_normal( mean011, cov011, len( index011 ) )
df_MC.iloc[ index110, 3 : 6 ] = np.random.multivariate_normal( mean110, cov110, len( index110 ) )
df_MC.iloc[ index101, 3 : 6 ] = np.random.multivariate_normal( mean101, cov101, len( index101 ) )
df_MC.iloc[ index111, 3 : 6 ] = np.random.multivariate_normal( mean111, cov111, len( index111 ) )
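Unfortunately, np.random.multivariate_normal only lets the user specify the mean vector, the covariance matrix and the number of samples. It therefore gives no control over the generated values, for example to keep them within a given range.
However, I would like to set a minimum and a maximum for the generated distribution, based on the empirical values of each column.
So, in addition to the specified mean, cov and sample count, the generated values should also, for example, be no greater than 50 and no less than 17.
This requires not only a range that all values drawn by np.random.multivariate_normal must fall within, but also a separate range for each non-binary column. For example, the second column should have a maximum of 12 instead of 50 and a minimum of 2 instead of 17.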
Is there a way to achieve this?
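One possible approach, sketched here under assumptions, is rejection sampling: draw from the multivariate normal as before, discard every row in which any column falls outside its own bounds, and repeat until enough rows remain. The helper name `sample_bounded_mvn`, its parameters, and the example mean, covariance and third-column bounds are all hypothetical. Note that truncation shifts the sample mean and covariance away from the values passed in, so the result matches them only approximately.

```python
import numpy as np

def sample_bounded_mvn(mean, cov, size, lower, upper, rng=None, max_rounds=1000):
    """Draw `size` rows from N(mean, cov), keeping only rows that lie
    inside [lower, upper] column by column (simple rejection sampling)."""
    rng = np.random.default_rng() if rng is None else rng
    mean = np.asarray(mean, dtype=float)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    accepted = np.empty((0, mean.size))
    for _ in range(max_rounds):
        missing = size - len(accepted)
        if missing <= 0:
            break
        # Oversample so that one round is usually enough.
        draws = rng.multivariate_normal(mean, cov, size=2 * missing + 10)
        inside = np.all((draws >= lower) & (draws <= upper), axis=1)
        accepted = np.vstack([accepted, draws[inside]])
    if len(accepted) < size:
        raise RuntimeError("bounds reject almost all draws; widen them")
    return accepted[:size]

# Example with the ranges from the question for the first two columns;
# the mean, the covariance and the unbounded third column are made up.
rng = np.random.default_rng(42)
mean = [30.0, 7.0, 0.0]
cov = np.diag([64.0, 9.0, 1.0])
lower = [17.0, 2.0, -np.inf]
upper = [50.0, 12.0, np.inf]
samples = sample_bounded_mvn(mean, cov, 200, lower, upper, rng=rng)
```

In the setup above, the bounds for each group could presumably be taken from the data itself, for example with X_MC.iloc[ index000, 3 : 6 ].min() and .max(), and the returned array assigned to df_MC.iloc[ index000, 3 : 6 ]. Rejection sampling becomes slow when the bounds cut off most of the probability mass, in which case a dedicated truncated-normal sampler would be a better fit.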