Defining a range for a multivariate normal distribution

Posted 2024-09-29 01:19:55

Male | A programmer ("code monkey") who likes writing Python code.

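I have a pandas.DataFrame of shape 780x6. Three of its columns hold binary values ('treatment', 'married', 'nodegree') and three hold floats (columns 3:6). I want to run a Monte Carlo simulation on the three non-binary columns. So I first build the index for every possible combination of the binary values, so that I can run the MC simulation later: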

index000 = X_MC[ ( X_MC['treatment'] == 0 ) & ( X_MC['married'] == 0 ) & (X_MC['nodegree'] == 0) ].index 
index001 = X_MC[ ( X_MC['treatment'] == 0 ) & ( X_MC['married'] == 0 ) & (X_MC['nodegree'] == 1) ].index
index010 = X_MC[ ( X_MC['treatment'] == 0 ) & ( X_MC['married'] == 1 ) & (X_MC['nodegree'] == 0) ].index
index100 = X_MC[ ( X_MC['treatment'] == 1 ) & ( X_MC['married'] == 0 ) & (X_MC['nodegree'] == 0) ].index
index011 = X_MC[ ( X_MC['treatment'] == 0 ) & ( X_MC['married'] == 1 ) & (X_MC['nodegree'] == 1) ].index
index110 = X_MC[ ( X_MC['treatment'] == 1 ) & ( X_MC['married'] == 1 ) & (X_MC['nodegree'] == 0) ].index
index101 = X_MC[ ( X_MC['treatment'] == 1 ) & ( X_MC['married'] == 0 ) & (X_MC['nodegree'] == 1) ].index
index111 = X_MC[ ( X_MC['treatment'] == 1 ) & ( X_MC['married'] == 1 ) & (X_MC['nodegree'] == 1) ].index
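The eight manual masks above can also be produced in one step with pandas groupby. This is a minimal sketch that assumes the binary columns are named exactly as in the question. The helper group_indices and the toy frame are hypothetical illustrations.

```python
import pandas as pd

# Binary grouping columns named in the question
BINARY_COLS = ["treatment", "married", "nodegree"]


def group_indices(df: pd.DataFrame) -> dict:
    """Return {(treatment, married, nodegree): Index of row labels} for every
    combination that actually occurs in df."""
    # groupby(...).groups maps each group key (a tuple here) to its row labels
    return dict(df.groupby(BINARY_COLS).groups)


# Hypothetical toy frame, only for illustration
toy = pd.DataFrame({
    "treatment": [0, 0, 1, 1, 0],
    "married":   [0, 1, 0, 1, 0],
    "nodegree":  [1, 0, 0, 1, 1],
})
idx = group_indices(toy)
print(idx[(0, 0, 1)].tolist())  # -> [0, 4]
```

The manual masks return an empty Index for a combination that never occurs in the data. With groupby, such a combination simply has no key.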

Then I compute the mean() and covariance cov() of the three non-binary columns for each combination:

# Note: index000 etc. hold row labels, while .iloc expects positions;
# this only works if X_MC has the default RangeIndex (0..779).
mean000 = X_MC.iloc[ index000, 3 : 6 ].mean()
mean001 = X_MC.iloc[ index001, 3 : 6 ].mean()
mean010 = X_MC.iloc[ index010, 3 : 6 ].mean()
mean100 = X_MC.iloc[ index100, 3 : 6 ].mean()
mean011 = X_MC.iloc[ index011, 3 : 6 ].mean()
mean110 = X_MC.iloc[ index110, 3 : 6 ].mean()
mean101 = X_MC.iloc[ index101, 3 : 6 ].mean()
mean111 = X_MC.iloc[ index111, 3 : 6 ].mean()
cov000 = X_MC.iloc[ index000, 3 : 6 ].cov()
cov001 = X_MC.iloc[ index001, 3 : 6 ].cov()
cov010 = X_MC.iloc[ index010, 3 : 6 ].cov()
cov100 = X_MC.iloc[ index100, 3 : 6 ].cov()
cov011 = X_MC.iloc[ index011, 3 : 6 ].cov()
cov110 = X_MC.iloc[ index110, 3 : 6 ].cov()
cov101 = X_MC.iloc[ index101, 3 : 6 ].cov()
cov111 = X_MC.iloc[ index111, 3 : 6 ].cov()
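The sixteen mean/cov lines can also be computed in a loop over the groups. This avoids keeping many near-identical lines in sync. The sketch assumes value_cols holds the three float column names, e.g. list(X_MC.columns[3:6]). The helper group_moments is hypothetical.

```python
import numpy as np
import pandas as pd

# Binary grouping columns named in the question
BINARY_COLS = ["treatment", "married", "nodegree"]


def group_moments(df: pd.DataFrame, value_cols: list) -> dict:
    """Return {group_key: (mean Series, covariance DataFrame)} computed on
    value_cols separately for each combination of the binary columns."""
    moments = {}
    for key, sub in df.groupby(BINARY_COLS):
        values = sub[value_cols]
        # Each group's moments are computed from that group's rows only
        moments[key] = (values.mean(), values.cov())
    return moments
```

If a group has only a single row, its covariance matrix is all NaN, and np.random.multivariate_normal cannot use it.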

The results of the previous step define the distributions for the MC simulation:

import numpy as np

# Assumes df_MC is a copy of X_MC (e.g. df_MC = X_MC.copy()) with the same default RangeIndex
df_MC.iloc[ index000, 3 : 6 ] = np.random.multivariate_normal( mean000, cov000, len( index000 ) )
df_MC.iloc[ index001, 3 : 6 ] = np.random.multivariate_normal( mean001, cov001, len( index001 ) )
df_MC.iloc[ index010, 3 : 6 ] = np.random.multivariate_normal( mean010, cov010, len( index010 ) )
df_MC.iloc[ index100, 3 : 6 ] = np.random.multivariate_normal( mean100, cov100, len( index100 ) )
df_MC.iloc[ index011, 3 : 6 ] = np.random.multivariate_normal( mean011, cov011, len( index011 ) )
df_MC.iloc[ index110, 3 : 6 ] = np.random.multivariate_normal( mean110, cov110, len( index110 ) )
df_MC.iloc[ index101, 3 : 6 ] = np.random.multivariate_normal( mean101, cov101, len( index101 ) )
df_MC.iloc[ index111, 3 : 6 ] = np.random.multivariate_normal( mean111, cov111, len( index111 ) )
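The sampling step can be written as the same kind of loop. This sketch uses NumPy's Generator API (np.random.default_rng) in place of the legacy np.random.multivariate_normal. The helper simulate_groups is hypothetical. It assigns through label-based .loc, so rows stay aligned with their group even when the index is not 0..n-1. It leaves groups with fewer than two rows unchanged, because their covariance is undefined.

```python
import numpy as np
import pandas as pd

# Binary grouping columns named in the question
BINARY_COLS = ["treatment", "married", "nodegree"]


def simulate_groups(df: pd.DataFrame, value_cols: list, seed=None) -> pd.DataFrame:
    """Return a copy of df in which value_cols are replaced, group by group,
    by draws from a multivariate normal fitted to that group."""
    rng = np.random.default_rng(seed)  # a fixed seed makes runs reproducible
    out = df.copy()
    for _, sub in df.groupby(BINARY_COLS):
        if len(sub) < 2:
            continue  # covariance undefined; leave tiny groups as they are
        mean = sub[value_cols].mean().to_numpy()
        cov = sub[value_cols].cov().to_numpy()
        # Label-based .loc writes the draws back to exactly this group's rows
        out.loc[sub.index, value_cols] = rng.multivariate_normal(mean, cov, size=len(sub))
    return out
```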

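Unfortunately, the only distribution parameters np.random.multivariate_normal accepts are the mean vector and the covariance matrix, plus the number of samples. There is therefore no way to constrain the randomly generated values, for example to a certain range.
However, I want to set a minimum and a maximum for each generated distribution, based on the empirical values of each column.
So besides the given mean, cov and size, the generated values should, for example, be no greater than 50 and no less than 17. This requires not only a range that all values from np.random.multivariate_normal must fall within, but also a separate range for each non-binary column. For the second column, for example, the maximum should be 12 instead of 50 and the minimum 2 instead of 17.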
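multivariate_normal itself has no bound arguments. However, a normal distribution restricted to a box (a truncated multivariate normal) can be sampled by rejection. You draw unconstrained samples and keep only the rows where every column lies within its own [lower, upper] pair. This is a minimal sketch that assumes rejection sampling is acceptable. The function sample_truncated_mvn, its parameters, and the example mean and covariance are hypothetical. Only the bounds 17/50 and 2/12 come from the question, and the third column's bounds are made up.

```python
import numpy as np


def sample_truncated_mvn(mean, cov, size, lower, upper, rng=None, max_rounds=1000):
    """Draw `size` samples from N(mean, cov) restricted to the box
    lower[j] <= x[j] <= upper[j] (one bound pair per column), by rejection."""
    rng = np.random.default_rng() if rng is None else rng
    mean = np.asarray(mean, dtype=float)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    kept = [np.empty((0, mean.size))]
    n_kept = 0
    for _ in range(max_rounds):
        if n_kept >= size:
            break
        # Draw an unconstrained batch, then keep rows inside the box in every column
        draws = rng.multivariate_normal(mean, cov, size=max(size, 100))
        inside = np.all((draws >= lower) & (draws <= upper), axis=1)
        kept.append(draws[inside])
        n_kept += int(inside.sum())
    if n_kept < size:
        raise RuntimeError("acceptance rate too low; the box may cut off most of the distribution")
    return np.concatenate(kept)[:size]


rng = np.random.default_rng(42)
mean = [30.0, 7.0, 0.5]      # hypothetical group mean
cov = [[40.0, 3.0, 0.0],
       [3.0, 9.0, 0.0],
       [0.0, 0.0, 1.0]]      # hypothetical group covariance
lower = [17, 2, -1]          # columns 1 and 2 from the question; column 3 made up
upper = [50, 12, 2]
samples = sample_truncated_mvn(mean, cov, 200, lower, upper, rng=rng)
```

Inside a per-group loop, the bounds can be taken from the data, e.g. lower = sub[value_cols].min() and upper = sub[value_cols].max(). There are two caveats. First, the accepted draws have a mean and covariance that differ from the mean and cov passed in, because truncation removes the tails. Second, rejection gets slow when the box covers only a small fraction of the probability mass. Rejection preserves the normal shape inside the box, whereas clipping with np.clip would pile values up exactly at the bounds.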

Is there a way to achieve this?


0 answers
