Defining a range for a multivariate normal distribution

Posted 2024-09-29 01:19:55



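I have a pandas.DataFrame of shape 780x6, in which 3 columns hold binary values ('treatment', 'married', 'nodegree') and 3 columns hold floats (columns 3:6). I want to run a Monte Carlo simulation on the three non-binary columns. To do so, I first build the row index for every possible combination of the binary columns, so that I can run the MC simulation on each group later: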

index000 = X_MC[ ( X_MC['treatment'] == 0 ) & ( X_MC['married'] == 0 ) & (X_MC['nodegree'] == 0) ].index 
index001 = X_MC[ ( X_MC['treatment'] == 0 ) & ( X_MC['married'] == 0 ) & (X_MC['nodegree'] == 1) ].index
index010 = X_MC[ ( X_MC['treatment'] == 0 ) & ( X_MC['married'] == 1 ) & (X_MC['nodegree'] == 0) ].index
index100 = X_MC[ ( X_MC['treatment'] == 1 ) & ( X_MC['married'] == 0 ) & (X_MC['nodegree'] == 0) ].index
index011 = X_MC[ ( X_MC['treatment'] == 0 ) & ( X_MC['married'] == 1 ) & (X_MC['nodegree'] == 1) ].index
index110 = X_MC[ ( X_MC['treatment'] == 1 ) & ( X_MC['married'] == 1 ) & (X_MC['nodegree'] == 0) ].index
index101 = X_MC[ ( X_MC['treatment'] == 1 ) & ( X_MC['married'] == 0 ) & (X_MC['nodegree'] == 1) ].index
index111 = X_MC[ ( X_MC['treatment'] == 1 ) & ( X_MC['married'] == 1 ) & (X_MC['nodegree'] == 1) ].index
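The eight masks above could also be produced in one step with `groupby`. The following is only a sketch: the stand-in frame and the float column names `age`, `educ` and `income` are invented for illustration, since the real column names are not given here.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for X_MC: 3 binary columns followed by 3 float columns.
# The float column names are assumptions, not the real ones.
rng = np.random.default_rng(0)
n = 780
X_MC = pd.DataFrame({
    "treatment": rng.integers(0, 2, n),
    "married": rng.integers(0, 2, n),
    "nodegree": rng.integers(0, 2, n),
    "age": rng.normal(30, 8, n),
    "educ": rng.normal(10, 2, n),
    "income": rng.normal(5000, 1500, n),
})

binary_cols = ["treatment", "married", "nodegree"]

# Maps every (treatment, married, nodegree) combination that occurs in the data
# to the row labels of that group.
groups = X_MC.groupby(binary_cols).groups

index000 = groups[(0, 0, 0)]
index101 = groups[(1, 0, 1)]
```

A combination that never occurs in the data simply has no key in `groups`, so a lookup for it would raise a `KeyError`.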

Then I compute the mean() and the covariance cov() of the three non-binary columns for each group:

mean000 = X_MC.iloc[ index000, 3 : 6 ].mean()
mean001 = X_MC.iloc[ index001, 3 : 6 ].mean()
mean010 = X_MC.iloc[ index010, 3 : 6 ].mean()
mean100 = X_MC.iloc[ index100, 3 : 6 ].mean()
mean011 = X_MC.iloc[ index011, 3 : 6 ].mean()
mean110 = X_MC.iloc[ index110, 3 : 6 ].mean()
mean101 = X_MC.iloc[ index101, 3 : 6 ].mean()
mean111 = X_MC.iloc[ index111, 3 : 6 ].mean()
cov000 = X_MC.iloc[ index000, 3 : 6 ].cov()
cov001 = X_MC.iloc[ index001, 3 : 6 ].cov()
cov010 = X_MC.iloc[ index010, 3 : 6 ].cov()
cov100 = X_MC.iloc[ index100, 3 : 6 ].cov()
cov011 = X_MC.iloc[ index011, 3 : 6 ].cov()
cov110 = X_MC.iloc[ index110, 3 : 6 ].cov()
cov101 = X_MC.iloc[ index101, 3 : 6 ].cov()
cov111 = X_MC.iloc[ index111, 3 : 6 ].cov()
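The per-group statistics could also be collected in a dictionary keyed by the binary combination, which avoids sixteen separate variables. This is a sketch under assumptions: the stand-in frame and the float column names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for X_MC; the float column names are assumptions.
rng = np.random.default_rng(1)
n = 780
X_MC = pd.DataFrame({
    "treatment": rng.integers(0, 2, n),
    "married": rng.integers(0, 2, n),
    "nodegree": rng.integers(0, 2, n),
    "age": rng.normal(30, 8, n),
    "educ": rng.normal(10, 2, n),
    "income": rng.normal(5000, 1500, n),
})

binary_cols = ["treatment", "married", "nodegree"]
float_cols = X_MC.columns[3:6]

# stats[(t, m, d)] holds (mean vector, covariance matrix) of the float columns
# for the rows with treatment == t, married == m, nodegree == d.
stats = {}
for key, frame in X_MC.groupby(binary_cols):
    values = frame[float_cols]
    stats[key] = (values.mean(), values.cov())

mean001, cov001 = stats[(0, 0, 1)]
```

Each group here is selected by its own rows, so the statistics differ from group to group.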

The results of the previous step are then used to define the distributions for the MC simulation:

df_MC.iloc[ index000, 3 : 6 ] = np.random.multivariate_normal( mean000, cov000, len( index000 ) )
df_MC.iloc[ index001, 3 : 6 ] = np.random.multivariate_normal( mean001, cov001, len( index001 ) )
df_MC.iloc[ index010, 3 : 6 ] = np.random.multivariate_normal( mean010, cov010, len( index010 ) )
df_MC.iloc[ index100, 3 : 6 ] = np.random.multivariate_normal( mean100, cov100, len( index100 ) )
df_MC.iloc[ index011, 3 : 6 ] = np.random.multivariate_normal( mean011, cov011, len( index011 ) )
df_MC.iloc[ index110, 3 : 6 ] = np.random.multivariate_normal( mean110, cov110, len( index110 ) )
df_MC.iloc[ index101, 3 : 6 ] = np.random.multivariate_normal( mean101, cov101, len( index101 ) )
df_MC.iloc[ index111, 3 : 6 ] = np.random.multivariate_normal( mean111, cov111, len( index111 ) )

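Unfortunately, np.random.multivariate_normal only lets the user specify the mean, the covariance and the number of samples. There is therefore no way to constrain the randomly generated values, for example to require that they fall within a certain range.
However, I would like to set a minimum and a maximum for the generated distribution, based on the empirical values of each column.
So, in addition to the specified mean, cov and sample count, the generated values should, for example, be no greater than 50 and no smaller than 17. This requires not only a range that all values from np.random.multivariate_normal must fall in, but also a separate range for each non-binary column. For instance, the second column's maximum should be 12 rather than 50, and its minimum 2 rather than 17.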


Is there any way to achieve this?
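One possible approach is rejection sampling: draw from the unconstrained multivariate normal and keep only the rows whose columns all fall inside their own bounds. The sketch below is not a NumPy feature. The helper `sample_truncated_mvn`, its parameters and the example numbers are invented for illustration.

```python
import numpy as np

def sample_truncated_mvn(mean, cov, size, lower, upper, rng=None, max_rounds=1000):
    """Hypothetical helper: return `size` rows drawn from N(mean, cov) such that
    lower[j] <= row[j] <= upper[j] holds for every column j (rejection sampling)."""
    rng = np.random.default_rng() if rng is None else rng
    mean = np.asarray(mean, dtype=float)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    accepted = np.empty((0, mean.size))
    for _ in range(max_rounds):
        missing = size - len(accepted)
        if missing <= 0:
            break
        # Draw more rows than still needed, because some will be rejected.
        draws = rng.multivariate_normal(mean, cov, size=max(2 * missing, 16))
        inside = np.all((draws >= lower) & (draws <= upper), axis=1)
        accepted = np.vstack([accepted, draws[inside]])
    if len(accepted) < size:
        raise RuntimeError("acceptance rate too low for the given bounds")
    return accepted[:size]

# Made-up numbers: column 0 limited to [17, 50], column 1 to [2, 12],
# column 2 left unbounded.
mean = [30.0, 8.0, 0.0]
cov = [[60.0, 5.0, 0.0],
       [5.0, 6.0, 0.0],
       [0.0, 0.0, 1.0]]
lower = [17.0, 2.0, -np.inf]
upper = [50.0, 12.0, np.inf]

samples = sample_truncated_mvn(mean, cov, 200, lower, upper,
                               rng=np.random.default_rng(42))
```

Note that truncation changes the distribution: the mean and covariance of the accepted rows will generally differ from the `mean` and `cov` passed in, and the more probability mass the bounds cut off, the larger the difference and the slower the sampling.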

