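Pandas correlation: list of columns X whole dataframe

User

#1 · edited 2024-06-14 01:49:19

List the subset of columns you want (A, B, and C in this example), create an empty dataframe, then fill it with the desired values using a nested loop.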

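import numpy as np
import pandas as pd
import seaborn as sns

df = pd.DataFrame(np.random.randn(50, 7), columns=list('ABCDEFG'))

# compute the full correlation matrix once, outside the loops
full_corr = df.corr()

# start with an empty dataframe and fill in the wanted cells
corr = pd.DataFrame()
for a in list('ABC'):
    for b in df.columns:
        corr.loc[a, b] = full_corr.loc[a, b]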

corr
Out[137]: 
          A         B         C         D         E         F         G
A  1.000000  0.183584 -0.175979 -0.087252 -0.060680 -0.209692 -0.294573
B  0.183584  1.000000  0.119418  0.254775 -0.131564 -0.226491 -0.202978
C -0.175979  0.119418  1.000000  0.146807 -0.045952 -0.037082 -0.204993

sns.heatmap(corr)
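
The nested loop above copies cells out of the full correlation matrix one at a time. A shorter sketch that should give the same rows, assuming the full matrix is small enough to compute, is to slice it by row label:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(50, 7), columns=list('ABCDEFG'))

# compute the full correlation matrix once, then keep only rows A, B and C
corr = df.corr().loc[list('ABC')]
```

This still computes every pairwise correlation, so it only tidies the code; it does not reduce the work.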

User

#2 · edited 2024-06-14 01:49:19

After working on this last night, I came to the following answer:

import pandas as pd
import scipy.stats
import seaborn as sns

# The data table was imported earlier as 'data';
# list_1 and list_2 hold the column names to compare
# Create a new dictionary
plotDict = {}
# Loop across each of the two lists that contain the items you want to compare
for gene1 in list_1:
    for gene2 in list_2:
        # Do a Pearson comparison between the two items you want to compare
        tempDict = {(gene1, gene2): scipy.stats.pearsonr(data[gene1], data[gene2])}
        # Update the dictionary each time you do a comparison
        plotDict.update(tempDict)
# Unstack the dictionary into a DataFrame (rows: list_1, columns: list_2)
dfOutput = pd.Series(plotDict).unstack()
# Optional: take just the Pearson r value (first element) out of each result
dfOutputPearson = dfOutput.apply(lambda col: col.apply(lambda res: res[0]))
# Optional: generate a heatmap
sns.heatmap(dfOutputPearson)

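Like the other answers, this generates a heatmap, but it scales to something like a 20,000x30 matrix without computing the correlations for the entire 20,000x20,000 set of combinations (and therefore finishes much faster).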
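
For larger inputs, the Python-level double loop can probably be replaced by a single matrix product. The following is a minimal sketch, not the answer's own method: cross_corr is a hypothetical helper name, the random data stands in for the real table, and it assumes the columns are numeric, non-constant, and contain no missing values. It relies on the fact that Pearson's r is the mean of the products of the two columns' z-scores.

```python
import numpy as np
import pandas as pd

def cross_corr(data, cols_1, cols_2):
    """Pearson r between every column in cols_1 and every column in cols_2.

    Hypothetical helper; assumes numeric, non-constant columns without NaN.
    """
    x = data[cols_1].to_numpy(dtype=float)
    y = data[cols_2].to_numpy(dtype=float)
    # standardize each column using the population standard deviation (ddof=0)
    x = (x - x.mean(axis=0)) / x.std(axis=0)
    y = (y - y.mean(axis=0)) / y.std(axis=0)
    # the mean of the products of z-scores is Pearson's r
    r = x.T @ y / len(data)
    return pd.DataFrame(r, index=cols_1, columns=cols_2)

# illustrative data only
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.standard_normal((50, 6)), columns=list('abcdef'))
result = cross_corr(data, ['a', 'b'], ['c', 'd', 'e', 'f'])
```

The result has one row per entry of the first list and one column per entry of the second, the same layout as dfOutputPearson, so it could be passed to sns.heatmap in the same way. Unlike scipy.stats.pearsonr, it does not return p-values.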

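User

#3 · edited 2024-06-14 01:49:19

Usually it makes the most sense to calculate correlation coefficients pairwise for all variables. DataFrame.corr() is a convenience function that calculates the pairwise correlation coefficient for every pair of columns. You can also use scipy inside a loop to calculate it only for the pairs you specify.

Example: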

d=pd.DataFrame([[1,5,8],[2,5,4],[7,3,1]], columns=['A','B','C'])

With pandas, a single pair can be read out like this:

d.corr().loc['A','B']

-0.98782916114726194

The equivalent in scipy:

import scipy.stats
scipy.stats.pearsonr(d['A'].values,d['B'].values)[0]

-0.98782916114726194
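
As a further sketch, pandas can also compute a single pair directly with Series.corr, without building the full matrix first. This reuses the example frame from above and is only one possible way to do it:

```python
import pandas as pd

d = pd.DataFrame([[1, 5, 8], [2, 5, 4], [7, 3, 1]], columns=['A', 'B', 'C'])

# Series.corr uses the Pearson method by default
r = d['A'].corr(d['B'])
```

Up to floating-point rounding, r should match the value printed above.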
