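Sorry, newbie here. I'm exploring the idea of "fragile" correlations using Anscombe's quartet: I remove a single point (replacing it with the group median), iterate over the data to get Pearson's r and the p-value each time, and then plot both values for every item in the source vectors (Anscombe's quartet was the inspiration). Iterating and replacing a single value is simple enough: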
import itertools
import statistics

import matplotlib.pyplot as plt
import numpy as np
from scipy import stats  # provides pearsonr, used in the loop below
def new_list(x, y, n, replacex, replacey):
    '''Take two 1D arrays (x and y) and replace item n with replacex and replacey respectively.'''
    # Copy the source data into new float arrays so the originals are left untouched
    newx = np.array(x, dtype=float)
    newy = np.array(y, dtype=float)
    # Replace item n with the supplied values (the medians in this example)
    newx[n] = replacex
    newy[n] = replacey
    return newx, newy
#Initialise the dummy lists, assign the replacement values(medians), clear the temporary variables
newx=[] #temporary x list to run the new correlation
newy=[] #temporary y list to run the new correlation
p2values=[] #list of p values for the new correlations - this should change nearly every iteration
r2values=[] #list of r values for the new correlations - this should change nearly every iteration
replacementx=[] # single x value to be placed into the source list to run the new correlation. Currently using median
replacementy=[] # single y value to be placed into the source list to run the new correlation. Currently using median
#x,y values for one of Anscombe's Quartet as an example
x=[8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
y=[6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]
replacementx = statistics.median(x)
replacementy = statistics.median(y)
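# Correlation of the untouched data, used as the reference in the plot
r, p = stats.pearsonr(x, y)
for n in range(len(x)):
    newx, newy = new_list(x, y, n, replacementx, replacementy)
    # Replacing the 8th item (x=19) makes x constant, so r2 and p2 come back as nan
    r2, p2 = stats.pearsonr(newx, newy)
    p2values.append(p2)
    r2values.append(r2)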
fig, ax1 = plt.subplots()
color = 'tab:red'
ax1.set_xlabel('Item number')
ax1.set_ylabel('Pearson r', color=color)
ax1.set_ylim(0,1)
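# Solid line: r after each replacement; dashed line: original r as a reference
ax1.plot(range(len(r2values)), r2values, '-', range(len(r2values)), [r] * len(r2values), '--', color=color)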
ax1.tick_params(axis='y', labelcolor=color)
ax2 = ax1.twinx() # instantiate a second axes that shares the same x-axis
color = 'tab:blue'
ax2.set_ylabel('p value', color=color) # we already handled the x-label with ax1
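# Solid line: p after each replacement; dashed line: original p as a reference
ax2.plot(range(len(p2values)), p2values, '-', range(len(p2values)), [p] * len(p2values), '--', color=color)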
ax2.tick_params(axis='y', labelcolor=color)
plt.show()
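Next I want to generalise this: I could use itertools.combinations() to pass in the source data (Anscombe's quartet in this case) and the number of data points I want to test in combination, to see how fragile the correlation is. The furthest I've got is generating the "candidate" data points to remove from Anscombe's quartet, like this (for all combinations of 2 data points):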
import itertools
#x,y values for one of Anscombe's Quartet as an example
x=[8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
y=[6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]
data=list(zip(x,y))
replacement_candidates=list(itertools.combinations(data,2))
print(replacement_candidates)
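I think I now need to map() the resulting list through the simple new_list function, run the correlation on each result to get Pearson's r and the p-value, and append those values to the p2values[] and r2values[] lists, but this is where I get lost. Any help would be much appreciated.

Thanks in advance,
Rod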
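OK, I've figured it out, so I'm posting the code here in case it helps anyone else. I was going in completely the wrong direction.

The example here passes the third member of Anscombe's quartet, hard-coded as x,y values, with n=3 (for all combinations of 3 values), but you can obviously replace these with whatever you like. With n=1, this member turns out to be fragile when the 8th item is replaced (r and p evaluate to nan).

[Figure: sample output plot with n=1]

I'm going to rewrite this as a function that takes x, y and n, but since the number of combinations can quickly get out of hand, I want to add something to guard against out-of-memory errors, plus a progress bar and the like (which is beyond me for now, as this is basically my "Hello World" in Python).

The results are visualised together with dashed reference lines for the original r and p, which I found helpful.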
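For anyone who wants a starting point, here is a minimal sketch of what such a function might look like. It is not the code from this answer: the name `fragility_scan` and the `max_combinations` limit are hypothetical, and it assumes NumPy and SciPy are installed. It combines indices rather than the points themselves, counts the combinations up front with `math.comb`, and yields results one at a time so nothing large has to sit in memory.

```python
import itertools
import math

import numpy as np
from scipy import stats


def fragility_scan(x, y, k, max_combinations=100_000):
    """Yield (indices, r, p) for every way of replacing k points with the medians.

    Hypothetical helper for illustration; max_combinations is an arbitrary limit.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # Count the combinations before doing any work, and refuse if there are too many
    total = math.comb(len(x), k)
    if total > max_combinations:
        raise ValueError(f"{total} combinations exceeds the limit of {max_combinations}")

    med_x, med_y = np.median(x), np.median(y)
    # Combine indices, not (x, y) pairs, so each result maps back to item numbers
    for idx in itertools.combinations(range(len(x)), k):
        new_x, new_y = x.copy(), y.copy()
        new_x[list(idx)] = med_x
        new_y[list(idx)] = med_y
        if new_x.max() == new_x.min() or new_y.max() == new_y.min():
            # Pearson's r is undefined when either input is constant
            yield idx, float("nan"), float("nan")
            continue
        r, p = stats.pearsonr(new_x, new_y)
        yield idx, float(r), float(p)


# The data set from the question
x = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
y = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]

# Which single replacements destroy the correlation entirely?
fragile = [idx for idx, r, p in fragility_scan(x, y, 1) if math.isnan(r)]
print(fragile)  # [(7,)]
```

Index 7 is the 8th item (x=19): once it is replaced by the median, every x value is 8, so the correlation is undefined. Because the function is a generator, the size check only runs when you start iterating, and a progress bar could be wrapped around the loop that consumes it.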