Why doesn't set return the unique elements of a list of alphanumeric tuples? (Python 3.6)

Posted 2024-10-01 02:25:15


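I am extracting the values above a certain threshold from a pd.DataFrame. I store the index label, column header and value in a tuple, and then append these tuples to a list. Because of the layout of the dataframe I am reading from, I extract every element twice, but I only need to store each combination once. From reading previous posts, set(list) should give me the unique elements. However, on a mock dataset where it should produce a single result, ('Pathway1', 'Pathway2', 0.6), it reports both permutations.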

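Does anyone know why set does not work in this case? I understand that the items have to be identical, and as far as I can tell they are, right down to the type of each tuple component (string, string, float). Out of desperation I tried coercing the float values to strings, but that made no difference.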

For completeness, most of the code is given below (slightly simplified). The bottom block is where the problem arises. The code is as follows:


#Import modules
import numpy as np
import pandas as pd

#Define trial sets

s1 = ["A", "B", "C", "D", "E"]
s2 = ["A", "B", "C"]
s3 = ["A", "B", "F"]
s4 = ["A", "B", "G", "H", "I"]
s5 = ["X", "Y", "Z"]

slist = [s1,s2,s3,s4,s5]

#Create an empty list to append results to
result1 = [] 

#Calculate Jaccard index between every entry
#This is computationally inefficient as most computations are performed twice
#to generate a full results matrix to make mapping easy. Making half a matrix
#is more complicated but would be possible within the loop. Empty values would
#still have to be coded for though so in terms of storage of the final results
#matrix I don't think there should be much difference

for i in range(len(slist)):
    for j in range(len(slist)):
        result1.append(len(set(slist[i]).intersection(slist[j]))/len(set(slist[i]).union(slist[j])))

#Define result matrix dimensions
shape = (len(slist), len(slist))

#Convert list to array for numpy and reshape it into a square matrix
rarray = np.array(result1)
rmatrix = rarray.reshape(shape)

pathway_names = ["Pathway1", "Pathway2", "Pathway3", "Pathway4", "Pathway5"]

dataframe = pd.DataFrame(data = rmatrix, index = pathway_names, columns = pathway_names)

#List all pathways with Jaccard index > x unless PathwayName = PathwayName

x = 0.5
temp =[] #A temporary list for holding lists of tuples which will contain permutations

The problem is here:

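for k in range(len(slist)):
    index = dataframe.index[dataframe.iloc[k]>x]
    for l in range(len(index)):
        if index[l] != dataframe.columns[k]:
            #Look the value up by label: l is a position in the filtered index, not a row number in the dataframe
            temp.append((index[l], dataframe.columns[k], dataframe.loc[index[l], dataframe.columns[k]]))
print(set(temp))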

The output I get from the print is

{('Pathway1', 'Pathway2', 0.6), ('Pathway2', 'Pathway1', 0.6)}

But what I need is (in any order):

('Pathway1', 'Pathway2', 0.6) 

Thanks for your help

Angus


1 Answer
User
#1 · Posted 2024-10-01 02:25:15

The problem is that tuples are ordered, so ('Pathway1', 'Pathway2', 0.6) is not equal to ('Pathway2', 'Pathway1', 0.6).
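A quick check makes the ordering point concrete. The snippet below is a minimal sketch that reuses the two tuples from the question and relies only on built-in tuple, set and frozenset behaviour; the variable names are chosen for illustration.

```python
# The two tuples reported in the question's output.
a = ("Pathway1", "Pathway2", 0.6)
b = ("Pathway2", "Pathway1", 0.6)

# Tuples are compared position by position, so swapping the names
# produces a different tuple and therefore a different set member.
same_tuple = a == b
distinct_count = len({a, b})

# Ignoring order for the two names (here via frozenset) makes them match.
same_pair = frozenset(a[:2]) == frozenset(b[:2])

print(same_tuple)      # False
print(distinct_count)  # 2
print(same_pair)       # True
```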

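To fix this, initialize temp as a set and sort the two names in each tuple before adding the tuple to the set. Sort only the names: in Python 3, calling sorted() on the whole tuple would raise a TypeError, because a str cannot be compared with a float.

temp = set()
for ...:
    ...
    the_tuple = ...
    temp.add(tuple(sorted(the_tuple[:2])) + the_tuple[2:])
print(temp)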
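
For reference, here is a self-contained sketch of this approach applied to the two tuples from the question. The helper name `unique_pairs` and the `records` list are made up for illustration and are not part of the original code. The helper sorts only the two name fields, since `sorted()` on a tuple that mixes `str` and `float` raises `TypeError` in Python 3.

```python
def unique_pairs(records):
    """Return a set with one (name_a, name_b, value) tuple per unordered pair.

    `records` is assumed to be an iterable of (name, name, value) tuples,
    like the `temp` list built in the question.
    """
    unique = set()
    for first, second, value in records:
        # Sort only the two names so both permutations map to the same tuple.
        name_a, name_b = sorted((first, second))
        unique.add((name_a, name_b, value))
    return unique


# Mock data copied from the question's output.
records = [
    ("Pathway1", "Pathway2", 0.6),
    ("Pathway2", "Pathway1", 0.6),
]

result = unique_pairs(records)
print(result)  # {('Pathway1', 'Pathway2', 0.6)}
```

This relies on the value being identical for both orderings, which holds for a symmetric measure such as the Jaccard index. If the two computed values could differ slightly because of floating-point rounding, keying on the name pair alone (for example in a dict) would be the safer choice.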
