Optimizing a for loop in Python

     id  days  cluster
0   aaa     0        0
1   bbb     0        0
2   ccc     0        1
3   ddd     0        1
4   eee     0        0
5   fff     0        1
6   ggg     1        0
7   hhh     1        1
8   iii     1        0
9   lll     1        1
10  mmm     1        1
11  aaa     1        3
12  bbb     1        3

{('aaa', 'bbb'): [0, 3], ('aaa', 'eee'): [0], ('bbb', 'eee'): [0], ('ccc', 'ddd'): [1], ('ccc', 'fff'): [1], ('ddd', 'fff'): [1], ('ggg', 'iii'): [0], ('hhh', 'lll'): [1], ('hhh', 'mmm'): [1], ('lll', 'mmm'): [1]}

y = {}
for i in range(0, max(df.iloc[:, 1]) + 1):
    x = df.loc[df['days'] == i]
    for j in range(0, len(x)):
        for z in range(1, len(x)):
            if (x.iloc[z, 0], x.iloc[j, 0]) in y:
                pass
            else:
                if (x.iloc[j, 0], x.iloc[z, 0]) not in y:
                    if x.iloc[j, 0] != x.iloc[z, 0] and x.iloc[j, 2] == x.iloc[z, 2]:
                        y[(x.iloc[j, 0], x.iloc[z, 0])] = [x.iloc[j, 2]]
                else:
                    if x.iloc[j, 0] != x.iloc[z, 0] and x.iloc[j, 2] == x.iloc[z, 2]:
                        y[(x.iloc[j, 0], x.iloc[z, 0])].append(x.iloc[j, 2])

2 answers

User

Answer 1 · edited 2024-10-02 04:16:41

Since the bottleneck is generating the combinations of ids, why not leave that step until the end?

Group the data by id. Each id then maps to the set of "bins" (day, cluster) in which it occurs:

import collections

# Map each id to the set of (day, cluster) bins it appears in.
grouped = collections.defaultdict(set)
for index, (id_, day, cluster) in df.iterrows():
    grouped[id_].add((day, cluster))
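
To make the grouping step concrete, here is a minimal sketch that runs it on the question's sample rows. It assumes the rows are available as plain (id, day, cluster) tuples; the `rows` list is written out by hand for illustration and stands in for the DataFrame (with pandas, `df.itertuples(index=False)` yields equivalent tuples).

```python
import collections

# Rows copied from the question's table as (id, day, cluster) tuples.
# This list is an illustrative stand-in for the DataFrame `df`.
rows = [
    ("aaa", 0, 0), ("bbb", 0, 0), ("ccc", 0, 1), ("ddd", 0, 1),
    ("eee", 0, 0), ("fff", 0, 1), ("ggg", 1, 0), ("hhh", 1, 1),
    ("iii", 1, 0), ("lll", 1, 1), ("mmm", 1, 1), ("aaa", 1, 3),
    ("bbb", 1, 3),
]

# Map each id to the set of (day, cluster) bins it appears in.
grouped = collections.defaultdict(set)
for id_, day, cluster in rows:
    grouped[id_].add((day, cluster))

print(sorted(grouped["aaa"]))  # [(0, 0), (1, 3)]
print(sorted(grouped["eee"]))  # [(0, 0)]
```

Each id is visited once per row, so this pass is linear in the number of rows; no pairs are formed yet.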

For each combination of bins found, list the ids that belong to it:

binned = collections.defaultdict(list)
for id_, bins in grouped.items():
    binned[tuple(sorted(bins))].append(id_)

If you need to, reduce this to clusters only:

clustered = collections.defaultdict(list)
for bins, ids in binned.items():
    clusters = set(cluster for (day, cluster) in bins)
    clustered[tuple(sorted(clusters))].extend(ids)

Finally, getting the id combinations for each cluster bin should not be a problem:

import itertools

for bins, ids in clustered.items():
    if len(ids) > 1:
        for comb_id in itertools.combinations(ids, 2):
            print(bins, comb_id)
            # or do other stuff with it
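
The four steps can be chained into one function. The following is only a sketch under stated assumptions: the function name `cluster_groups` and the hand-written `rows` list are hypothetical, and plain (id, day, cluster) tuples stand in for the DataFrame.

```python
import collections
import itertools

# Rows copied from the question's table; an illustrative stand-in for `df`.
rows = [
    ("aaa", 0, 0), ("bbb", 0, 0), ("ccc", 0, 1), ("ddd", 0, 1),
    ("eee", 0, 0), ("fff", 0, 1), ("ggg", 1, 0), ("hhh", 1, 1),
    ("iii", 1, 0), ("lll", 1, 1), ("mmm", 1, 1), ("aaa", 1, 3),
    ("bbb", 1, 3),
]

def cluster_groups(rows):
    """Return {tuple of clusters: [ids]} following the steps above."""
    # Step 1: id -> set of (day, cluster) bins.
    grouped = collections.defaultdict(set)
    for id_, day, cluster in rows:
        grouped[id_].add((day, cluster))
    # Step 2: identical bin sets -> list of ids.
    binned = collections.defaultdict(list)
    for id_, bins in grouped.items():
        binned[tuple(sorted(bins))].append(id_)
    # Step 3: drop the day, keep only the distinct clusters.
    clustered = collections.defaultdict(list)
    for bins, ids in binned.items():
        clusters = {cluster for (day, cluster) in bins}
        clustered[tuple(sorted(clusters))].extend(ids)
    return dict(clustered)

clustered = cluster_groups(rows)

# Step 4: form the pairs only now, once per group.
pairs = {
    bins: list(itertools.combinations(ids, 2))
    for bins, ids in clustered.items()
    if len(ids) > 1
}
print(clustered[(0, 3)])  # ['aaa', 'bbb']
print(clustered[(0,)])    # ['eee', 'ggg', 'iii']
```

On the question's sample rows, this groups ids by the set of clusters they occur in, regardless of day, so ccc and hhh land in the same group. That is coarser than the per-day dictionary shown in the question, which is worth keeping in mind when choosing between the approaches.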

User

Answer 2 · edited 2024-10-02 04:16:41

You can take advantage of the pandas.DataFrame.groupby method:

import collections
import itertools

result = collections.defaultdict(list)

# Each group holds the rows that share one (day, cluster) pair.
for (day, cluster), group in df.groupby(["days", "cluster"]):
    for comb in itertools.combinations(group["id"], 2):
        result[comb].append(cluster)

This gives you the desired result:

defaultdict(<class 'list'>, {('aaa', 'bbb'): [0, 3], ('aaa', 'eee'): [0], ('bbb', 'eee'): [0], ('ccc', 'ddd'): [1], ('ccc', 'fff'): [1], ('ddd', 'fff'): [1], ('ggg', 'iii'): [0], ('hhh', 'lll'): [1], ('hhh', 'mmm'): [1], ('lll', 'mmm'): [1]})
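
For a version that can be run end to end, here is a minimal sketch of the groupby approach that builds the DataFrame from the question's table first. The function name `pairs_by_cluster` is hypothetical. Two details are additions rather than part of the answer: ids are sorted and de-duplicated within each group so that a pair always gets the same key whatever the row order, and clusters are converted to plain `int` for readable output.

```python
import collections
import itertools

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": ["aaa", "bbb", "ccc", "ddd", "eee", "fff", "ggg",
           "hhh", "iii", "lll", "mmm", "aaa", "bbb"],
    "days": [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1],
    "cluster": [0, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 3, 3],
})

def pairs_by_cluster(frame):
    """Map each id pair to the clusters it shares on the same day."""
    result = collections.defaultdict(list)
    for (day, cluster), group in frame.groupby(["days", "cluster"]):
        # Assumption: sort and de-duplicate so ('aaa', 'bbb') and
        # ('bbb', 'aaa') never become two different keys.
        ids = sorted(set(group["id"]))
        for pair in itertools.combinations(ids, 2):
            result[pair].append(int(cluster))
    return dict(result)

pairs = pairs_by_cluster(df)
print(pairs[("aaa", "bbb")])  # [0, 3]
```

Pairs are only formed inside each (day, cluster) group, so the work is proportional to the sum of the squared group sizes rather than the squared size of each day's rows.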
