Python: how do I merge lists into clusters?

Published 2024-10-02 00:20:39


I have a list of tuples:

[(3,4), (18,27), (4,14)]

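and I need code that merges the tuples that share a number, so that the elements of the resulting list contain only unique numbers. The list should be sorted by tuple length, i.e.: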

>>> MergeThat([(3,4), (18,27), (4,14)])
[(3, 4, 14), (18, 27)]

I know this is similar to hierarchical clustering algorithms, which I have read about but could not figure out.

Is there relatively simple code for a MergeThat() function?


Tags: function, code, algorithm, element, list, sorting, number, clustering
3 Answers
import itertools

def merge_it(lot):
    merged = [ set(x) for x in lot ] # operate on sets only
    finished = False
    while not finished:
        finished = True
        for a, b in itertools.combinations(merged, 2):
            if a & b:
                # we merged in this iteration, we may have to do one more
                finished = False
                if a in merged: merged.remove(a)
                if b in merged: merged.remove(b)    
                merged.append(a.union(b))
                break # don't inflate 'merged' with intermediate results
    return merged

if __name__ == '__main__':
    # print() calls so the script also runs on Python 3; the order of the
    # numbers shown inside each set may vary.
    print(merge_it( [(3,4), (18,27), (4,14)] ))
    # => [{18, 27}, {3, 4, 14}]

    print(merge_it( [(1,3), (15,21), (1,10), (57,66), (76,85), (66,76)] ))
    # => [{21, 15}, {1, 10, 3}, {57, 66, 76, 85}]

    print(merge_it( [(1,2), (2,3), (3,4), (4,5), (5,9)] ))
    # => [{1, 2, 3, 4, 5, 9}]
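
The question asks for tuples sorted by length, while merge_it returns an unsorted list of sets. A minimal sketch of the missing post-processing step could look like the following; `to_sorted_tuples` is a hypothetical helper name, not part of the original answer, and sorting the numbers inside each tuple is an added assumption to make the output deterministic.

```python
def to_sorted_tuples(groups):
    """Convert an iterable of sets into tuples, longest group first."""
    # Sort the numbers inside each group so the output is deterministic.
    tuples = [tuple(sorted(g)) for g in groups]
    # sorted() is stable, so groups of equal length keep their input order.
    return sorted(tuples, key=len, reverse=True)


groups = [{18, 27}, {3, 4, 14}]  # e.g. the result of merge_it above
print(to_sorted_tuples(groups))  # [(3, 4, 14), (18, 27)]
```

Calling `to_sorted_tuples(merge_it(lot))` should then give the list shape requested in the question.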

Here is a snippet (including doctests): http://gist.github.com/586252

def collapse(L):
    """ The input L is a list that contains tuples of various sizes.
        If any tuples have shared elements,
        exactly one instance of each shared and unshared element is merged into the first tuple with a shared element.
        This function returns a new list that contains the merged tuples and an int that represents how many merges were performed."""
    answer = []
    merges = 0
    seen = []   # a list of all the numbers that we've seen so far
    for t in L:
        tAdded = False
        pleaseMerge = True  # set once per tuple so that each tuple is merged at most once
        for num in t:
            if num in seen and pleaseMerge:
                answer += merge(t, answer)
                merges += 1
                pleaseMerge = False
                tAdded= True
            else:
                seen.append(num)
        if not tAdded:
            answer.append(t)

    return (answer, merges)

def merge(t, L):
    """ The input L is a list that contains tuples of various sizes.
        The input t is a tuple that contains an element that is contained in another tuple in L.
        Return a new list that is similar to L but contains the new elements in t added to the tuple with which t has a common element."""
    answer = []
    while L:
        tup = L[0]
        tupAdded = False
        for i in tup:
            if i in t:
                try:
                    L.remove(tup)
                    newTup = set(tup)
                    for i in t:
                        newTup.add(i)
                    answer.append(tuple(newTup))
                    tupAdded = True
                except ValueError:
                    pass
        if not tupAdded:
            L.remove(tup)
            answer.append(tup)
    return answer

def sortByLength(L):
    """ L is a list of n-tuples, where n>0.
        This function will return a list with the same contents as L 
        except that the tuples are sorted in non-ascending order by length"""

    lengths = {}
    for t in L:
        if len(t) in lengths.keys():
            lengths[len(t)].append(t)
        else:
            lengths[len(t)] = [(t)]

    # sorted() works on both Python 2 and 3 (dict.keys() is not sliceable on Python 3)
    l = sorted(lengths.keys(), reverse=True)

    answer = []
    for i in l:
        answer += lengths[i]
    return answer

def MergeThat(L):
    answer, merges = collapse(L)
    while merges:
        answer, merges = collapse(answer)
    return sortByLength(answer)

if __name__ == "__main__":
    print('starting')
    print(MergeThat([(3,4), (18,27), (4,14)]))
    # [(3, 4, 14), (18, 27)]
    print(MergeThat([(1,3), (15,21), (1,10), (57,66), (76,85), (66,76)]))
    # [(57, 66, 76, 85), (1, 10, 3), (15, 21)]

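I tried hard to figure this out, but only after trying Ian's answer (thanks!) did I realize what the theoretical problem is: the input is a list of edges that defines a graph, and we are looking for the connected components of this graph. (Strongly connected components are the analogous notion for directed graphs; the graph here is undirected.) It's that simple.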

Although you can do this efficiently yourself, there is really no reason to implement it on your own. Just import a good graph library:

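import networkx as nx

# In current NetworkX releases connected_components() yields sets lazily,
# so sort the components by size explicitly (largest first).

# one of your examples
g1 = nx.Graph([(1,3), (15,21), (1,10), (57,66), (76,85), (66,76)])
print(sorted(nx.connected_components(g1), key=len, reverse=True))
# [{57, 66, 76, 85}, {1, 10, 3}, {21, 15}]  (order inside each set may vary)

# my own test case
g2 =  nx.Graph([(1,2),(2,10), (20,3), (3,4), (4,10)])
print(sorted(nx.connected_components(g2), key=len, reverse=True))
# [{1, 2, 3, 4, 10, 20}]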
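
For readers who would rather avoid the extra dependency, the connected-components idea can also be sketched with a small union-find (disjoint-set) structure. This is only an illustration under stated assumptions: `merge_that` and its helpers are hypothetical names, not code from any of the answers, and the numbers inside each tuple are sorted to keep the output deterministic.

```python
from collections import defaultdict


def merge_that(tuples):
    """Merge tuples that share numbers; return tuples, longest first."""
    parent = {}  # maps each number to its parent in the union-find forest

    def find(x):
        # Register unseen numbers as their own root, then walk up to the
        # root, shortening the path along the way (path halving).
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        # Attach the root of b's group to the root of a's group.
        root_a, root_b = find(a), find(b)
        parent[root_b] = root_a

    # Every number in a tuple belongs to the same group as its first number.
    for t in tuples:
        for item in t:
            union(t[0], item)

    # Collect the numbers under their final root.
    groups = defaultdict(list)
    for x in list(parent):
        groups[find(x)].append(x)

    result = [tuple(sorted(g)) for g in groups.values()]
    return sorted(result, key=len, reverse=True)


print(merge_that([(3, 4), (18, 27), (4, 14)]))
# [(3, 4, 14), (18, 27)]
```

Each tuple is processed once, so this avoids rescanning the list after every merge, which the loop-until-stable approaches above have to do.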
