Finding multiple occurrences of small lists in a large cumulative list

Published 2024-05-06 21:56:42


I have a large list of strings, for example:

full_log = ['AB21','BG54','HG89','NS72','Error','CF54','SD62','KK02','FE34']

and several small lists of strings, for example:

tc1 = ['HG89','NS72']
tc2 = ['AB21','BG54']
tc3 = ['KK02','FE34']
tc4 = ['CF54','SD62']

I want to find each small list inside the large list while preserving the sequence, so that the output looks like this:

tc2-tc1-Er-tc4-tc3

I would like to know whether there is a straightforward, Pythonic way to handle this.


3 answers

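If all the short lists have the same length, you can create a dict whose keys are tuples of strings and whose values are the labels. You can then walk through full_log, take a chunk of the appropriate length, and check whether it is in the dict.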
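A minimal sketch of the fixed-length dict lookup described above. The names `small_lists` and `label_chunks`, and the choice to emit "Er" for any item that does not start a known chunk, are assumptions made for illustration; they are not part of the original answer.

```python
full_log = ['AB21', 'BG54', 'HG89', 'NS72', 'Error', 'CF54', 'SD62', 'KK02', 'FE34']

# Hypothetical container for the labelled short lists (all the same length).
small_lists = {
    'tc1': ['HG89', 'NS72'],
    'tc2': ['AB21', 'BG54'],
    'tc3': ['KK02', 'FE34'],
    'tc4': ['CF54', 'SD62'],
}

def label_chunks(log, patterns, chunk_len, default='Er'):
    """Replace each known chunk of chunk_len items with its label."""
    # Tuples are hashable, so they can serve as dict keys.
    lookup = {tuple(items): label for label, items in patterns.items()}
    labels = []
    i = 0
    while i < len(log):
        chunk = tuple(log[i:i + chunk_len])
        if chunk in lookup:
            labels.append(lookup[chunk])
            i += chunk_len      # skip past the matched chunk
        else:
            labels.append(default)  # assumption: unmatched items become "Er"
            i += 1
    return labels

print('-'.join(label_chunks(full_log, small_lists, 2)))  # tc2-tc1-Er-tc4-tc3
```

Each dict lookup is a constant-time operation on average, so the whole pass is roughly linear in the length of full_log.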

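If the short lists have different lengths, the approach above does not work, because the length of the chunk taken from full_log is no longer constant. In that case, one possible solution is to add the items of the short lists to a tree structure whose leaf nodes are the labels. Then, for each index in full_log, check whether a path can be found in the tree. If a path is found, jump forward by the path length; otherwise try again from the next index: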

from collections import defaultdict
from itertools import islice

full_log = ['AB21','BG54','HG89','NS72','Error','CF54','SD62','KK02','FE34']

# Construct a tree
dd = lambda: defaultdict(dd)
labels = defaultdict(dd)
labels['HG89']['NS72'] = 'tc1'
labels['AB21']['BG54'] = 'tc2'
labels['KK02']['FE34'] = 'tc3'
labels['CF54']['SD62'] = 'tc4'

# Find label, return tuple (label, length) or (None, 1)
def find_label(it):
    length = 0
    node = labels
    while node and isinstance(node, dict):
        node = node.get(next(it, None))
        length += 1

    return node, (length if node else 1)

i = 0
result = []
while i < len(full_log):
    label, length = find_label(islice(full_log, i, None))
    result.append(label if label else full_log[i])
    i += length

print(result)  # ['tc2', 'tc1', 'Error', 'tc4', 'tc3']

The tree used above is somewhat like a trie, but each node holds either child nodes or a value (the label).
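The tree can also be built from the labelled lists instead of being written out by hand, which makes the variable-length case easier to try. The following is only a sketch under a few assumptions: labels are strings, every short list is non-empty, and no short list is a prefix of another (a node holds either children or a label, not both). The helper names and the three-item list `tc5` are hypothetical.

```python
def build_tree(patterns):
    """Build a nested dict; the last item of each pattern maps to its label."""
    root = {}
    for label, items in patterns.items():
        node = root
        for item in items[:-1]:
            node = node.setdefault(item, {})
        node[items[-1]] = label
    return root

def find_label(tree, log, start):
    """Return (label, length) for a path starting at start, else (None, 1)."""
    node = tree
    i = start
    while isinstance(node, dict) and i < len(log):
        node = node.get(log[i])  # None if there is no such child
        i += 1
    if isinstance(node, str):    # reached a leaf, which is a label
        return node, i - start
    return None, 1

def label_log(log, patterns):
    tree = build_tree(patterns)
    result, i = [], 0
    while i < len(log):
        label, length = find_label(tree, log, i)
        result.append(label if label is not None else log[i])
        i += length
    return result

full_log = ['AB21', 'BG54', 'HG89', 'NS72', 'Error', 'CF54', 'SD62', 'KK02', 'FE34']
patterns = {
    'tc1': ['HG89', 'NS72'],
    'tc2': ['AB21', 'BG54'],
    'tc5': ['CF54', 'SD62', 'KK02'],  # hypothetical three-item list
}
print(label_log(full_log, patterns))  # ['tc2', 'tc1', 'Error', 'tc5', 'FE34']
```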

You can use a set for pattern matching:

# The built-in set type is used below; the old `sets` module is not needed.
full_log = ['AB21','BG54','HG89','NS72','Error','CF54','SD62','KK02','FE34']

tc1 = ['HG89','NS72']
tc2 = ['AB21','BG54']
tc3 = ['KK02','FE34']
tc4 = ['CF54','SD62']

set(full_log) & set(tc1)

Output: {'HG89', 'NS72'}

#Finding index of set elements:

result=set(full_log) & set(tc1)



def all_indices(value, qlist):
    indices = []
    idx = -1
    while True:
        try:
            idx = qlist.index(value, idx+1)
            indices.append(idx)
        except ValueError:
            break
    return indices


r = []
for value in sorted(result):  # sort so the order of the output is deterministic
    r.append(all_indices(value, full_log))

r
Output: [[2], [3]]
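Note that a set intersection only shows that the elements occur somewhere in full_log; it ignores their order and whether they are adjacent. If contiguous, in-order occurrences are what matters, a slice comparison is one simple alternative. The helper name `find_sublist` is an assumption for this sketch.

```python
def find_sublist(big, small):
    """Return every start index at which small occurs contiguously in big."""
    n = len(small)
    return [i for i in range(len(big) - n + 1) if big[i:i + n] == small]

full_log = ['AB21', 'BG54', 'HG89', 'NS72', 'Error', 'CF54', 'SD62', 'KK02', 'FE34']
tc1 = ['HG89', 'NS72']

print(find_sublist(full_log, tc1))               # [2]
print(find_sublist(full_log, ['NS72', 'HG89']))  # [] because the order differs
```

This compares a slice at every position, so it is fine for short lists but does more work than the dict or tree lookups when there are many patterns.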

You need to create a mapping (a dictionary) for the elements of the small lists:

m = {k: v for k, v in zip(map(tuple, [tc1, tc2, tc3, tc4]), ["tc1", "tc2", "tc3", "tc4"])}
>>> {('KK02', 'FE34'): 'tc3', ('AB21', 'BG54'): 'tc2', ('CF54', 'SD62'): 'tc4', ('HG89', 'NS72'): 'tc1'}

You can then loop over the list with an iterator:

itr = iter(full_log)

for i in itr:
    if i != "Error":
        n = next(itr)  # consume the second item of the pair
        if n != "Error":
            if (i, n) in m:
                print(m[(i, n)])
        else:
            print("Er")
    else:
        print("Er")



>>> tc2
    tc1
    Er
    tc4
    tc3

If you don't mind expanding the "Error" entries in the first list:

full_log2 = [item for sublist in [[i] if i != "Error" else ["Error", "Error"] for i in full_log] for item in sublist]
>>> ['AB21', 'BG54', 'HG89', 'NS72', 'Error', 'Error', 'CF54', 'SD62', 'KK02', 'FE34']

then you can use a list comprehension:

print([m[(full_log2[i], full_log2[i+1])] if (full_log2[i], full_log2[i+1]) in m else "Er" for i in range(0, len(full_log2)-1, 2)])
>>> ['tc2', 'tc1', 'Er', 'tc4', 'tc3']
