Grouping strings in a list that share the same 4-letter substring (Python)

Posted 2024-06-28 20:32:37


Here is an example list:

["aaaa", "cat" , "ccaatt" , "fish" , "ffish" , "dog", "doog" ,"bird" , "birdd" , "aaaab" , "aaaa".....]

The expected output should look like this:

[("fish","ffish") , ("bird","birdd"), ("aaaa","aaaab","aaaa") ....]

Or, alternatively, all possible pairwise matches:

[("fish","ffish"),("ffish","fish"),("bird","birdd"), ("birdd","bird"),("aaaa","aaaab"),("aaaa","aaaa"),("aaaab","aaaa") ....] 
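One way to read "the same 4-letter substring" is: two strings belong together when they share at least one 4-character substring. The sketch below assumes that reading (the helper names windows, index_by_window, grouped and pairwise are hypothetical). It indexes every 4-character window of every string, then builds either one group per shared window or all ordered pairs. Groups are per window and are not merged, so a string sharing two different windows could appear in two groups. The "....." tail of the list is left out.

```python
from collections import defaultdict
from itertools import permutations


def windows(s, k=4):
    """Distinct length-k substrings of s (empty set if s is shorter than k)."""
    return {s[i:i + k] for i in range(len(s) - k + 1)}


def index_by_window(strings, k=4):
    """Map each length-k substring to the positions of the strings containing it."""
    index = defaultdict(list)
    for pos, s in enumerate(strings):
        for w in windows(s, k):
            index[w].append(pos)
    return index


def grouped(strings, k=4):
    # One tuple per shared substring, in original order, duplicates kept.
    return [tuple(strings[p] for p in positions)
            for positions in index_by_window(strings, k).values()
            if len(positions) >= 2]


def pairwise(strings, k=4):
    # Every ordered pair of different positions sharing a substring;
    # repeated (value, value) pairs are dropped, first-seen order kept.
    pairs = []
    for positions in index_by_window(strings, k).values():
        pairs.extend((strings[a], strings[b])
                     for a, b in permutations(positions, 2))
    return list(dict.fromkeys(pairs))


data = ["aaaa", "cat", "ccaatt", "fish", "ffish", "dog", "doog",
        "bird", "birdd", "aaaab", "aaaa"]
print(grouped(data))
# [('aaaa', 'aaaab', 'aaaa'), ('fish', 'ffish'), ('bird', 'birdd')]
print(pairwise(data))
# [('aaaa', 'aaaab'), ('aaaa', 'aaaa'), ('aaaab', 'aaaa'), ('fish', 'ffish'),
#  ('ffish', 'fish'), ('bird', 'birdd'), ('birdd', 'bird')]
```

Indexing windows avoids comparing every string with every other string: each string is split into len(s) - 3 windows once, and matches fall out of the shared dictionary keys.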

3 answers

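If the pattern stays the same (each 4-letter word is immediately followed by its longer variant), you can use zip and a list comprehension: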

print([(x, y) for x, y in zip(lst[0::2], lst[1::2]) if x in y and len(x)==4])
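The comprehension above depends on zip(lst[0::2], lst[1::2]) pairing positions 0 and 1, 2 and 3, and so on. A minimal sketch (the function name adjacent_matches is hypothetical) wraps it in a function and shows that it finds nothing on the question's full list, where the matching strings are no longer adjacent. The "....." tail of that list is left out.

```python
def adjacent_matches(items):
    """Pair items at positions (0, 1), (2, 3), ... and keep the pairs whose
    first item has exactly 4 letters and is a substring of the second."""
    return [(x, y) for x, y in zip(items[0::2], items[1::2])
            if len(x) == 4 and x in y]


aligned = ["cat", "ccaatt", "fish", "ffish", "dog", "doog", "bird", "birdd"]
print(adjacent_matches(aligned))   # [('fish', 'ffish'), ('bird', 'birdd')]

# Same words, but with "aaaa" in front: every pair shifts by one position.
shifted = ["aaaa", "cat", "ccaatt", "fish", "ffish", "dog", "doog",
           "bird", "birdd", "aaaab", "aaaa"]
print(adjacent_matches(shifted))   # []
```

zip stops at the shorter slice, so an odd-length list simply leaves its last item unpaired instead of raising an error.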

Another way:

lst = ["cat" , "ccaatt" , "fish" , "ffish" , "dog", "doog" ,"bird" , "birdd"]

print([(lst[x], lst[x+1]) for x in range(0, len(lst) - 1, 2) if len(lst[x]) == 4 and lst[x] in lst[x+1]])

Output:

[('fish', 'ffish'), ('bird', 'birdd')]

You can combine filter with comprehensions to get the desired result:

>>> data = ["cat" , "ccaatt" , "fish" , "ffish" , "dog", "doog" ,"bird" , "birdd"]
>>> list(filter(lambda i: len(i)>=2,[tuple(x
                                       for x in data if item in x)
                                 for item in filter(lambda i: len(i) == 4, data)]))

#output: [('fish', 'ffish'), ('bird', 'birdd')]
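On the question's full list, the expression above produces the ('aaaa', 'aaaab', 'aaaa') tuple twice, once for each "aaaa" in the input. A minimal variant of the same idea (the function name groups_by_four_letter_key is hypothetical) deduplicates the 4-letter keys with dict.fromkeys before building the tuples:

```python
def groups_by_four_letter_key(data):
    # Distinct 4-letter items in first-seen order, so "aaaa" is a key only once.
    keys = dict.fromkeys(s for s in data if len(s) == 4)
    groups = []
    for key in keys:
        group = tuple(s for s in data if key in s)
        if len(group) >= 2:  # drop keys that only matched themselves
            groups.append(group)
    return groups


full = ["aaaa", "cat", "ccaatt", "fish", "ffish", "dog", "doog",
        "bird", "birdd", "aaaab", "aaaa"]
print(groups_by_four_letter_key(full))
# [('aaaa', 'aaaab', 'aaaa'), ('fish', 'ffish'), ('bird', 'birdd')]
```

Like the original, this compares every 4-letter key with every string, so the cost grows with the product of the two counts.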

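This works for small lists; on large lists its time complexity becomes a problem, because every longer string is checked against every 4-letter key: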

lst = ["aaaa", "cat", "ccaatt", "fish", "ffish", "dog", "doog", "bird",
       "birdd", "aaaab", "aaaa", 'fourrr', 'four']

lenght_four = {}
more_than_four = []

for item in lst:

    if len(item) > 4:
        more_than_four.append(item)

    elif len(item) == 4:
        exist = lenght_four.get(item)
        if exist is not None:
            exist.append(item)
        else:
            lenght_four[item] = []

for item in more_than_four:
    for k, v in lenght_four.items():
        if k in item:
            v.append(item)

res = [(k, *v) for k, v in lenght_four.items() if v]
print(res)

Output:

[('aaaa', 'aaaa', 'aaaab'), ('fish', 'ffish'), ('bird', 'birdd'), ('four', 'fourrr')]

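By iterating over the list once, we can do all of the following in a single pass (thanks to @VPfB):

1- Skip items shorter than 4 characters

2- Add items of length 4 to a dictionary (the first occurrence creates the key, later repeats are appended to its list)

3- Put items with len(item) > 4 into a separate list

Then we iterate over the items with len(item) > 4 and check whether each 4-letter key in the dictionary is a substring of them.

Finally, we keep the entries of the lenght_four dictionary whose values are not empty lists.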
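The slow part is the nested loop that tests every key with k in item. A sketch of one possible speed-up, assuming the same two-pass structure (the dictionary is spelled length_four here, and the function name group_by_four_letter_keys is hypothetical): slice each longer item into its 4-character windows and look each window up in the dictionary, so the second pass costs roughly one dictionary lookup per character instead of one substring test per key.

```python
def group_by_four_letter_keys(lst):
    # Pass 1: 4-letter items become keys; later repeats go into the key's list.
    length_four = {}
    more_than_four = []
    for item in lst:
        if len(item) > 4:
            more_than_four.append(item)
        elif len(item) == 4:
            if item in length_four:
                length_four[item].append(item)
            else:
                length_four[item] = []
    # Pass 2: look up each distinct 4-character window of a longer item
    # instead of testing every key with `k in item`.
    for item in more_than_four:
        windows = {item[i:i + 4] for i in range(len(item) - 3)}
        for w in windows:
            if w in length_four:
                length_four[w].append(item)
    # Keep only keys that collected at least one match.
    return [(k, *v) for k, v in length_four.items() if v]


lst = ["aaaa", "cat", "ccaatt", "fish", "ffish", "dog", "doog", "bird",
       "birdd", "aaaab", "aaaa", "fourrr", "four"]
print(group_by_four_letter_keys(lst))
# [('aaaa', 'aaaa', 'aaaab'), ('fish', 'ffish'), ('bird', 'birdd'), ('four', 'fourrr')]
```

A 4-letter key k is a substring of item exactly when it equals one of item's 4-character windows, so this returns the same groups as the nested loop.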
