Itertools Groupby给出了一个意外的结果

finalblobfpost1=['ABC/XYZ/16082020/K1_SS_ALM_222222_14082020.txt','ABC/XYZ/16082020/K1_SS_ALM_111111_14082020.txt','ABC/XYZ/15082020/K1_AB_KIL_444444_15082020.txt','ABC/XYZ/15082020/K1_AB_KIL_333333_15082020.txt'] keyf = lambda text: (re.findall("\w+\/\w+\/\d+\/(.*?)\_\d+_\d+.txt", text)+ [text])[0].strip() h=[list(items) for gr, items in groupby(sorted(finalblobfpost1), key=keyf)] print(h)

[['ABC/XYZ/15082020/K1_AB_KIL_333333_15082020.txt', 'ABC/XYZ/15082020/K1_AB_KIL_444444_15082020.txt'], ['ABC/XYZ/16082020/K1_SS_ALM_111111_14082020.txt', 'ABC/XYZ/16082020/K1_SS_ALM_222222_14082020.txt']]

finalblobfpost2=['ABC/XYZ/15082020/K1_SS_ALM_222222_15082020.txt','ABC/XYZ/16082020/K1_SS_ALM_111111_16082020.txt','ABC/XYZ/15082020/K1_AB_KIL_444444_15082020.txt','ABC/XYZ/16082020/K1_AB_KIL_333333_16082020.txt'] keyf1 = lambda text: (re.findall("\w+\/\w+\/\d+\/(.*?)\_\d+_\d+.txt", text)+ [text])[0].strip() h1=[list(items) for gr, items in groupby(sorted(finalblobfpost2), key=keyf1)] print(h1)

[['ABC/XYZ/15082020/K1_AB_KIL_444444_15082020.txt'], ['ABC/XYZ/15082020/K1_SS_ALM_222222_15082020.txt'], ['ABC/XYZ/16082020/K1_AB_KIL_333333_16082020.txt'], ['ABC/XYZ/16082020/K1_SS_ALM_111111_16082020.txt']]

[['ABC/XYZ/15082020/K1_AB_KIL_444444_15082020.txt','ABC/XYZ/16082020/K1_AB_KIL_333333_16082020.txt'],['ABC/XYZ/16082020/K1_SS_ALM_111111_16082020.txt','ABC/XYZ/15082020/K1_AB_KIL_444444_15082020.txt']]

2条回答

网友

1楼 · 编辑于 2024-09-30 10:27:25

您的列表需要按照groupby中使用的相同键函数进行排序

试试这个：

h1=[list(items) for gr, items in groupby(sorted(finalblobfpost2, key=keyf1), key=keyf1)]

唯一的区别是对sorted的调用中的key=keyf1

输出（与预期相同）：

[['ABC/XYZ/15082020/K1_AB_KIL_444444_15082020.txt', 'ABC/XYZ/16082020/K1_AB_KIL_333333_16082020.txt'], ['ABC/XYZ/15082020/K1_SS_ALM_222222_15082020.txt', 'ABC/XYZ/16082020/K1_SS_ALM_111111_16082020.txt']]

这是在docs for ^{}中显式编写的：

The operation of groupby() is similar to the uniq filter in Unix. It generates a break or new group every time the value of the key function changes (which is why it is usually necessary to have sorted the data using the same key function).

网友

2楼 · 编辑于 2024-09-30 10:27:25

试试这个

^{}

import re
from itertools import groupby

print(
    [list(v) for _, v in groupby(finalblobfpost1,
                                 key=lambda x: re.search("\w\d+_\w{2}_\w{3}", x).group())]
)

[['ABC/XYZ/16082020/K1_SS_ALM_222222_14082020.txt', 'ABC/XYZ/16082020/K1_SS_ALM_111111_14082020.txt'], ['ABC/XYZ/15082020/K1_AB_KIL_444444_15082020.txt', 'ABC/XYZ/15082020/K1_AB_KIL_333333_15082020.txt']]

相关问题更多 >

编程相关推荐

热门问题

热门文章