How do I group elements in a data tape?

Posted 2024-10-04 03:27:07


I have a sequence of strings, for example:

"aa", "lot", "bb", "obj", "obj", "obj", "cc", "lot", "obj", "gg", "lot", "obj", "obj"

I need to count how many obj each lot has. The result should be:

[lot 3 obj] [lot 1 obj] [lot 2 obj]

Not just:

[3, 1, 2]

or something like that.

The strings contain some garbage: any tokens other than lot and obj. Each lot acts as a delimiter that starts a new group.


3 answers

You can use enumerate to get the positions of each lot, then count the obj entries in the sublist between consecutive positions:

lst = ["aa", "lot", "bb", "obj", "obj", "obj", "cc", "lot", "obj", "gg", "lot", "obj", "obj"]
lot = [i for i, x in enumerate(lst) if x == "lot"]
obj = [lst[a:b].count("obj") for a, b in zip(lot, lot[1:] + [len(lst)])]
print(obj) # [3, 1, 2]

Or remove the "garbage" from the list first; then you need neither the sublists nor count afterwards:

lst = [x for x in lst if x in ("lot", "obj")]
lot = [i for i, x in enumerate(lst) if x == "lot"]
obj = [b - a - 1 for a, b in zip(lot, lot[1:] + [len(lst)])]

(Neither version counts any obj that appears before the first lot, but both count those after the last one.)
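
The note above can be checked with a small sketch. The helper name `count_per_lot` and the short sample list are made up here for illustration. The counting logic is the enumerate/zip approach from this answer, and the last lines format the counts in the form the question asks for.

```python
def count_per_lot(items):
    """Count the "obj" entries that follow each "lot" (hypothetical helper)."""
    # Positions of every "lot" marker.
    lot = [i for i, x in enumerate(items) if x == "lot"]
    # Each slice runs from one "lot" to the next, or to the end of the list.
    return [items[a:b].count("obj") for a, b in zip(lot, lot[1:] + [len(items)])]

# The "obj" before the first "lot" is ignored; those after the last "lot" are counted.
print(count_per_lot(["obj", "lot", "obj", "xx", "lot", "obj", "obj"]))  # [1, 2]

# Formatting the counts as requested in the question.
lst = ["aa", "lot", "bb", "obj", "obj", "obj", "cc", "lot", "obj", "gg", "lot", "obj", "obj"]
formatted = " ".join(f"[lot {n} obj]" for n in count_per_lot(lst))
print(formatted)  # [lot 3 obj] [lot 1 obj] [lot 2 obj]
```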

One approach using OrderedDict:

from collections import OrderedDict

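# `l` is the input list from the question.
l = ["aa", "lot", "bb", "obj", "obj", "obj", "cc", "lot", "obj", "gg", "lot", "obj", "obj"]
d = OrderedDict()
for n, i in enumerate(l):
    if i == "lot":
        # Start a new group keyed by the position of this "lot".
        d[n] = [i]
    elif i == "obj" and d:
        # Append to the most recent group; skip any "obj" before the first "lot".
        d[max(d)].append(i)
print(list(d.values()))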

Output:

[['lot', 'obj', 'obj', 'obj'], ['lot', 'obj'], ['lot', 'obj', 'obj']]
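
If only the counts are needed, the grouped lists can be reduced afterwards. A minimal sketch, assuming the output shown above is bound to a variable named `groups` (a name chosen here for illustration): each group starts with its "lot" marker, so the number of obj entries is the group length minus one.

```python
# The grouped output from above, bound to a hypothetical variable name.
groups = [['lot', 'obj', 'obj', 'obj'], ['lot', 'obj'], ['lot', 'obj', 'obj']]

# Subtract 1 to leave out the leading "lot" in each group.
counts = [len(g) - 1 for g in groups]
print(counts)  # [3, 1, 2]
```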

Given your input:

inp = ["aa", "lot", "bb", "obj", "obj", "obj", "cc", "lot", "obj", "gg", "lot", "obj", "obj"]

Two approaches:

  • A nicely readable generator function:

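def group_lots(inp):
    count = 0
    seen_lot = False
    for item in inp:
        if item == "obj":
            count += 1
        elif item == "lot":
            # Emit the count for the previous lot, if there was one.
            if seen_lot:
                yield count
            # Reset; this also discards any "obj" seen before the first "lot".
            count = 0
            seen_lot = True
    # Emit the final lot's count, even when it is 0.
    if seen_lot:
        yield count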

print(list(group_lots(inp)))  # [3, 1, 2]

  • Or an unreadable, arcane-magic itertools.groupby expression:

import itertools

obj_counts = [
    len(list(group_contents))
    for is_lot, group_contents in itertools.groupby(
        (item for item in inp if item in ("lot", "obj")),
        lambda i: i == "lot",
    )
    if not is_lot
]
print(obj_counts)  # [3, 1, 2]
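
To see why the groupby expression works, it may help to look at the runs it produces. The sketch below is only an illustration: the names `runs` and `groupby_counts` are made up here. It also shows two edge cases where this variant appears to behave differently from the slicing approach: consecutive lot entries merge into a single run, so a lot with no obj yields no count, and any obj before the first lot forms a run of its own and is counted.

```python
import itertools

inp = ["aa", "lot", "bb", "obj", "obj", "obj", "cc", "lot", "obj", "gg", "lot", "obj", "obj"]

# Drop the garbage first so that only "lot" and "obj" remain.
filtered = [item for item in inp if item in ("lot", "obj")]

# groupby splits the sequence into consecutive runs that share the same key.
runs = [
    (is_lot, list(group))
    for is_lot, group in itertools.groupby(filtered, lambda i: i == "lot")
]
for is_lot, group in runs:
    print(is_lot, group)


def groupby_counts(items):
    """Same expression as above, wrapped in a hypothetical helper."""
    return [
        len(list(group))
        for is_lot, group in itertools.groupby(
            (item for item in items if item in ("lot", "obj")),
            lambda i: i == "lot",
        )
        if not is_lot
    ]


print(groupby_counts(["lot", "lot", "obj"]))  # [1]: the empty first lot is not reported
print(groupby_counts(["obj", "lot", "obj"]))  # [1, 1]: the leading obj is counted
```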
