Filter optimization in Python

1 answer

User

#1 · Posted on 2024-09-30 01:20:11

First, combining the two conditions is easy:

for clump1 in stored.clumps:
    for clump2 in stored.clumps:
        if clump1.classification == clump2.classification and clump1.can_clump(clump2):
            pass  # some code here

If the first test fails, can_clump is never called, because `and` short-circuits.
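A minimal sketch of that short-circuit behaviour. The `Clump` class and the `calls` list here are hypothetical stand-ins, not the asker's real code; they exist only to record when `can_clump` actually runs.

```python
calls = []  # records every can_clump invocation


class Clump:
    """Hypothetical stand-in for the question's clump objects."""

    def __init__(self, classification):
        self.classification = classification

    def can_clump(self, other):
        calls.append((self.classification, other.classification))
        return True


a, b = Clump("rock"), Clump("sand")

# Classifications differ, so `and` stops before calling can_clump.
mismatched = a.classification == b.classification and a.can_clump(b)

# Classifications match, so can_clump runs exactly once.
matched = a.classification == a.classification and a.can_clump(a)
```

After this runs, `calls` holds a single entry for the matching pair only, which is why putting the cheap comparison first pays off when `can_clump` is expensive.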

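Second, in general, filter is slower when it needs a lambda to do its job; you only see meaningful savings when the predicate itself is a built-in implemented in C. If you already need to call an existing function defined in Python, filter is usually neither better nor worse than an explicit loop, so there is little harm in using it.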

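So for your case, assuming classification is a built-in type (or a type implemented in a C extension) and every clump's classification has the same type, you might be able to optimize a bit. The classification has to be pulled out of each clump before comparing, because calling __eq__ with a clump as the argument would return NotImplemented rather than False:

from itertools import compress
from operator import attrgetter

for clump1 in stored.clumps:
    same_class = map(clump1.classification.__eq__, map(attrgetter('classification'), stored.clumps))
    for clump2 in filter(clump1.can_clump, compress(stored.clumps, same_class)):
        pass  # some code here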
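One caveat with binding `__eq__` directly is worth checking by hand: called with an argument of an unrelated type, it returns `NotImplemented` instead of `False`, so the predicate must receive classifications, not clumps. The sketch below is one way to keep the filtering in C-implemented callables while comparing like with like. The `Clump` class, its `name` attribute, and the placeholder `can_clump` rule are assumptions made for illustration, and classifications are assumed to be `str`.

```python
from itertools import compress
from operator import attrgetter


class Clump:
    """Hypothetical stand-in for the question's clump class."""

    def __init__(self, name, classification):
        self.name = name
        self.classification = classification  # assumed to always be a str

    def can_clump(self, other):
        # Placeholder rule for illustration only: never clump with yourself.
        return self is not other


clumps = [Clump("a", "rock"), Clump("b", "rock"), Clump("c", "sand")]

# Calling __eq__ directly with a mismatched type does not return False:
direct_result = "rock".__eq__(clumps[0])  # NotImplemented

get_classification = attrgetter("classification")

filtered_pairs = []
for clump1 in clumps:
    # One boolean per clump: does its classification equal clump1's?
    same_class = map(clump1.classification.__eq__, map(get_classification, clumps))
    for clump2 in filter(clump1.can_clump, compress(clumps, same_class)):
        filtered_pairs.append((clump1.name, clump2.name))

# Reference result from the plain nested loop with a combined condition.
loop_pairs = [
    (c1.name, c2.name)
    for c1 in clumps
    for c2 in clumps
    if c1.classification == c2.classification and c1.can_clump(c2)
]
```

Both approaches produce the same pairs here; the explicit loop is easier to read, which matters when the speed difference is small.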

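That said, this is all micro-optimization. If this is the hottest part of your code and everything goes right, we are talking about maybe a 10% speedup, if it works at all. Usually, worrying about micro-optimizations is a waste of time; 99% of the time, performance is either fine without them or unacceptably slow whether or not you apply them.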
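Whether any gain shows up on a particular machine and Python version is easy to measure rather than guess. The following is a rough timing sketch using the standard `timeit` module; the sample data, the target value, and the repeat count are arbitrary choices, and the numbers it prints will vary between runs.

```python
import timeit

data = list(range(10_000))  # arbitrary sample data for the measurement


def with_lambda():
    # Predicate is a Python-level lambda: one Python call per element.
    return list(filter(lambda x: x == 5_000, data))


def with_builtin_method():
    # Predicate is the C-implemented bound method int.__eq__.
    return list(filter((5_000).__eq__, data))


def with_comprehension():
    # Plain list comprehension for comparison.
    return [x for x in data if x == 5_000]


if __name__ == "__main__":
    for func in (with_lambda, with_builtin_method, with_comprehension):
        seconds = timeit.timeit(func, number=200)
        print(f"{func.__name__}: {seconds:.4f} s")
```

All three functions return the same list, so the only thing being compared is the per-element overhead of each style.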

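In this case, you would likely get far more from pre-grouping your clumps. That replaces the O(n²) nested iteration over stored.clumps with a grouping pass that costs O(n log n) (using sorted + itertools.groupby) or O(n) (using a multi-dict, e.g. collections.defaultdict(list)), followed by comparisons within each group only. For example, a preprocessing pass that groups by classification could be: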

# Imports at top of file
from collections import defaultdict
from itertools import product

# Code using them
clumps_by_classification = defaultdict(list)

for clump in stored.clumps:
    clumps_by_classification[clump.classification].append(clump)
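For comparison, the sorted + itertools.groupby route could be sketched as follows. The sample clumps are hypothetical (`SimpleNamespace` objects standing in for real clump instances), and this variant assumes classifications can be ordered, whereas the dict-based version only needs them to be hashable.

```python
from itertools import groupby
from operator import attrgetter
from types import SimpleNamespace

# Hypothetical sample data; SimpleNamespace stands in for real clump objects.
clumps = [
    SimpleNamespace(name="a", classification="rock"),
    SimpleNamespace(name="b", classification="sand"),
    SimpleNamespace(name="c", classification="rock"),
]

by_classification = attrgetter("classification")

# groupby only merges adjacent items, so the input must be sorted by the same key.
grouped = {
    classification: list(group)
    for classification, group in groupby(
        sorted(clumps, key=by_classification), key=by_classification
    )
}
```

The result maps each classification to its list of clumps, just like the defaultdict version, at the cost of the O(n log n) sort.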

Now, instead of comparing every clump with every other clump, you can compare only within the subgroups that share a classification:

for classification, clumps in clumps_by_classification.items():
    for clump1, clump2 in product(clumps, repeat=2):
        if clump1.can_clump(clump2):
            pass  # some code here

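Depending on whether the order of the clumps in a pair matters, and whether a clump can clump with itself, you may be able to save even more by replacing product with another itertools function (such as combinations, combinations_with_replacement, or permutations).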
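To make the choice concrete, here is what each of those functions yields for a three-element group. The strings are stand-ins for the clumps in one classification group.

```python
from itertools import (
    combinations,
    combinations_with_replacement,
    permutations,
    product,
)

group = ["a", "b", "c"]  # stand-ins for the clumps in one classification group

# Ordered pairs, self-pairs included: n * n
all_ordered = list(product(group, repeat=2))

# Ordered pairs, no self-pairs: n * (n - 1)
ordered_no_self = list(permutations(group, 2))

# Unordered pairs, self-pairs included: n * (n + 1) / 2
unordered_with_self = list(combinations_with_replacement(group, 2))

# Unordered pairs, no self-pairs: n * (n - 1) / 2
unordered_no_self = list(combinations(group, 2))
```

If can_clump is symmetric and a clump never clumps with itself, combinations does less than half the work of product.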

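Yes, in theory the product(repeat=2) step keeps the work at O(n²), but now n is the size of the largest subgroup sharing the same classification rather than the size of the whole set of clumps.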
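A quick back-of-the-envelope count shows the scale of the difference. The group sizes below are made up for illustration; if every clump shared one classification, the two counts would be identical.

```python
# Hypothetical group sizes: 1,000 clumps spread over 10 classifications.
group_sizes = [100] * 10

total = sum(group_sizes)

# Comparisons when every clump is paired with every clump.
ungrouped_pairs = total ** 2

# Comparisons when pairing happens only inside each classification group.
grouped_pairs = sum(size ** 2 for size in group_sizes)

print(ungrouped_pairs, grouped_pairs)  # 1000000 100000
```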
