What is the shortest way to count the number of items in a generator/iterator?

1 answer

Anonymous, 1 day ago

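Here is a method that is meaningfully faster than <code>sum(1 for i in it)</code> when the iterable may be long (and costs only a small fixed setup overhead when the iterable is short), while keeping fixed memory overhead (unlike <code>len(list(it))</code>), which avoids swap thrashing and reallocation overhead for larger inputs:

<pre><code>try:
    # On Python 2 only, get a zip that lazily generates results instead of returning a list
    from future_builtins import zip
except ImportError:
    # Python 3: the built-in zip is already lazy
    pass

from collections import deque
from itertools import count

def ilen(it):
    # Make a stateful counting iterator
    cnt = count()
    # zip it with the input iterator, then drain until input exhausted at C level
    deque(zip(it, cnt), 0)  # cnt must be second zip arg to avoid advancing too far
    # Since count is 0-based, the next value is the count
    return next(cnt)
</code></pre>

Like <code>len(list(it))</code>, it performs the loop in C code on CPython (<code>deque</code>, <code>count</code> and <code>zip</code> are all implemented in C); avoiding bytecode execution on each loop iteration is usually the key to performance in CPython.

It is surprisingly hard to come up with fair test cases for comparing performance. <code>list</code> cheats by using <code>__length_hint__</code>, which is unlikely to be available for arbitrary input iterables. The <code>itertools</code> functions that do not provide <code>__length_hint__</code> often have special operating modes that run faster when the value returned on each loop is released before the next value is requested, which a <code>deque</code> with <code>maxlen=0</code> will do. The test case I used was a generator function that takes an input and returns a C-level generator lacking both the special <code>itertools</code> return-container optimizations and <code>__length_hint__</code>, using Python 3.3's <code>yield from</code>:

<pre><code>def no_opt_iter(it):
    yield from it
</code></pre>

Then I timed it with the <code>ipython</code> <code>%timeit</code> magic (substituting different constants for 100):

<pre><code>>>> %%timeit -r5 fakeinput = (0,) * 100
... ilen(no_opt_iter(fakeinput))
</code></pre>

When the input is not large enough for <code>len(list(it))</code> to cause memory problems, on a Linux box running Python 3.5 x64, my solution takes about 50% longer than <code>def ilen(it): return len(list(it))</code>, regardless of input length.

For the smallest inputs, the setup cost of calling <code>deque</code>/<code>zip</code>/<code>count</code>/<code>next</code> means it takes noticeably longer than <code>def ilen(it): return sum(1 for x in it)</code> (about 200 ns more on my machine for a length-0 input, a 33% increase over the simple <code>sum</code> approach). For longer inputs, however, it runs in about half the time per additional element. For length-5 inputs the cost is equivalent, and somewhere in the length 50-100 range the initial overhead becomes unnoticeable compared to the real work, so the <code>sum</code> approach takes roughly twice as long.

Basically, if memory use matters or the inputs have no bounded size, and you care about speed more than brevity, use this solution. If the inputs are bounded and small, <code>len(list(it))</code> is probably best. If they are unbounded but simplicity/brevity counts, use <code>sum(1 for x in it)</code>.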
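To try the comparison without <code>ipython</code>, the following is a minimal sketch that uses the standard-library <code>timeit</code> module instead of the <code>%timeit</code> magic. The names <code>ilen_deque</code>, <code>ilen_sum</code>, <code>ilen_list</code>, <code>ilen_wrong_order</code> and <code>compare</code> are made up for this illustration and do not appear in the answer above; the input sizes and repeat counts are arbitrary assumptions, and the timings you get will depend on your machine and Python version. It targets Python 3 only.

```python
import timeit
from collections import deque
from itertools import count


def ilen_deque(it):
    """Count items with the deque/zip/count approach (fixed memory)."""
    cnt = count()
    deque(zip(it, cnt), maxlen=0)  # drain the zip without storing anything
    return next(cnt)


def ilen_sum(it):
    """Count items with a generator expression (fixed memory)."""
    return sum(1 for _ in it)


def ilen_list(it):
    """Count items by materializing a list (memory grows with the input)."""
    return len(list(it))


def ilen_wrong_order(it):
    """Illustration only: with cnt as the FIRST zip argument, zip advances
    cnt once more before it notices the input is exhausted, so the result
    is too large by one."""
    cnt = count()
    deque(zip(cnt, it), maxlen=0)
    return next(cnt)


def no_opt_iter(it):
    """Wrap the input so __length_hint__ is not available to the consumer."""
    yield from it


def compare(n, repeat=3, number=200):
    """Return the best per-call time in seconds for each approach on n items."""
    data = (0,) * n
    results = {}
    for func in (ilen_deque, ilen_sum, ilen_list):
        assert func(no_opt_iter(data)) == n  # sanity check before timing
        timer = timeit.Timer(lambda: func(no_opt_iter(data)))
        results[func.__name__] = min(timer.repeat(repeat=repeat, number=number)) / number
    return results


if __name__ == "__main__":
    for size in (0, 5, 100, 1000):
        print(size, compare(size))
```

Taking the minimum over several repeats is the usual way to reduce noise from other processes. <code>ilen_wrong_order</code> is included only to show why the comment in the answer insists that <code>cnt</code> be the second argument to <code>zip</code>.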
