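<p>The speed difference is actually greater than 3x, but building a huge in-memory list of 1 million integers first slows down both versions. Separate that out of the time trials:</p>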
<pre><code>>>> import timeit
>>> def sum1(lst):
...     s = 0
...     for i in lst:
...         s += i
...     return s
...
>>> def sum2(lst):
...     return sum(lst)
...
>>> values = range(1000000)
>>> timeit.timeit('f(lst)', 'from __main__ import sum1 as f, values as lst', number=100)
3.457869052886963
>>> timeit.timeit('f(lst)', 'from __main__ import sum2 as f, values as lst', number=100)
0.6696369647979736
</code></pre>
<p>Now the speed difference has risen to over 5x.</p>
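<p>A similar measurement could be set up on Python 3 along the following lines. This is only a sketch: the names <code>sum_loop</code>, <code>sum_builtin</code> and <code>time_call</code> are made up for illustration, and it assumes Python 3, where <code>range()</code> is lazy and has to be wrapped in <code>list()</code> to get an in-memory list. Absolute timings will vary by machine and interpreter version.</p>

```python
import timeit


def sum_loop(values):
    """Sum with an explicit for loop (runs as interpreted bytecode)."""
    total = 0
    for v in values:
        total += v
    return total


def sum_builtin(values):
    """Sum with the built-in sum(), which loops in C."""
    return sum(values)


# Build the input once, outside the timed statement, so list creation
# is not part of what gets measured.
values = list(range(1000000))


def time_call(func, data, number=10):
    """Return the total seconds taken by `number` calls of func(data)."""
    return timeit.timeit(lambda: func(data), number=number)


if __name__ == "__main__":
    print("loop:   ", time_call(sum_loop, values))
    print("builtin:", time_call(sum_builtin, values))
```

<p>Passing a callable to <code>timeit.timeit</code> avoids the <code>from __main__ import ...</code> setup string, at the cost of one extra function call per iteration.</p>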
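<p>A <code>for</code> loop is executed as interpreted Python bytecode. <code>sum()</code> loops entirely in C code. The speed difference between interpreted bytecode and C code is large.</p>
<p>In addition, the C code makes sure not to create new Python objects if it can keep the sum in a C type instead; this works for <code>int</code> and <code>float</code> results.</p>
<p>The Python version, disassembled, does this:</p>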
<pre><code>>>> import dis
>>> def sum1():
...     s = 0
...     for i in range(1000000):
...         s += i
...     return s
...
>>> dis.dis(sum1)
  2           0 LOAD_CONST               1 (0)
              3 STORE_FAST               0 (s)

  3           6 SETUP_LOOP              30 (to 39)
              9 LOAD_GLOBAL              0 (range)
             12 LOAD_CONST               2 (1000000)
             15 CALL_FUNCTION            1
             18 GET_ITER
        >>   19 FOR_ITER                16 (to 38)
             22 STORE_FAST               1 (i)

  4          25 LOAD_FAST                0 (s)
             28 LOAD_FAST                1 (i)
             31 INPLACE_ADD
             32 STORE_FAST               0 (s)
             35 JUMP_ABSOLUTE           19
        >>   38 POP_BLOCK

  5     >>   39 LOAD_FAST                0 (s)
             42 RETURN_VALUE
</code></pre>
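<p>Opcode names and offsets differ between Python versions, so the listing above will not match a modern interpreter exactly. As a rough, version-tolerant way to compare the two approaches, the opcode names can be collected with <code>dis.get_instructions</code>. The helper <code>opnames</code> and the function names below are hypothetical, chosen only for this sketch.</p>

```python
import dis


def sum_loop(values):
    total = 0
    for v in values:
        total += v
    return total


def sum_builtin(values):
    return sum(values)


def opnames(func):
    """Return the opcode names of func's bytecode, in order."""
    return [ins.opname for ins in dis.get_instructions(func)]


if __name__ == "__main__":
    # The explicit loop compiles to a FOR_ITER plus several instructions
    # that the interpreter repeats for every element.
    print(opnames(sum_loop))
    # The built-in version is a single call; the iteration happens in C.
    print(opnames(sum_builtin))
```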
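<p>Apart from the interpreter loop being slower than C, <code>INPLACE_ADD</code> creates a new integer object on each iteration once the running total goes past 256 (CPython caches small <code>int</code> objects as singletons).</p>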
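<p>The small-integer cache can be observed with a short experiment. This is a sketch that relies on a CPython implementation detail, so the exact cached range may differ on other interpreters or versions; <code>same_object</code> is a name invented for the example. The values are built at run time from strings so that compile-time constant sharing does not affect the result.</p>

```python
def same_object(n):
    """Build the int n twice at run time and report whether both
    results are the very same object (CPython-specific behaviour)."""
    a = int(str(n))
    b = int(str(n))
    return a is b


if __name__ == "__main__":
    # A small value comes from CPython's cache, so both names refer
    # to one shared object.
    print(same_object(5))
    # A large value is allocated afresh each time it is created.
    print(same_object(10**6))
```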
<p>You can see the <a href="http://hg.python.org/cpython/file/f2e6c33ce3e9/Python/bltinmodule.c#l2327" rel="noreferrer">C implementation</a> in the Python Mercurial code repository, but it explicitly states in the comments:</p>
<pre class="lang-c prettyprint-override"><code>/* Fast addition by keeping temporary sums in C instead of new Python objects.
Assumes all inputs are the same type. If the assumption fails, default
to the more general routine.
*/
</code></pre>
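<p>The strategy described in that comment can be modelled in pure Python. This is not the CPython code, only an illustrative sketch: <code>sum_model</code> is a hypothetical name, the "fast" accumulator here is still an ordinary Python <code>int</code> standing in for a C variable, and details the real code has to deal with (such as overflow of the C type, or a separate path for <code>float</code>) are left out.</p>

```python
def sum_model(iterable, start=0):
    """Illustrative model: fast path for exact ints, general path otherwise."""
    it = iter(iterable)
    result = start

    if type(result) is int:
        fast_total = result  # stands in for a temporary sum kept in C
        for item in it:
            if type(item) is int:
                fast_total += item
                continue
            # The "all inputs are the same type" assumption failed:
            # hand the partial sum over to the general routine.
            result = fast_total + item
            break
        else:
            # Iterator exhausted without leaving the fast path.
            return fast_total

    # General routine: works for any objects that support +.
    for item in it:
        result = result + item
    return result
```

<p>Because both loops consume the same iterator, the general routine picks up exactly where the fast path stopped, so no element is added twice.</p>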