<p>That's a very interesting question!</p>
<p>I think it depends on the following aspects:</p>
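<p>Accessing a single row by index label (<strong>index is sorted and unique</strong>) should have runtime <code>O(m)</code>, where <code>m &lt;&lt; n_rows</code></p>
<p>Accessing a single row by index label (<strong>index is NOT unique and NOT sorted</strong>) should have runtime <code>O(n_rows)</code></p>
<p>Accessing a single row by index label (<strong>index is NOT unique, but sorted</strong>) should have runtime <code>O(m)</code>, where <code>m &lt; n_rows</code></p>
<p>Accessing rows by boolean indexing (regardless of the index) should have runtime <code>O(n_rows)</code></p>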
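<p>To see which of the cases above a given DataFrame falls into, you can inspect the index directly. The following is a minimal sketch, not a description of pandas internals: <code>describe_index</code> is a hypothetical helper name, and the frame sizes and label range are arbitrary choices made for illustration.</p>

```python
import numpy as np
import pandas as pd


def describe_index(df):
    """Report the two index properties that the cases above distinguish."""
    return {
        "unique": df.index.is_unique,
        "sorted": df.index.is_monotonic_increasing,
    }


rng = np.random.default_rng(0)
values = rng.random((1000, 3))

# Default RangeIndex: sorted and unique.
unique_sorted = pd.DataFrame(values, columns=list("abc"))

# Random integer labels: duplicates are guaranteed (1000 rows, 100 possible labels).
dup_labels = rng.integers(0, 100, size=1000)
dup_unsorted = pd.DataFrame(values, columns=list("abc"), index=dup_labels)

# Same labels after sorting: still non-unique, but monotonic.
dup_sorted = dup_unsorted.sort_index()

for name, frame in [
    ("unique_sorted", unique_sorted),
    ("dup_unsorted", dup_unsorted),
    ("dup_sorted", dup_sorted),
]:
    print(name, describe_index(frame))

# With a non-unique index, one label can match several rows.
# Passing a list to .loc always returns a DataFrame, whatever the match count.
label = dup_labels[0]
matches = dup_sorted.loc[[label]]
print(f"label {label} matches {len(matches)} row(s)")
```

<p>Calling <code>sort_index()</code> once up front is what moves a frame from the second case to the third, so it may pay off when many label lookups follow.</p>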
<hr/>
<p>Demo:</p>
<p><strong>Index is sorted and unique:</strong></p>
<pre><code>In [49]: df = pd.DataFrame(np.random.rand(10**5,6), columns=list('abcdef'))
In [50]: %timeit df.loc[random.randint(0, 10**4)]
The slowest run took 27.65 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 331 µs per loop
In [51]: %timeit df.iloc[random.randint(0, 10**4)]
1000 loops, best of 3: 275 µs per loop
In [52]: %timeit df.query("a > 0.9")
100 loops, best of 3: 7.84 ms per loop
In [53]: %timeit df.loc[df.a > 0.9]
100 loops, best of 3: 2.96 ms per loop
</code></pre>
<p><strong>Index is NOT sorted and NOT unique:</strong></p>
^{pr2}$
<p><strong>Index is NOT unique, but sorted:</strong></p>
<pre><code>In [64]: df = pd.DataFrame(np.random.rand(10**5,6), columns=list('abcdef'), index=np.random.randint(0, 10000, 10**5)).sort_index()
In [65]: df.index.is_monotonic_increasing
Out[65]: True
In [66]: %timeit df.loc[random.randint(0, 10**4)]
The slowest run took 9.70 times longer than the fastest. This could mean that an intermediate result is being cached.
1000 loops, best of 3: 478 µs per loop
In [67]: %timeit df.iloc[random.randint(0, 10**4)]
1000 loops, best of 3: 262 µs per loop
In [68]: %timeit df.query("a > 0.9")
100 loops, best of 3: 7.81 ms per loop
In [69]: %timeit df.loc[df.a > 0.9]
100 loops, best of 3: 2.95 ms per loop
</code></pre>
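<p>The IPython sessions above can also be reproduced outside IPython. The script below is a rough sketch under stated assumptions: <code>time_loc</code> is a hypothetical helper, the number of lookups (100) is arbitrary, and the labels are drawn from values known to exist in each index so that no lookup raises <code>KeyError</code>. It prints whatever your machine measures; no particular timings are implied.</p>

```python
import time

import numpy as np
import pandas as pd


def time_loc(df, labels):
    """Return the average wall-clock seconds per df.loc[label] lookup."""
    start = time.perf_counter()
    for label in labels:
        df.loc[label]
    return (time.perf_counter() - start) / len(labels)


rng = np.random.default_rng(0)
n_rows = 10**5
data = rng.random((n_rows, 6))
cols = list("abcdef")

# Non-unique labels in [0, 10000), as in the sessions above.
dup_index = rng.integers(0, 10000, size=n_rows)

frames = {
    "unique_sorted": pd.DataFrame(data, columns=cols),
    "dup_unsorted": pd.DataFrame(data, columns=cols, index=dup_index),
    "dup_sorted": pd.DataFrame(data, columns=cols, index=dup_index).sort_index(),
}

timings = {}
for name, frame in frames.items():
    # Sample labels that are present in this frame's index.
    labels = rng.choice(frame.index.to_numpy(), size=100)
    timings[name] = time_loc(frame, labels)

for name, seconds in timings.items():
    print(f"{name}: {seconds * 1e6:.1f} µs per lookup")
```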