<p><a href="http://blog.explainmydata.com/2012/07/expensive-lessons-in-python-performance.html" rel="nofollow">http://blog.explainmydata.com/2012/07/expensive-lessons-in-python-performance.html</a></p>
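<p>"Wes McKinney is a genius. If you are implementing anything that Wes McKinney has already put in his library, just stop. His code is faster, more robust, and more likely to be correct than anything you are going to write. Want rolling window aggregators? Use pandas. Need to handle missing data? Use pandas. Are you writing an unbelievably ugly hack to implement joins and group-bys over NumPy arrays, only to spend 3 hours computing a subtly incorrect result? (I have done that.) For heaven's sake, stop and use pandas."</p>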
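<p>I agree with this. Just use pandas. There is no point in redoing work that has already been done, and performance-optimized, by many people before you.</p>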
<p>Solution:</p>
<pre><code>import pandas as pd
s = """bogie-n bypass-n 0.00304367004111
flask-n bypass-n 0.00298246799918
faggot-n sprayer-n 0.00507314183347
bypass-n sprayer-n 0.00136494481917
sprayer-n sprayer-n 1.0"""
lines = [x.split() for x in s.split('\n')]
lines
Out[152]:
[['bogie-n', 'bypass-n', '0.00304367004111'],
['flask-n', 'bypass-n', '0.00298246799918'],
['faggot-n', 'sprayer-n', '0.00507314183347'],
['bypass-n', 'sprayer-n', '0.00136494481917'],
['sprayer-n', 'sprayer-n', '1.0']]
df = pd.DataFrame(lines)
df
Out[154]:
           0          1                 2
0    bogie-n   bypass-n  0.00304367004111
1    flask-n   bypass-n  0.00298246799918
2   faggot-n  sprayer-n  0.00507314183347
3   bypass-n  sprayer-n  0.00136494481917
4  sprayer-n  sprayer-n               1.0
# Convert column 2 from strings to floats so it can be compared numerically
df[2] = df[2].astype(float)
df
Out[163]:
           0          1         2
0    bogie-n   bypass-n  0.003044
1    flask-n   bypass-n  0.002982
2   faggot-n  sprayer-n  0.005073
3   bypass-n  sprayer-n  0.001365
4  sprayer-n  sprayer-n  1.000000
# Keep only the rows whose value in column 2 is not exactly 1.0
df[df[2] != 1.0]
Out[164]:
           0          1         2
0    bogie-n   bypass-n  0.003044
1    flask-n   bypass-n  0.002982
2   faggot-n  sprayer-n  0.005073
3   bypass-n  sprayer-n  0.001365
</code></pre>
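<p>The manual split and the <code>astype</code> conversion above could probably be folded into a single parsing step. The following is only a sketch of that idea, not part of the original answer: it assumes the data is whitespace-separated with no header row, and the column names <code>word_a</code>, <code>word_b</code> and <code>score</code> are hypothetical labels chosen for illustration.</p>

```python
import io

import pandas as pd

raw = """bogie-n bypass-n 0.00304367004111
flask-n bypass-n 0.00298246799918
faggot-n sprayer-n 0.00507314183347
bypass-n sprayer-n 0.00136494481917
sprayer-n sprayer-n 1.0"""

# Column names are hypothetical; the original data has no header row.
columns = ["word_a", "word_b", "score"]

# read_csv splits on runs of whitespace and infers that the third
# column is numeric, so no separate astype(float) call is needed.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", header=None, names=columns)

# Same filter as above: drop rows whose score is exactly 1.0.
filtered = df[df["score"] != 1.0]
print(filtered)
```

<p>For a file on disk, the same call should work with a path in place of the <code>io.StringIO</code> object.</p>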