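<p>I was able to vectorize the function with numba, and the resulting code runs in about 8 seconds under <code>%%timeit</code>. I followed Ben Pap's suggestion and computed the <code>Test</code> column up front. I also pre-sorted the values and tidied up the creation of <code>DataFrameDict</code>.</p>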
<pre class="lang-py prettyprint-override"><code>
%%timeit
import pandas as pd
import numpy as np
import datetime as date
import itertools
import numba
@numba.vectorize
def points(a,b,c):
    # Score a row only when b > 0.5 and the gap c - a is below 0.1.
    val = 0
    if b > 0.5:
        foo = c - a
        if foo < 0.1:
            val = 1 - foo
        else:
            val = 0
    return val
# 70 players with 1000 random observations each.
player_list = ['player' + str(x) for x in range(1,71)]
data = pd.DataFrame({'Names': player_list*1000,
                     'Ob1' : np.random.rand(70000),
                     'Ob2' : np.random.rand(70000),
                     'Ob3' : np.random.rand(70000)})
# Compute the Test column once, before splitting by player pair.
data['Test'] = points(data['Ob1'].values,data['Ob2'].values,data['Ob3'].values)
data = data.sort_values(['Ob1'])
# One sub-DataFrame per unordered pair of players.
comboNames = list(itertools.combinations(data.Names.unique(), 2))
DataFrameDict = {elem : data.loc[data.Names.isin(elem)] for elem in comboNames}
# Per pair: total score and number of non-zero Test values.
headers = ['Player1','Player2','Score','Count']
summary = pd.DataFrame(([tbl[0], tbl[1], DataFrameDict[tbl]['Test'].sum(),
                         DataFrameDict[tbl]['Test'].astype(bool).sum(axis=0)] for tbl in DataFrameDict),
                       columns=headers).sort_values(['Score'], ascending=[False])
</code></pre>
<p>8.52 s ± 204 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)</p>
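<p>As a quick sanity check on the scoring rule, the same element-wise logic as <code>points</code> can be sketched in plain NumPy with <code>np.where</code>. This is not the numba version timed above, and I have not benchmarked it. The name <code>points_numpy</code> and the sample arrays are made up for illustration.</p>

```python
import numpy as np

def points_numpy(a, b, c):
    """Plain-NumPy sketch of the rule in points(): 1 - (c - a) where
    b > 0.5 and c - a < 0.1, otherwise 0. (Hypothetical helper name.)"""
    foo = c - a
    return np.where((b > 0.5) & (foo < 0.1), 1.0 - foo, 0.0)

# Made-up sample values standing in for Ob1, Ob2 and Ob3.
a = np.array([0.2, 0.9, 0.5, 0.3])
b = np.array([0.6, 0.7, 0.4, 0.9])
c = np.array([0.25, 0.1, 0.5, 0.8])

scores = points_numpy(a, b, c)
# Rows 0 and 1 pass both conditions; row 2 fails b > 0.5; row 3 fails foo < 0.1.
```

<p>Comparing its output against <code>points</code> on a few rows is a cheap way to confirm the vectorized function does what you expect before running the full pairwise summary.</p>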