<pre><code>import pandas as pd
import numpy as np
import datetime as date
import itertools

player_list = ['player' + str(x) for x in range(1,71)]
data = pd.DataFrame({'Names': player_list*1000,
                     'Ob1' : np.random.rand(70000),
                     'Ob2' : np.random.rand(70000),
                     'Ob3' : np.random.rand(70000)})

# Compute the Test column once, vectorized, before splitting the data into pairs
data['Test'] = np.where(data['Ob2'] > 0.5, np.where(data['Ob3'] - data['Ob1'] < 0.1, 1 - (data['Ob3'] - data['Ob1']), 0), 0)

# Build one DataFrame per unique pair of players
comboNames = list(itertools.combinations(data.Names.unique(), 2))
DataFrameDict = {elem : pd.DataFrame for elem in comboNames}
for key in DataFrameDict.keys():
    DataFrameDict[key] = data[:][data.Names.isin(key)]
    DataFrameDict[key] = DataFrameDict[key].sort_values(['Ob1'])

# Summarize score and count for each pair
headers = ['Player1','Player2','Score','Count']
summary = pd.DataFrame(([tbl[0], tbl[1], DataFrameDict[tbl]['Test'].sum(),
                         DataFrameDict[tbl]['Test'].astype(bool).sum(axis=0)] for tbl in DataFrameDict),
                       columns=headers).sort_values(['Score'], ascending=[False])
</code></pre>
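<p>I tried to keep your code as intact as possible. I changed your function to use <code>np.where</code> instead of <code>apply</code>, and I added the Test column before creating the dict because, as I said in the comments, running <code>apply</code> at that point makes no sense.</p>
<p>Using <code>%%timeit</code>, I got 26.2 s ± 1.15 s per loop (mean ± std. dev. of 7 runs, 1 loop each).</p>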
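<p>As a quick sanity check of the <code>np.where</code> rewrite, the sketch below compares it with the row-wise <code>apply</code> version on a small random sample. The sample size, the seed, and the names <code>sample</code>, <code>row_wise</code> and <code>vectorized</code> are assumptions made for illustration. The condition is written as a single combined mask rather than nested <code>np.where</code> calls, which should be equivalent here.</p>

```python
import numpy as np
import pandas as pd


def points(row):
    """Row-wise scoring rule, as used with apply(axis=1)."""
    val = 0
    if row['Ob2'] > 0.5:
        foo = row['Ob3'] - row['Ob1']
        if foo < 0.1:
            val = 1 - foo
    return val


# Hypothetical small sample; size and seed are arbitrary choices.
rng = np.random.default_rng(0)
sample = pd.DataFrame(rng.random((1000, 3)), columns=['Ob1', 'Ob2', 'Ob3'])

# Slow path: one Python function call per row.
row_wise = sample.apply(points, axis=1)

# Fast path: the same rule evaluated on whole columns at once.
diff = sample['Ob3'] - sample['Ob1']
vectorized = np.where((sample['Ob2'] > 0.5) & (diff < 0.1), 1 - diff, 0)

# Both paths should agree on every row.
same = np.allclose(row_wise.to_numpy(dtype=float), vectorized)
print(same)
```

<p>Because the rule depends only on values within a single row, it can be computed once on the full frame, before any per-pair splitting.</p>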
<p><strong>Edit:</strong></p>
<p>This is the fastest I could get it:</p>
^{pr2}$
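<p>My goal was to avoid loops and dicts entirely to speed things up further.</p>
<p>My function <code>ScoreAndCount</code> returns the score and count for each player. The <code>pd.concat</code> call takes what the function returns and attaches it to the initial df.</p>
<p>Then I took the <code>itertools</code> combinations and turned them into their own DataFrame, called <code>summary</code>. I merged the Player1 and Player2 columns of the summary df with the Names column of the original df.</p>
<p>Next, I added the two players' scores and counts together, dropped the unnecessary columns, and sorted. I ended up at 157 ms per loop. The slowest steps are the concat and the merge, but I could not think of a way around them to speed things up further.</p>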
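<p>One possible way to sidestep the concat and merge steps, offered only as an untimed sketch, is to aggregate per player once and then build the pairs with NumPy index arrays. This is a swapped-in technique, not the approach described above. The <code>Hit</code> column and the names <code>per_player</code>, <code>i</code> and <code>j</code> are assumptions introduced for illustration, and the sketch relies on <code>np.triu_indices</code> enumerating pairs in the same order as <code>itertools.combinations</code>.</p>

```python
import numpy as np
import pandas as pd

np.random.seed(0)
player_list = ['player' + str(x) for x in range(1, 71)]
data = pd.DataFrame({'Names': player_list * 10,
                     'Ob1': np.random.rand(700),
                     'Ob2': np.random.rand(700),
                     'Ob3': np.random.rand(700)})

# Same scoring rule as in the answer, written as a single combined mask.
diff = data['Ob3'] - data['Ob1']
data['Test'] = np.where((data['Ob2'] > 0.5) & (diff < 0.1), 1 - diff, 0)

# 'Hit' is a hypothetical helper column: 1 where the row scored, else 0.
data['Hit'] = data['Test'].ne(0).astype(int)

# One row per player; sort=False keeps the order of first appearance.
per_player = data.groupby('Names', sort=False)[['Test', 'Hit']].sum()
names = per_player.index.to_numpy()
score = per_player['Test'].to_numpy()
count = per_player['Hit'].to_numpy()

# Index pairs (i < j) for every unique combination of two players.
i, j = np.triu_indices(len(names), k=1)

summary = pd.DataFrame({'Player1': names[i],
                        'Player2': names[j],
                        'Score': score[i] + score[j],
                        'Count': count[i] + count[j]})
summary = summary.sort_values('Score', ascending=False)
print(summary.head())
```

<p>Whether this is actually faster than the merge-based version would need to be measured with <code>%%timeit</code> on the real data.</p>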
<p><strong>Edit 3</strong></p>
<p>We'll set a seed and use the same data df for both tests:</p>
<pre><code>np.random.seed(0)
player_list = ['player' + str(x) for x in range(1,71)]
data = pd.DataFrame({'Names': player_list*10,
                     'Ob1' : np.random.rand(700),
                     'Ob2' : np.random.rand(700),
                     'Ob3' : np.random.rand(700)})
data.head()
     Names       Ob1       Ob2       Ob3
0  player1  0.548814  0.373216  0.313591
1  player2  0.715189  0.222864  0.365539
2  player3  0.602763  0.080532  0.201267
3  player4  0.544883  0.085311  0.487148
4  player5  0.423655  0.221396  0.990369
</code></pre>
<p>Next we'll run your exact code and look at the dict entry for player1 and player2.</p>
<pre><code>
def points(row):
    val = 0
    if row['Ob2'] > 0.5:
        foo = row['Ob3'] - row['Ob1']
        if foo < 0.1:
            val = 1 - foo
    else:
        val = 0
    return val

#create list of unique pairs
comboNames = list(itertools.combinations(data.Names.unique(), 2))
DataFrameDict = {elem : pd.DataFrame for elem in comboNames}
for key in DataFrameDict.keys():
    DataFrameDict[key] = data[:][data.Names.isin(key)]
    DataFrameDict[key] = DataFrameDict[key].sort_values(['Ob1'])

#Add test calculated column
for tbl in DataFrameDict:
    DataFrameDict[tbl]['Test'] = DataFrameDict[tbl].apply(points, axis=1)

DataFrameDict[('player1', 'player2')].head()
       Names       Ob1       Ob2       Ob3  Test
351  player2  0.035362  0.013509  0.384273   0.0
630  player1  0.062636  0.305047  0.571550   0.0
561  player2  0.133461  0.758194  0.964210   0.0
211  player2  0.216897  0.056877  0.417333   0.0
631  player2  0.241902  0.557987  0.983555   0.0
</code></pre>
<p>Next we'll do what you did in summary and take the sum of the Test column, which is the score produced by player1 and player2:</p>
<pre><code>DataFrameDict[('player1', 'player2')]['Test'].sum()
8.077455441105938
</code></pre>
<p>So we get 8.0774. If what I'm saying is true, then running the code from Edit 2 should also give a score of 8.077 for player1 and player2.</p>
<pre><code>data['test'] = np.where(data['Ob2'] > 0.5, np.where(data['Ob3'] - data['Ob1'] < 0.1, 1 - (data['Ob3'] - data['Ob1']), 0), 0)

def ScoreAndCount(row):
    score = row.sum()
    count = row.astype(bool).sum()
    return score, count

# One (score, count) tuple per player, then split the tuple into two columns
df = data.groupby('Names')['test'].apply(ScoreAndCount).reset_index()
df = pd.concat([df['Names'], df.test.apply(pd.Series).rename(columns = {0: 'Score', 1:'Count'})], axis = 1)

# All unique pairs, with each player's totals merged in
summary = pd.DataFrame(itertools.combinations(data.Names.unique(), 2), columns = ['Player1', 'Player2'])
summary = summary.merge(df, left_on = 'Player1', right_on = 'Names')\
                 .merge(df, left_on = 'Player2', right_on = 'Names')\
                 .drop(columns = ['Names_x', 'Names_y'])

# Pair totals are the sums of the two players' totals
summary['Score'] = summary['Score_x'] + summary['Score_y']
summary['Count'] = summary['Count_x'] + summary['Count_y']
summary.drop(columns = ['Score_x','Count_x', 'Score_y','Count_y'], inplace = True)
summary = summary.sort_values('Score', ascending = False)
</code></pre>
<p>Now we'll check the row for player1 and player2:</p>
<pre><code>summary[(summary['Player1'] == 'player1')&(summary['Player2'] == 'player2')]
   Player1  Player2     Score  Count
0  player1  player2  8.077455    6.0
</code></pre>
<p>As you can see, the score I computed for player1 and player2 with Edit 2 is exactly the same as the one your code produces.</p>
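<p>The same check can be extended from one pair to every pair. The sketch below does this on a deliberately small, hypothetical data set (5 players, 20 rows each, arbitrary seed), comparing the score obtained by filtering the rows of a pair with the sum of the two players' individual totals. The names <code>small</code>, <code>per_player</code> and <code>mismatches</code> are assumptions made for illustration.</p>

```python
import itertools

import numpy as np
import pandas as pd

np.random.seed(1)  # arbitrary seed, chosen only for reproducibility
players = ['player' + str(x) for x in range(1, 6)]
small = pd.DataFrame({'Names': players * 20,
                      'Ob1': np.random.rand(100),
                      'Ob2': np.random.rand(100),
                      'Ob3': np.random.rand(100)})

# Same scoring rule as in the answer, written as a single combined mask.
diff = small['Ob3'] - small['Ob1']
small['Test'] = np.where((small['Ob2'] > 0.5) & (diff < 0.1), 1 - diff, 0)

# Individual totals, computed once per player.
per_player = small.groupby('Names')['Test'].sum()

mismatches = []
for p1, p2 in itertools.combinations(players, 2):
    # Pair score obtained by filtering the pair's rows, as the dict approach does.
    subset_score = small.loc[small['Names'].isin((p1, p2)), 'Test'].sum()
    # Pair score obtained by adding the two individual totals.
    additive_score = per_player[p1] + per_player[p2]
    if not np.isclose(subset_score, additive_score):
        mismatches.append((p1, p2))

print(mismatches)
```

<p>The two approaches agree because each row's Test value depends only on that row, so a pair's total is just the sum of the two players' totals.</p>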