<p>Here is a <code>timeit</code> comparison of several alternatives:</p>
<pre><code>| method | ms per loop |
|--------------------+-------------|
| alt2 | 2.36 |
| using_concat | 3.26 |
| using_double_merge | 22.4 |
| orig | 22.6 |
| alt | 45.8 |
</code></pre>
<p>The <code>timeit</code> results were generated with the functions defined below.</p>
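<p>For reference, here is a minimal sketch of how per-loop timings like these might be collected with the standard-library <code>timeit</code> module. <code>pairwise_diff</code> is a hypothetical stand-in, not one of the benchmarked functions, and the printed number will vary from machine to machine.</p>

```python
import timeit

import numpy as np


def pairwise_diff(points):
    # Hypothetical stand-in for one of the benchmarked functions.
    return np.subtract.outer(points, points)


points = np.arange(20)

# Call the function `number` times per trial, run 5 trials, keep the fastest.
number = 1000
best = min(timeit.repeat(lambda: pairwise_diff(points), number=number, repeat=5))
ms_per_loop = best / number * 1e3
print(f"pairwise_diff: {ms_per_loop:.4f} ms per loop")
```

<p>Taking the minimum over several trials is a common way to reduce noise from other processes.</p>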
<hr/>
<pre><code>import itertools

import numpy as np
import pandas as pd


def alt(df):
    # Cross join via a constant key; assign() avoids mutating the caller's frame.
    df = df.assign(const=1)
    result = pd.merge(df, df, on='const', how='outer')
    result = result.loc[(result['colour_x'] != result['colour_y'])]
    result['color'] = result['colour_x'] + '_' + result['colour_y']
    result['points'] = result['points_x'] - result['points_y']
    result = result[['color', 'points']]
    return result


def alt2(df):
    # ufunc.outer is not supported on Series in recent pandas, so use arrays.
    pts = df['points'].to_numpy()
    points = np.add.outer(pts, -pts)
    color = pd.MultiIndex.from_product([df['colour'], df['colour']])
    # `codes` was named `labels` before pandas 0.24.
    mask = color.codes[0] != color.codes[1]
    color = color.map('_'.join)
    result = pd.DataFrame({'points': points.ravel(), 'color': color})
    result = result.loc[mask]
    return result


def orig(df):
    combos = []
    points = []
    for i1 in range(len(df)):
        for i2 in range(len(df)):
            colour_main = df['colour'].iloc[i1]
            colour_secondary = df['colour'].iloc[i2]
            if colour_main != colour_secondary:
                combo = colour_main + "_" + colour_secondary
                point1 = df['points'].values[i1]
                point2 = df['points'].values[i2]
                new_points = point1 - point2
                combos.append(combo)
                points.append(new_points)
    return pd.DataFrame({'color': combos, 'points': points})


def using_concat(df):
    """https://stackoverflow.com/a/51641085/190597 (RafaelC)"""
    d = df.set_index('colour').to_dict()['points']
    s = pd.Series(list(itertools.combinations(df.colour, 2)))
    s = pd.concat([s, s.transform(lambda k: k[::-1])])
    v = s.map(lambda k: d[k[0]] - d[k[1]])
    df2 = pd.DataFrame({'comb': s.str.get(0) + '_' + s.str.get(1), 'values': v})
    return df2


def using_double_merge(df):
    """https://stackoverflow.com/a/51641007/190597 (sacul)"""
    # Written against an older pandas; the reindex step may behave
    # differently in newer releases.
    new = (df.reindex(pd.MultiIndex.from_product([df.colour, df.colour]))
           .reset_index()
           .drop(['colour', 'points'], axis=1)
           .merge(df.set_index('colour'), left_on='level_0', right_index=True)
           .merge(df.set_index('colour'), left_on='level_1', right_index=True))
    new['points_y'] *= -1
    # Sum only the numeric columns; level_0 and level_1 hold strings.
    new['sum'] = new[['points_x', 'points_y']].sum(axis=1)
    new = new[new.level_0 != new.level_1].drop(['points_x', 'points_y'], axis=1)
    new['colours'] = new[['level_0', 'level_1']].apply(lambda x: '_'.join(x), axis=1)
    return new[['colours', 'sum']]


def make_df(N):
    df = pd.DataFrame({'colour': np.arange(N),
                       'points': np.random.randint(10, size=N)})
    df['colour'] = df['colour'].astype(str)
    return df
</code></pre>
<hr/>
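<p>The main idea in <code>alt2</code> is to use <code>np.add.outer</code> to build an addition table out of <code>df['points']</code> and its negation, which yields every pairwise difference:</p>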
<pre><code>In [149]: points = np.add.outer(df['points'].to_numpy(), -df['points'].to_numpy())
In [151]: points
Out[151]:
array([[  0,  -9,   4],
       [  9,   0,  13],
       [ -4, -13,   0]])
</code></pre>
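<p>To make the table concrete, here is a small self-contained sketch. The point values are hypothetical, chosen only so that the differences match the table shown above. It also checks that plain broadcasting gives the same result.</p>

```python
import numpy as np

# Hypothetical point values, used only for illustration.
points = np.array([10, 19, 6])

# Entry [i, j] is points[i] + (-points[j]), i.e. points[i] - points[j].
table = np.add.outer(points, -points)

# The same table, built by broadcasting a column against a row.
same = points[:, None] - points[None, :]

print(table)
print((table == same).all())
```

<p>The diagonal is always zero, since it pairs each row with itself. Those are the entries the mask removes later.</p>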
<p><code>ravel</code> is used to make the array one-dimensional:</p>
<pre><code>In [152]: points.ravel()
Out[152]: array([  0,  -9,   4,   9,   0,  13,  -4, -13,   0])
</code></pre>
<p><code>pd.MultiIndex.from_product</code> is used to generate the colour combinations:</p>
<pre><code>In [153]: color = pd.MultiIndex.from_product([df['colour'], df['colour']])
In [155]: color = color.map('_'.join)
In [156]: color
Out[156]:
Index(['red_red', 'red_yellow', 'red_black', 'yellow_red', 'yellow_yellow',
       'yellow_black', 'black_red', 'black_yellow', 'black_black'],
      dtype='object')
</code></pre>
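<p>The labels line up with the raveled points because <code>from_product</code> varies its last input fastest, which is the same row-major order that <code>ravel</code> uses by default. A minimal sketch, assuming the three colours from the output above:</p>

```python
import itertools

import pandas as pd

colours = ['red', 'yellow', 'black']

# All ordered pairs; the second colour changes fastest.
pairs = pd.MultiIndex.from_product([colours, colours])
labels = pairs.map('_'.join)

# itertools.product enumerates the pairs in the same order.
expected = ['_'.join(p) for p in itertools.product(colours, colours)]

print(list(labels))
```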
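<p>A mask is generated to drop the pairs that combine a colour with itself. It has to be computed from the <code>MultiIndex</code>, before <code>color</code> is mapped to strings:</p>
<pre><code>mask = color.codes[0] != color.codes[1]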
</code></pre>
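<p>A short sketch of what the mask looks like for three colours. It assumes pandas 0.24 or later, where the per-level integer positions of a <code>MultiIndex</code> are exposed as <code>codes</code>.</p>

```python
import pandas as pd

colours = ['red', 'yellow', 'black']
pairs = pd.MultiIndex.from_product([colours, colours])

# Each level stores one integer code per row; equal codes mean the same colour.
first, second = pairs.codes
mask = first != second

print(mask.tolist())
```

<p>Comparing integer codes avoids comparing the strings themselves. Of the nine pairs, six survive.</p>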
<p>Then <code>result</code> is assembled from these parts:</p>
<pre><code>result = pd.DataFrame({'points': points.ravel(), 'color': color})
result = result.loc[mask]
</code></pre>
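<p>Putting the steps together, here is an end-to-end sketch on a three-row frame. The data is hypothetical, and the columns are converted to NumPy arrays first, on the assumption that a recent pandas is in use.</p>

```python
import numpy as np
import pandas as pd

# Hypothetical input, used only for illustration.
df = pd.DataFrame({'colour': ['red', 'yellow', 'black'],
                   'points': [10, 19, 6]})

pts = df['points'].to_numpy()
colours = df['colour'].to_numpy()

points = np.add.outer(pts, -pts)                      # all pairwise differences
pairs = pd.MultiIndex.from_product([colours, colours])
mask = pairs.codes[0] != pairs.codes[1]               # drop same-colour pairs
color = pairs.map('_'.join)

result = pd.DataFrame({'points': points.ravel(), 'color': color})
result = result.loc[mask]
print(result)
```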
<hr/>
<p>The idea behind <code>alt</code> is explained in my <a href="https://stackoverflow.com/revisions/8760b788-6370-46ab-bfca-fc6c23ecd15d/view-source">original answer, here</a>.</p>
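<p>As a rough illustration of the cross join that <code>alt</code> performs by merging on a constant key, the sketch below uses <code>how='cross'</code> instead. This swaps in a different call from the one in <code>alt</code> and assumes pandas 1.2 or later. The data is hypothetical.</p>

```python
import pandas as pd

# Hypothetical input, used only for illustration.
df = pd.DataFrame({'colour': ['red', 'yellow', 'black'],
                   'points': [10, 19, 6]})

# Pair every row with every row; overlapping columns get _x / _y suffixes.
pairs = df.merge(df, how='cross')
pairs = pairs[pairs['colour_x'] != pairs['colour_y']]

result = pd.DataFrame({
    'color': pairs['colour_x'] + '_' + pairs['colour_y'],
    'points': pairs['points_x'] - pairs['points_y'],
})
print(result)
```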