<p>希望我得到了正确的条件;下面的计算依赖于赋值前的索引;因此,我创建了一个长数据帧,创建了长数据帧的条件,将大小减少到唯一索引,并将结果分配给原始数据帧:</p>
<p>创建一个长数据框来计算条件(为了方便起见,我在这里使用<a href="https://pyjanitor-devs.github.io/pyjanitor/" rel="nofollow noreferrer">pyjanitor</a><a href="https://pyjanitor-devs.github.io/pyjanitor/reference/janitor.functions/janitor.pivot_longer.html#janitor.pivot_longer" rel="nofollow noreferrer">pivot_longer</a>对其进行重塑;您可以在不使用PyGatitor的情况下通过几个步骤来完成这项工作):</p>
<pre class="lang-py prettyprint-override"><code># pip install git+https://github.com/pyjanitor-devs/pyjanitor.git
import janitor as jn
import pandas as pd
res = (df.pivot_longer(['ID', 'Hour'],
names_pattern= [r'.+\d$', r'.+Score$'],
names_to = ['cus', 'score'],
ignore_index = False)
)
print(res)
ID Hour cus score
0 1 11 8 0.62
1 2 13 15 0.57
2 3 16 9 0.67
3 4 8 11 0.58
0 1 11 10 0.59
1 2 13 16 0.57
2 3 16 14 0.54
3 4 8 8 0.55
0 1 11 11 0.59
1 2 13 18 0.57
2 3 16 16 0.48
3 4 8 12 0.43
0 1 11 14 0.54
1 2 13 13 0.57
2 3 16 12 0.34
3 4 8 17 0.25
</code></pre>
<p>为<code>match1</code>创建条件:</p>
<pre class="lang-py prettyprint-override"><code># get booleans for max and no duplicates
res = res.sort_index()
max1 = res.groupby(level=0).score.transform('max')
# max without duplicates
cond1 = res.score.eq(max1).groupby(level=0).sum().eq(1)
cond1 = cond1 & df.Hour.eq(df.Cus1)
# max with duplicates
cond2 = res.score.eq(max1).groupby(level=0).transform('sum').gt(1)
cond2 &= res.Hour.eq(res.cus)
cond2 = cond2.groupby(level=0).any()
df['match1'] = np.where(cond1 | cond2, 'YES', 'NO')
</code></pre>
<p>匹配2:</p>
<pre class="lang-py prettyprint-override"><code>
second_largest = (res.sort_values('score', ascending=False)
.groupby(level=0, sort = False)
.score
.transform('nth', 1)
)
second_largest = second_largest.sort_index()
# second largest without duplicates
cond_1 = res.score.eq(second_largest).groupby(level=0).sum().eq(1)
cond_1 &= df.Hour.eq(df.Cus2)
# second largest with duplicates
cond_2 = res.score.eq(second_largest).groupby(level=0).transform('sum').gt(1)
cond_2 &= res.Hour.eq(res.cus)
cond_2 = cond_2.groupby(level=0).any()
df['match2'] = np.where(cond_1 | cond_2, 'YES', 'NO')
</code></pre>
<pre class="lang-py prettyprint-override"><code>df
ID Hour Cus1 Cus2 Cus3 Cus4 Cus1_Score Cus2_Score Cus3_Score Cus4_Score match1 match2
0 1 11 8 10 11 14 0.62 0.59 0.59 0.54 NO YES
1 2 13 15 16 18 13 0.57 0.57 0.57 0.57 YES YES
2 3 16 9 14 16 12 0.67 0.54 0.48 0.34 NO NO
3 4 8 11 8 12 17 0.58 0.55 0.43 0.25 NO YES
</code></pre>