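<p>I think we should make this easier than it currently is. What I often do is sort by <code>Reference</code>, so that the row flagged with 1 ends up last within each group, and then use <code>transform</code> + <code>iloc</code> to select that value, something like</p>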
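<pre><code># Sort ascending by Reference so the row marked 1 comes last in each
# (Genes, Sub-Gene) group, then broadcast that row's Type to the whole group.
# (DataFrame.sort was removed from pandas; sort_values is its replacement.)
grouped = df.sort_values("Reference").groupby(["Genes", "Sub-Gene"])
df["TrueType"] = grouped["Type"].transform(lambda x: x.iloc[-1])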
</code></pre>
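<p>As a sketch, here is a self-contained version of that snippet. The <code>DataFrame</code> constructor just rebuilds the example frame from the session below by hand (the placeholder <code>TrueType</code> column is omitted because it gets overwritten). Passing <code>kind="mergesort"</code>, a stable sort, is my own assumption about tie-breaking: if a group ever had more than one row with <code>Reference == 1</code>, the last of them in the original order would win.</p>

```python
import pandas as pd

# Example frame rebuilt by hand from the In [211] session output.
df = pd.DataFrame({
    "Genes": [1, 1, 1, 1, 1, 1, 2, 2],
    "Sub-Gene": ["SG1", "SG1", "SG2", "SG2", "SG2", "SG2", "SG1", "SG1"],
    "Type": ["type3", "type1", "type7", "type3", "type9", "type9", "type3", "type7"],
    "Reference": [0, 1, 0, 0, 0, 1, 1, 0],
})

# Sort ascending by Reference so rows flagged with 1 come last in each group.
# kind="mergesort" is stable, so ties keep their original relative order.
grouped = df.sort_values("Reference", kind="mergesort").groupby(["Genes", "Sub-Gene"])

# Broadcast the last Type of each group to every row of that group.
# The result keeps df's index labels, so the assignment aligns row by row.
df["TrueType"] = grouped["Type"].transform(lambda x: x.iloc[-1])

print(df)
```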
<p>For example:</p>
<pre><code>In [211]: df
Out[211]:
   Genes  Sub-Gene   Type  Reference    TrueType
0      1       SG1  type3          0  NothingYet
1      1       SG1  type1          1  NothingYet
2      1       SG2  type7          0  NothingYet
3      1       SG2  type3          0  NothingYet
4      1       SG2  type9          0  NothingYet
5      1       SG2  type9          1  NothingYet
6      2       SG1  type3          1  NothingYet
7      2       SG1  type7          0  NothingYet
[8 rows x 5 columns]
In [212]: df.sort_values("Reference").groupby(["Genes", "Sub-Gene"])["Type"].transform(lambda x: x.iloc[-1])
Out[212]:
0    type1
2    type9
3    type9
4    type9
7    type3
1    type1
5    type9
6    type3
Name: Type, dtype: object
</code></pre>
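<p>Assigning that result back to <code>df</code> produces the following. Even though the output above is in sorted row order, <code>transform</code> keeps the original index labels, and column assignment aligns on those labels, so each value lands back in its own row:</p>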
<pre><code>In [213]: df["TrueType"] = df.sort_values("Reference").groupby(["Genes", "Sub-Gene"])["Type"].transform(lambda x: x.iloc[-1])
In [214]: df
Out[214]:
   Genes  Sub-Gene   Type  Reference  TrueType
0      1       SG1  type3          0     type1
1      1       SG1  type1          1     type1
2      1       SG2  type7          0     type9
3      1       SG2  type3          0     type9
4      1       SG2  type9          0     type9
5      1       SG2  type9          1     type9
6      2       SG1  type3          1     type3
7      2       SG1  type7          0     type3
[8 rows x 5 columns]
</code></pre>
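<p>Note that <code>Out[212]</code> comes back in sorted row order (0, 2, 3, 4, 7, 1, 5, 6), yet the assignment in <code>In [213]</code> still puts every value in the right row. The sketch below checks why, again rebuilding the example frame by hand and assuming a stable sort (<code>kind="mergesort"</code>) so the row order is deterministic: the Series from <code>transform</code> carries the original index labels, and assigning a Series to a column aligns on those labels. Stripping the labels with <code>to_numpy()</code> turns the same assignment into a positional one, which would misplace values.</p>

```python
import pandas as pd

# Example frame rebuilt by hand from the In [211] session output.
df = pd.DataFrame({
    "Genes": [1, 1, 1, 1, 1, 1, 2, 2],
    "Sub-Gene": ["SG1", "SG1", "SG2", "SG2", "SG2", "SG2", "SG1", "SG1"],
    "Type": ["type3", "type1", "type7", "type3", "type9", "type9", "type3", "type7"],
    "Reference": [0, 1, 0, 0, 0, 1, 1, 0],
})

# transform() returns one value per row of the *sorted* frame, in that order,
# but each value keeps its original index label.
result = (
    df.sort_values("Reference", kind="mergesort")
    .groupby(["Genes", "Sub-Gene"])["Type"]
    .transform(lambda x: x.iloc[-1])
)
print(result.index.tolist())  # labels in sorted order, not 0..7

# Assigning a Series to a column aligns on index labels, so every value
# lands back in the row it was computed for.
df["TrueType"] = result

# A bare NumPy array has no labels, so assigning it would be positional
# and would put values in the wrong rows.
misaligned = result.to_numpy()
print(pd.DataFrame({"aligned": df["TrueType"], "positional": misaligned}))
```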