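<p>You should sort the values by the <code>bank</code> column with <code>na_position='last'</code>, so that <code>.drop_duplicates(..., keep='first')</code> keeps a row whose <code>bank</code> value is not NaN whenever one exists.</p>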
<p>Try this:</p>
<pre class="lang-py prettyprint-override"><code>import pandas as pd
import numpy as np

df = pd.DataFrame({'firstname': ['foo Bar', 'Bar Bar', 'Foo Bar'],
                   'lastname': ['Foo Bar', 'Bar', 'Foo Bar'],
                   'email': ['Foo bar', 'Bar', 'Foo Bar'],
                   'bank': [np.nan, 'abc', 'xyz']})

# Sort so rows with a missing bank come last, then normalise the key columns
# (lower-case, spaces removed) and keep the first row of each duplicate group.
# Note: applymap is deprecated since pandas 2.1; DataFrame.map replaces it there.
uniq_indx = (df.sort_values(by="bank", na_position='last')
               .dropna(subset=['firstname', 'lastname', 'email'])
               .applymap(lambda s: s.lower() if isinstance(s, str) else s)
               .applymap(lambda x: x.replace(" ", "") if isinstance(x, str) else x)
               .drop_duplicates(subset=['firstname', 'lastname', 'email'], keep='first')).index

# save unique records
dfiban_uniq = df.loc[uniq_indx]
print(dfiban_uniq)
</code></pre>
<p>Output:</p>
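<pre><code>  firstname lastname    email bank
1   Bar Bar      Bar      Bar  abc
2   Foo Bar  Foo Bar  Foo Bar  xyz
</code></pre>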
<p>(This is just your original code, with <code>.sort_values(by="bank", na_position='last')</code> added at the start of the <code>uniq_indx = ...</code> chain.)</p>
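<p>If it helps, the same idea can be wrapped in a small helper. This is only a sketch: the name <code>dedupe_prefer_non_null</code> and its parameters are hypothetical (not from your code), and it assumes the key columns hold strings and the index labels are unique. It normalises the keys with the <code>.str</code> accessor instead of <code>applymap</code>, and uses a stable sort so ties keep their original order.</p>

```python
import numpy as np
import pandas as pd


def dedupe_prefer_non_null(df, key_cols, prefer_col):
    """Hypothetical helper: drop rows that repeat on key_cols
    (ignoring case and spaces), preferring rows where prefer_col is not NaN."""
    # Normalised copy of the key columns (assumes they contain strings).
    keys = df[key_cols].dropna().apply(
        lambda col: col.str.lower().str.replace(" ", "", regex=False)
    )
    # Row labels ordered so that NaN in prefer_col comes last (stable sort).
    order = df.loc[keys.index, prefer_col].sort_values(
        na_position="last", kind="mergesort"
    ).index
    keys = keys.loc[order]
    # duplicated(keep="first") flags every repeat after the first occurrence.
    keep_idx = keys.loc[~keys.duplicated(keep="first")].index
    # Return the surviving rows in their original order.
    return df.loc[df.index.isin(keep_idx)]


df = pd.DataFrame({'firstname': ['foo Bar', 'Bar Bar', 'Foo Bar'],
                   'lastname': ['Foo Bar', 'Bar', 'Foo Bar'],
                   'email': ['Foo bar', 'Bar', 'Foo Bar'],
                   'bank': [np.nan, 'abc', 'xyz']})

result = dedupe_prefer_non_null(df, ['firstname', 'lastname', 'email'], 'bank')
print(result)
```

<p>On the sample frame this keeps the rows labelled 1 and 2, so the duplicate whose <code>bank</code> is NaN (row 0) is the one that gets dropped.</p>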