<p>Unless you need to use groupby (which is slow on large dataframes), you can do the following:</p>
<pre class="lang-py prettyprint-override"><code>def custom_drop_duplicates(dataframe):
    # Work on a copy so the caller's dataframe is left untouched.
    localDF = dataframe.copy()
    criteria_list = []
    # One helper column per ranked column: the length of each value.
    for i, col in enumerate(['c', 'd', 'f']):
        localDF.loc[:, 'criteria{}'.format(i)] = [len(x) for x in localDF[col]]
        criteria_list.append('criteria{}'.format(i))
    # Extra criterion on 'f': True if it contains neither 'm' nor 'n', or if it contains 'w' or 'y'.
    localDF.loc[:, 'criteria{}'.format(i+1)] = [all(x not in y for x in ['m', 'n']) or any(x in y for x in ['w', 'y']) for y in localDF['f']]
    criteria_list.append('criteria{}'.format(i+1))
    # Here you have a judgement call: if the criteria conflict, you need to order them. I just assume they are ordered the way you described them.
    # Sort ascending so the best row of each (a, b) group comes last, then keep that row.
    localDF.sort_values(by=criteria_list, ascending=True, inplace=True)
    localDF.drop_duplicates(subset=['a', 'b'], keep='last', inplace=True)
    localDF.drop(columns=criteria_list, inplace=True)
    return localDF
</code></pre>
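<p>To see the sort-then-deduplicate idea in isolation, here is a minimal sketch on a made-up frame. The data and the helper column name <code>c_len</code> are illustrative assumptions, and only one criterion (the string length of column <code>c</code>) is used:</p>

```python
import pandas as pd

# Hypothetical sample data: two rows per (a, b) group.
df = pd.DataFrame({
    "a": [1, 1, 2, 2],
    "b": ["x", "x", "y", "y"],
    "c": ["p", "pqr", "st", "s"],
})

# Helper column holding the ranking criterion (here: length of the string in 'c').
ranked = df.assign(c_len=df["c"].str.len())

# Sort ascending so the preferred row of each (a, b) group ends up last,
# keep that last row, then drop the helper column and restore the original order.
result = (
    ranked.sort_values("c_len", kind="stable")
          .drop_duplicates(subset=["a", "b"], keep="last")
          .drop(columns="c_len")
          .sort_index()
)
print(result)
```

<p>With several criteria, the order of the columns passed to <code>sort_values</code> decides which one wins when they disagree, which is the judgement call mentioned in the comment above.</p>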
<p>Hope this helps.</p>