<p>I would do it like this:</p>
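<pre><code>import pandas as pd

# NOTE: min_of_max and max_of_min were not defined in the original snippet.
# Assumption: they are the smallest Max and the largest Min across df.
min_of_max = df.Max.min()
max_of_min = df.Min.max()

# Helper function: overlap of a row's [Min, Max] range with the bounds above
def calc_overlap(x):
    if min_of_max == max_of_min:
        return 0
    low = max(min_of_max, x.Min)
    high = min(max_of_min, x.Max)
    return high - low

# Names that occur more than once
counts = df.Global_name.value_counts()
dup_global_name = list(counts[counts > 1].index)
# Filter duplicates (copy so the new column is not written to a view of df)
df_dup = df[df.Global_name.isin(dup_global_name)].copy()
# Add overlap column
df_dup['overlap'] = df_dup.apply(calc_overlap, axis=1)
# Select the row with the max overlap for each name
df_dup = df_dup.loc[df_dup.groupby('Global_name').overlap.idxmax()]
# Drop overlap col
df_dup = df_dup.drop('overlap', axis=1)
# Concatenate with the non-duplicate rows
result = pd.concat([df[~df.Global_name.isin(dup_global_name)], df_dup])
</code></pre>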
<p>Desired output:
<a href="https://i.stack.imgur.com/dU0in.jpg" rel="nofollow noreferrer"><img src="https://i.stack.imgur.com/dU0in.jpg" alt="enter image description here"/></a></p>
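<p>For comparison, here is a minimal vectorized sketch of the same idea (keep, for each <code>Global_name</code>, the row whose range overlaps a reference interval the most). It is a variant, not the code above: the sample data and the reference bounds <code>ref_low</code> and <code>ref_high</code> are hypothetical, and it runs <code>idxmax</code> over all rows, so names that occur only once are kept automatically and no <code>concat</code> step is needed.</p>

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the answer above.
df = pd.DataFrame({
    "Global_name": ["a", "a", "b", "c", "c"],
    "Min": [0, 4, 1, 2, 5],
    "Max": [5, 9, 3, 6, 8],
})

# Assumed reference interval [ref_low, ref_high]; both bounds are made up here.
ref_low, ref_high = 3, 7

# Overlap of each row's [Min, Max] with the reference interval.
# The final clip keeps disjoint ranges at 0 instead of a negative length.
overlap = (
    df["Max"].clip(upper=ref_high) - df["Min"].clip(lower=ref_low)
).clip(lower=0)

# For each name, idxmax returns the index label of the first row
# with the largest overlap; unique names simply return their only row.
keep = overlap.groupby(df["Global_name"]).idxmax()
result = df.loc[keep].sort_index()
print(result)
```

<p>Ties are resolved in favour of the first row in the original order, which may or may not be what you want for your data.</p>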