<pre><code>import numpy as np
import pandas as pd

df = pd.DataFrame({'id': ['A', 'B', 'B', 'C', 'D', 'D', 'D', 'D', 'D', 'D', 'D'],
'hit': ['hit1', 'hit2', 'hit3','hit4', 'hit5','hit6', 'hit7','hit8', 'hit9','hit10', 'hit11'],
'from': [56,89,240,332,291,287,381,287,373,514, 599],
'to':[102,275,349,480,512,313,426,316,422,600, 602],
'value': [0.00085,0.00034,0.00034,3.40E-15,3.80E-24,0.00098,0.00098,0.0029,0.0029,0.0021, 0.002]})
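# Sort first so the row order matches the per-group labels built below
df = df.sort_values(by = ['id', 'from']).reset_index(drop = True)
# Within each id, start a new label whenever 'from' is not below the previous 'to'
overlapMask = df.groupby('id')\
.apply(lambda x: np.where(x['from'] < x['to'].shift(), 0 , 1).cumsum())\
.reset_index()
df['Mask'] = np.concatenate((overlapMask[0].values))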
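# Drop every row whose (id, value) pair occurs more than once
df.drop_duplicates(subset = ['id','value'], keep = False, inplace = True)
# Keep the lowest-value hit in each (id, Mask) group
df.sort_values(by = 'value')\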
.groupby(['id', 'Mask'])\
.head(1)\
.reset_index()\
.drop(['Mask', 'index'],axis = 1)\
.sort_values(by = 'id')
  id    hit  from   to         value
2  A   hit1    56  102  8.500000e-04
1  C   hit4   332  480  3.400000e-15
0  D   hit5   291  512  3.800000e-24
3  D  hit11   599  602  2.000000e-03
</code></pre>
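<p>My solution uses a mask to detect overlaps. Within each <code>id</code>, the rows are sorted by <code>from</code>, and each row's <code>from</code> is compared with the previous row's <code>to</code>: if it is smaller, the two hits overlap and the row stays in the current run; otherwise the cumulative sum moves on to a new run. The first row of a group has no previous <code>to</code> (<code>shift()</code> gives NaN), so the comparison is False and that row gets label 1. Filling that NaN with <code>np.inf</code> only makes the first label in each group 0 instead; the grouping is the same either way.</p>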
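<p>To make the labelling step concrete, here is a minimal sketch on a single, made-up group that is already sorted by <code>from</code>. The frame <code>g</code> and the names <code>prev_to</code>, <code>nan_mask</code> and <code>inf_mask</code> are illustrative only and not part of the answer's code.</p>

```python
import numpy as np
import pandas as pd

# One hypothetical group, already sorted by 'from'.
g = pd.DataFrame({"from": [287, 291, 514], "to": [313, 512, 600]})

# 'to' of the previous row; the first row has no predecessor, so it is NaN.
prev_to = g["to"].shift()

# 0 where the row overlaps the previous one, 1 where a new run starts,
# then a running sum turns those flags into run labels.
nan_mask = np.where(g["from"] < prev_to, 0, 1).cumsum()

# Same idea, but the missing predecessor is treated as +infinity,
# so the first row counts as "overlapping" and the labels start at 0.
inf_mask = np.where(g["from"] < prev_to.fillna(np.inf), 0, 1).cumsum()

print(nan_mask.tolist())  # [1, 1, 2]
print(inf_mask.tolist())  # [0, 0, 1]
```

<p>Both variants put the first two rows in one run and the third in another; only the numbering differs.</p>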
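<p>The mask is then stored in <code>df</code> as its own column. After that, all rows that share the same <code>id</code> and <code>value</code> are dropped, the remaining rows are sorted by <code>value</code> and grouped by <code>id</code> and <code>Mask</code> so that the lowest-value hit in each run is kept, the index is reset, and finally the mask column is dropped.</p>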
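<p>As a possible variation, and not the code from the answer itself, the labels can be built with index-aligned groupby operations so that they do not depend on the row order of <code>df</code>. The sketch below also compares each <code>from</code> with the running maximum of the earlier <code>to</code> values, which is a swapped-in refinement for hits nested inside a longer hit. The helper name <code>label_overlap_runs</code> is hypothetical, and the sketch assumes the frame has a unique index.</p>

```python
import pandas as pd

def label_overlap_runs(frame, key="id", start="from", end="to"):
    """Number runs of overlapping intervals per key, aligned to frame.index.

    Hypothetical helper; assumes frame.index is unique.
    """
    ordered = frame.sort_values([key, start])
    # Largest 'end' seen so far within each group, including the current row.
    running_end = ordered.groupby(key)[end].cummax()
    # Shift within each group so every row sees the maximum of the rows before it.
    prev_end = running_end.groupby(ordered[key]).shift()
    # A row opens a new run if it has no predecessor (NaN) or starts at/after prev_end.
    new_run = ~(ordered[start] < prev_end)
    labels = new_run.astype(int).groupby(ordered[key]).cumsum()
    # Return the labels in the original row order.
    return labels.reindex(frame.index)

df = pd.DataFrame({
    "id": ["A", "B", "B", "C", "D", "D", "D", "D", "D", "D", "D"],
    "hit": ["hit1", "hit2", "hit3", "hit4", "hit5", "hit6",
            "hit7", "hit8", "hit9", "hit10", "hit11"],
    "from": [56, 89, 240, 332, 291, 287, 381, 287, 373, 514, 599],
    "to": [102, 275, 349, 480, 512, 313, 426, 316, 422, 600, 602],
    "value": [0.00085, 0.00034, 0.00034, 3.40e-15, 3.80e-24,
              0.00098, 0.00098, 0.0029, 0.0029, 0.0021, 0.002],
})

df["Mask"] = label_overlap_runs(df)
# Drop every row whose (id, value) pair occurs more than once.
unique = df.drop_duplicates(subset=["id", "value"], keep=False)
# Pick the row with the smallest value in each (id, Mask) run.
best = unique.loc[unique.groupby(["id", "Mask"])["value"].idxmin()]
result = best.drop(columns="Mask").sort_values(["id", "from"]).reset_index(drop=True)
print(result)
```

<p>On the sample data this keeps the same four hits as the code above (hit1, hit4, hit5 and hit11). The running maximum only changes the outcome when a short hit sits inside a longer one and a later hit starts after the short one ends but before the long one does.</p>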