<p>Here is one approach using <a href="https://pandas.pydata.org/pandas-docs/version/0.23.4/generated/pandas.IntervalIndex.html" rel="nofollow noreferrer"><code>IntervalIndex</code></a>:</p>
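<pre><code>import pandas as pd

# df and dict_list are the objects defined in the question
m = pd.DataFrame(dict_list)
# closed intervals [start, end], one per row of df
s = pd.IntervalIndex.from_arrays(df.start, df.end, closed='both')
# output-> IntervalIndex([[1, 100], [101, 200], [201, 300]],
#          closed='both',
#          dtype='interval[int64]')
# select the row whose interval contains each page number, then count per interval
n = m.set_index(s).loc[m['page_number']].groupby(level=0)['page_number'].count()
n.index = pd.MultiIndex.from_arrays([n.index])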
</code></pre>
<hr/>
<pre><code># attach the counts to df; ranges without any notes come out as NaN and are filled with 0
final = df.set_index(['start', 'end']).assign(new_note_count=n).reset_index()
final['new_note_count'] = final['new_note_count'].fillna(0)
</code></pre>
<hr/>
<p>Output:</p>
<pre><code>   start  end  note_count  new_note_count
0      1  100           0             2.0
1    101  200           0             1.0
2    201  300           0             0.0
</code></pre>
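<p>Details:
Once the intervals are set as the index of <code>m</code>, pass the <code>page_number</code> values to <code>.loc[]</code> to select, for each page, the row whose interval contains it:</p>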
<pre><code>print(m.set_index(s).loc[m['page_number']])
</code></pre>
<hr/>
<pre><code>                 type  id  page_number  location_number content
[1, 100]    highlight   0            4               40     Foo
[1, 100]    highlight   0            4               40     Foo
[101, 200]  highlight   1           12               96     Bar
</code></pre>
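<p>To see the interval lookup in isolation, here is a minimal sketch. The <code>labels</code> Series and its values are hypothetical and only serve to show that a number passed to <code>.loc[]</code> is matched to the interval that contains it:</p>

```python
import pandas as pd

# Same three closed page ranges as in the answer
intervals = pd.IntervalIndex.from_arrays([1, 101, 201], [100, 200, 300], closed="both")

# Hypothetical values, one per interval, purely for illustration
labels = pd.Series(["first", "second", "third"], index=intervals)

# A scalar is matched to the interval that contains it
print(labels.loc[4])

# A list of numbers returns one row per number, labelled by the matching interval
print(labels.loc[[4, 12, 150]])
```

<p>A number that falls outside every interval raises a <code>KeyError</code>, so this lookup assumes every page number is covered by some range.</p>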
<p>Then use <code>groupby()</code> to get the counts, convert the index to a <code>MultiIndex</code>, and assign the result.</p>
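<p>If the index realignment feels fragile, a possible alternative (not part of the approach above) is <code>IntervalIndex.get_indexer</code>, which returns the position of the interval containing each value, or <code>-1</code> when none matches. The sketch below uses hypothetical sample data standing in for the question's <code>df</code> and <code>dict_list</code>, and assumes the ranges do not overlap and that <code>df</code> has a default integer index:</p>

```python
import pandas as pd

# Hypothetical stand-ins for the question's df and dict_list
df = pd.DataFrame({"start": [1, 101, 201], "end": [100, 200, 300], "note_count": [0, 0, 0]})
dict_list = [
    {"type": "highlight", "id": 0, "page_number": 4, "location_number": 40, "content": "Foo"},
    {"type": "highlight", "id": 1, "page_number": 12, "location_number": 96, "content": "Bar"},
    {"type": "highlight", "id": 2, "page_number": 150, "location_number": 900, "content": "Baz"},
    {"type": "highlight", "id": 3, "page_number": 999, "location_number": 5000, "content": "Qux"},
]
m = pd.DataFrame(dict_list)

intervals = pd.IntervalIndex.from_arrays(df["start"], df["end"], closed="both")

# Position of the interval containing each page number; -1 means no range matched
positions = intervals.get_indexer(m["page_number"].to_numpy())

# Count notes per interval position, ignoring pages outside every range
counts = pd.Series(positions[positions >= 0]).value_counts()

# Ranges without notes get 0; assign by position to match df's row order
df["new_note_count"] = counts.reindex(range(len(df)), fill_value=0).to_numpy()
print(df)
```

<p>This keeps the counts as integers and does not fail on page numbers that fall outside every range.</p>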