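<p>This looks a bit ugly, but it avoids loops and <code>apply</code> (which is essentially just a loop under the hood). I haven't tested it on a large dataset, but I suspect it will be much faster than your current code.</p>
<p>First, create some extra columns holding details of the next/previous row, since some of your conditions depend on them:</p>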
<pre><code>all_records['PrevHeartRate'] = all_records['Heart Rate'].shift()
all_records['NextHours'] = all_records['Hours'].shift(-1)
all_records['PrevICU'] = all_records['Icustay'].shift()
all_records['NextICU'] = all_records['Icustay'].shift(-1)
</code></pre>
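<p>As a quick illustration of what <code>shift</code> does here, the following sketch rebuilds the sample data from the output tables shown in this answer (the names <code>demo</code>, <code>prev_hr</code>, <code>next_hours</code> and <code>same_stay_as_prev</code> are only for illustration). It also shows that <code>shift</code> ignores id boundaries, which is why the conditions in the next step compare <code>Icustay</code> with <code>PrevICU</code>/<code>NextICU</code>:</p>

```python
import pandas as pd

nan = float('nan')

# Sample data reconstructed from the output tables in this answer.
demo = pd.DataFrame({
    'Heart Rate': [79.0, 91.0, 97.0, nan, 90.0, 94.0, 68.0],
    'Hours': [0.0, 1.5, 2.7, 3.4, 0.0, 29.4, 0.0],
    'Icustay': [1001, 1001, 1001, 1001, 2010, 2010, 3005],
})

# shift() moves values down one row, so each row sees the previous row's value.
prev_hr = demo['Heart Rate'].shift()
# shift(-1) moves values up one row, so each row sees the next row's value.
next_hours = demo['Hours'].shift(-1)

# shift() does not know about ids: the first row of stay 2010 (row 4)
# receives the Icustay of the last row of stay 1001, so the check is False there.
prev_icu = demo['Icustay'].shift()
same_stay_as_prev = demo['Icustay'] == prev_icu
print(same_stay_as_prev.tolist())
```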
<p>Next, create a dataframe containing the first qualifying record for each id (this is rather messy because of the amount of logic involved):</p>
<pre><code># Keep rows that satisfy both conditions, then take the first such row per Icustay
first_per_id = (all_records[((all_records['Heart Rate'] >= 90) |
                             ((all_records['Heart Rate'].isnull()) &
                              (all_records['PrevHeartRate'] >= 90) &
                              (all_records['Icustay'] == all_records['PrevICU']))) &
                            ((all_records['Hours'] >= 1) |
                             ((all_records['NextHours'] >= 1) &
                              (all_records['NextICU'] == all_records['Icustay'])))]
                .drop_duplicates(subset='Icustay', keep='first')[['Icustay']]
                .reset_index()
                .rename(columns={'index': 'first_index'}))
</code></pre>
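<p>If the single expression is hard to read, one possible variation is to give each part of the filter a name first. This is only a sketch of the same logic on the sample data from this answer; the names <code>hr_ok</code>, <code>hours_ok</code>, <code>same_as_prev</code> and <code>same_as_next</code> are made up for illustration:</p>

```python
import pandas as pd

nan = float('nan')

# Sample data reconstructed from the output tables in this answer.
all_records = pd.DataFrame({
    'Heart Rate': [79.0, 91.0, 97.0, nan, 90.0, 94.0, 68.0],
    'Hours': [0.0, 1.5, 2.7, 3.4, 0.0, 29.4, 0.0],
    'Icustay': [1001, 1001, 1001, 1001, 2010, 2010, 3005],
})
all_records['PrevHeartRate'] = all_records['Heart Rate'].shift()
all_records['NextHours'] = all_records['Hours'].shift(-1)
all_records['PrevICU'] = all_records['Icustay'].shift()
all_records['NextICU'] = all_records['Icustay'].shift(-1)

hr = all_records['Heart Rate']
same_as_prev = all_records['Icustay'] == all_records['PrevICU']
same_as_next = all_records['Icustay'] == all_records['NextICU']

# Heart rate is >= 90, or it is missing and the previous row
# of the same stay was >= 90.
hr_ok = (hr >= 90) | (hr.isnull() & (all_records['PrevHeartRate'] >= 90) & same_as_prev)

# Hours is >= 1, or the next row of the same stay has Hours >= 1.
hours_ok = (all_records['Hours'] >= 1) | ((all_records['NextHours'] >= 1) & same_as_next)

# First row per Icustay where both conditions hold, keeping its original index.
first_per_id = (all_records[hr_ok & hours_ok]
                .drop_duplicates(subset='Icustay', keep='first')[['Icustay']]
                .reset_index()
                .rename(columns={'index': 'first_index'}))
print(first_per_id)
```

<p>Naming the masks also makes it easy to inspect each condition separately (for example <code>hr_ok.tolist()</code>) when a row is unexpectedly included or excluded.</p>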
<p>This gives us:</p>
<pre><code>   first_index  Icustay
0            1     1001
1            4     2010
</code></pre>
<p>Now all the new columns can be dropped from the original dataframe:</p>
<pre><code>all_records.drop(['PrevHeartRate', 'NextHours', 'PrevICU', 'NextICU'], axis=1, inplace=True)
</code></pre>
<p>Then we can merge it with the original dataframe:</p>
<pre><code>new = pd.merge(all_records, first_per_id, how='left', on='Icustay')
</code></pre>
<p>Giving:</p>
<pre><code>   Heart Rate  Hours  Icustay  Inclusion Criteria  first_index
0        79.0    0.0     1001                   0          1.0
1        91.0    1.5     1001                   0          1.0
2        97.0    2.7     1001                   0          1.0
3         NaN    3.4     1001                   0          1.0
4        90.0    0.0     2010                   0          4.0
5        94.0   29.4     2010                   0          4.0
6        68.0    0.0     3005                   0          NaN
</code></pre>
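<p>From here we can compare <code>first_index</code> (the first qualifying index for that id) with the actual index. Note that this relies on <code>all_records</code> having a default 0..n-1 integer index, because <code>pd.merge</code> returns a freshly numbered index:</p>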
<pre><code>new['Inclusion Criteria'] = new.index >= new['first_index']
</code></pre>
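<p>Ids with no qualifying record have <code>NaN</code> as their <code>first_index</code>, and a comparison against <code>NaN</code> evaluates to <code>False</code>, so those rows are excluded without any special handling. A tiny sketch with made-up values (the frame <code>demo</code> is hypothetical):</p>

```python
import pandas as pd

# Hypothetical mini-frame: rows 0 and 1 belong to an id whose first
# qualifying index is 1; row 2 belongs to an id with no qualifying row.
demo = pd.DataFrame({'first_index': [1.0, 1.0, float('nan')]})

# Compare each row's position with its id's first qualifying index.
# Row 0: 0 >= 1 is False; row 1: 1 >= 1 is True; row 2: 2 >= NaN is False.
included = demo.index >= demo['first_index']
print([bool(x) for x in included])
```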
<p>This gives:</p>
<pre><code>   Heart Rate  Hours  Icustay  Inclusion Criteria  first_index
0        79.0    0.0     1001               False          1.0
1        91.0    1.5     1001                True          1.0
2        97.0    2.7     1001                True          1.0
3         NaN    3.4     1001                True          1.0
4        90.0    0.0     2010                True          4.0
5        94.0   29.4     2010                True          4.0
6        68.0    0.0     3005               False          NaN
</code></pre>
<p>From here we just need to tidy up (drop the <code>first_index</code> column and convert the result column to integers):</p>
<pre><code>new.drop('first_index', axis=1, inplace=True)
new['Inclusion Criteria'] = new['Inclusion Criteria'].astype(int)
</code></pre>
<p>Giving the final expected result:</p>
<pre><code>   Heart Rate  Hours  Icustay  Inclusion Criteria
0        79.0    0.0     1001                   0
1        91.0    1.5     1001                   1
2        97.0    2.7     1001                   1
3         NaN    3.4     1001                   1
4        90.0    0.0     2010                   1
5        94.0   29.4     2010                   1
6        68.0    0.0     3005                   0
</code></pre>
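<p>For convenience, the steps above could be collected into a single function. This is a minimal sketch rather than a tested implementation for large data: the function name <code>add_inclusion_flag</code> is made up, the sample frame is reconstructed from the tables in this answer, and the function resets the index first because the approach assumes a 0..n-1 index:</p>

```python
import pandas as pd


def add_inclusion_flag(records):
    """Return a copy of `records` with a 0/1 'Inclusion Criteria' column."""
    # The comparison below assumes row labels 0..n-1, so renumber the rows.
    df = records.reset_index(drop=True)

    hr = df['Heart Rate']
    same_prev = df['Icustay'] == df['Icustay'].shift()
    same_next = df['Icustay'] == df['Icustay'].shift(-1)

    # Same two conditions as in the answer, using shifted Series directly
    # instead of temporary columns.
    hr_ok = (hr >= 90) | (hr.isnull() & (hr.shift() >= 90) & same_prev)
    hours_ok = (df['Hours'] >= 1) | ((df['Hours'].shift(-1) >= 1) & same_next)

    # First qualifying row per Icustay, remembering its row number.
    first = (df[hr_ok & hours_ok]
             .drop_duplicates(subset='Icustay', keep='first')[['Icustay']]
             .reset_index()
             .rename(columns={'index': 'first_index'}))

    # Attach each id's first qualifying row number to every row of that id.
    merged = pd.merge(df, first, how='left', on='Icustay')

    # Rows at or after the first qualifying row are flagged; NaN compares False.
    flag = merged.index.to_numpy() >= merged['first_index'].to_numpy()
    df['Inclusion Criteria'] = flag.astype(int)
    return df


nan = float('nan')
sample = pd.DataFrame({
    'Heart Rate': [79.0, 91.0, 97.0, nan, 90.0, 94.0, 68.0],
    'Hours': [0.0, 1.5, 2.7, 3.4, 0.0, 29.4, 0.0],
    'Icustay': [1001, 1001, 1001, 1001, 2010, 2010, 3005],
})
result = add_inclusion_flag(sample)
print(result)
```

<p>Because the function works on a renumbered copy, the input frame is left unchanged and no temporary columns need to be dropped afterwards.</p>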