<p>Welcome to SO!</p>
<p>Here is one of several ways to do this in R:</p>
<pre><code>df <- data.frame(
hadm_id = c(100001, 100003, 100003, 100006, 100006, 100007, 100007,
100009, 100009, 100010, 100010, 100011, 100011),
rass_v = c(0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1)
)
# Edit: for better readability please use @Moody_Mudskipper's answer:
# df <- setNames(aggregate(df$rass_v, by = list(df$hadm_id), max), names(df))
# Take the maximum rass_v within each hadm_id group
df <- aggregate(rass_v ~ hadm_id, df, max)
print(df)
</code></pre>
<p>See <a href="https://stackoverflow.com/questions/25314336/extract-the-maximum-value-within-each-group-in-a-dataframe">this question</a> for details.</p>
<p>Here is a faster data.table solution (for larger tables):</p>
<pre><code>library(data.table)
DT <- data.table(
hadm_id = c(100001, 100003, 100003, 100006, 100006, 100007, 100007,
100009, 100009, 100010, 100010, 100011, 100011),
rass_v = c(0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1)
)
# .I holds row numbers; which.max() picks the first row with the maximum in each group
DT <- DT[DT[, .I[which.max(rass_v)], by=hadm_id]$V1]
print(DT)
</code></pre>
<p>See also this related <a href="https://stackoverflow.com/questions/24558328/how-to-select-the-row-with-the-maximum-value-in-each-group">question</a> and Arun's answer to it.</p>
<p>Result:</p>
<pre><code> hadm_id rass_v
1: 100001 0
2: 100003 1
3: 100006 1
4: 100007 1
5: 100009 1
6: 100010 1
7: 100011 1
</code></pre>
<hr/>
<p>Edit: here is the equivalent approach in Python with pandas:</p>
<pre><code>import pandas as pd
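# Same data as in the R examples
df = pd.DataFrame({'hadm_id': [100001, 100003, 100003, 100006, 100006, 100007, 100007,
100009, 100009, 100010, 100010, 100011, 100011],
'rass_v': [0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1]})
# as_index=False keeps hadm_id as a regular column, so the result is a DataFrame rather than a Series
df = df.groupby('hadm_id', sort=False, as_index=False)['rass_v'].max()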
print(df)
</code></pre>
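<p>The data.table version above selects whole rows rather than just the aggregated column. If you need the same behaviour in pandas (for example, because the real table has more columns than these two), a minimal sketch could use <code>idxmax()</code>. This is only one way to do it; the variable names <code>idx</code> and <code>result</code> are placeholders of mine, and I am assuming the default integer index:</p>

```python
import pandas as pd

df = pd.DataFrame({
    "hadm_id": [100001, 100003, 100003, 100006, 100006, 100007, 100007,
                100009, 100009, 100010, 100010, 100011, 100011],
    "rass_v": [0, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
})

# idxmax() returns, for each group, the index label of the first row
# that holds the group's maximum rass_v (similar in spirit to which.max in R)
idx = df.groupby("hadm_id", sort=False)["rass_v"].idxmax()

# Select those rows; all columns of the original frame are kept
result = df.loc[idx].reset_index(drop=True)
print(result)
```

<p>As with <code>which.max</code>, ties are resolved by taking the first matching row in each group.</p>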