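<p>Here is an option using <code>pandas</code> from <code>python</code>. We create a grouping variable by taking the cumulative sum of the logical output (<code>dat.Indicator == "Y"</code>), subset the rows by removing those where 'Indicator' is "Y", group by 'StudentID' and 'Group', get the <code>max</code> of 'Value' with <code>transform</code>, assign it back to 'Value', and <code>drop</code> the column that is no longer needed.</p>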
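<pre><code># Running count of "Y" rows: each "Y" starts a new group
dat['Group'] = (dat.Indicator == "Y").cumsum()
# Keep only the non-"Y" rows; copy so the assignment below does not hit a view
datS1 = dat[dat.Indicator != "Y"].copy()
# Broadcast each (StudentID, Group) maximum back onto its rows
datS1['Value'] = datS1.groupby(['StudentID', 'Group'])['Value'].transform('max')
# Remove the helper column
datS1 = datS1.drop(columns='Group')
datS1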
</code></pre>
<p>-output</p>
<p><a href="https://i.stack.imgur.com/iTx7C.png" rel="nofollow noreferrer"><img src="https://i.stack.imgur.com/iTx7C.png" alt="enter image description here"/></a></p>
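<p>To make the grouping step easier to follow, here is a minimal self-contained sketch of the same steps written as one chain. It uses the sample data from the data section at the end; the names <code>group</code> and <code>result</code> are introduced here for illustration only and are not part of the code above.</p>

```python
import pandas as pd

dat = pd.DataFrame({
    'StudentID': [100, 100, 100, 100, 100, 100, 200, 200, 200, 200, 200],
    'Indicator': ["N", "N", "N", "Y", "N", "N", "N", "N", "Y", "N", "N"],
    'Value': [30, 35, 28, 20, 29, 60, 40, 35, 20, 24, 35],
})

# Each "Y" row bumps the running total, so it opens a new block of rows.
group = (dat['Indicator'] == "Y").cumsum()
print(group.tolist())  # [0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2]

result = (
    dat.assign(Group=group)                      # attach the helper column
       .loc[lambda d: d['Indicator'] != "Y"]     # drop the "Y" marker rows
       .assign(Value=lambda d: d.groupby(['StudentID', 'Group'])['Value']
                                .transform('max'))  # per-group max, row-aligned
       .drop(columns='Group')                    # remove the helper column
)
print(result)
```

<p>Because every step returns a new object, <code>dat</code> itself is left unmodified, which is one reason to prefer the chained form when the original frame is still needed.</p>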
<hr/>
<p>A <code>base R</code> option would be <code>ave</code></p>
<pre><code>dat$Value <- with(dat, ave(Value, cumsum(Indicator == "Y"), FUN = max))
subset(dat, Indicator != "Y")
# StudentID Indicator Value
#1 100 N 35
#2 100 N 35
#3 100 N 35
#5 100 N 60
#6 100 N 60
#7 200 N 60
#8 200 N 60
#10 200 N 35
#11 200 N 35
</code></pre>
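<p>Note that the <code>ave</code> call groups only by the cumulative sum (not by 'StudentID') and takes the maximum before the "Y" rows are removed. As a rough illustration, the same logic could be sketched in <code>pandas</code> as follows; the names <code>block</code> and <code>out</code> are hypothetical, and the data is the Python sample from the data section.</p>

```python
import pandas as pd

dat = pd.DataFrame({
    'StudentID': [100, 100, 100, 100, 100, 100, 200, 200, 200, 200, 200],
    'Indicator': ["N", "N", "N", "Y", "N", "N", "N", "N", "Y", "N", "N"],
    'Value': [30, 35, 28, 20, 29, 60, 40, 35, 20, 24, 35],
})

# Group only by the running count of "Y" rows, mirroring cumsum(Indicator == "Y").
block = (dat['Indicator'] == "Y").cumsum().rename('Block')

# Take the max per block while the "Y" rows are still present, then filter them out.
out = dat.assign(Value=dat.groupby(block)['Value'].transform('max'))
out = out[out['Indicator'] != "Y"]
print(out)
```

<p>Since 'StudentID' is not part of the grouping here, a block that spans two students shares a single maximum.</p>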
<h3>data</h3>
<pre><code>#python
import pandas as pd
dat = pd.DataFrame({'StudentID': [100, 100, 100, 100, 100, 100, 200, 200, 200, 200, 200],
'Indicator':[ "N", "N", "N", "Y", "N", "N", "N", "N", "Y", "N", "N"],
'Value':[30, 35, 28, 20, 29, 60, 40, 35, 20, 24, 35]})
#R
dat <-structure(list(StudentID = c(100L, 100L, 100L, 100L, 100L, 100L,
200L, 200L, 200L, 200L, 200L), Indicator = c("N", "N", "N", "Y",
"N", "N", "N", "N", "Y", "N", "N"), Value = c(35L, 35L, 35L,
60L, 60L, 60L, 60L, 60L, 35L, 35L, 35L)), .Names = c("StudentID",
"Indicator", "Value"), row.names = c(NA, -11L), class = "data.frame")
</code></pre>