Answers to the question: Finding the maximum of a column using subsets of d

Finding the maximum of a column using subsets of d

Category: Python Q&A


1 answer

Anonymous, 1 day ago

Skilled in: python, mysql, java

<p>Here is an option using <code>pandas</code> in <code>python</code>. We create a grouping variable by taking the cumulative sum of the logical vector (<code>dat.Indicator == "Y"</code>), then subset the rows by dropping those where "Indicator" is "Y". Next we group by "StudentID" and "Group", use <code>transform</code> to get the <code>max</code> of "Value", assign it back to "Value", and <code>drop</code> the helper column that is no longer needed.</p>
<pre><code># Every "Y" row starts a new segment; the running count of "Y" rows is the segment id
dat['Group'] = (dat.Indicator == "Y").cumsum()
# Keep only the non-"Y" rows, and copy so the assignment below does not touch a view
datS = dat[dat.Indicator != "Y"]
datS1 = datS.copy()
# Broadcast each (StudentID, Group) maximum back onto its rows
datS1['Value'] = datS.groupby(['StudentID', 'Group'])['Value'].transform('max')
# Remove the helper column
datS1.drop('Group', axis = 1, inplace = True)
datS1
</code></pre>
<p>Output</p>
<p><a href="https://i.stack.imgur.com/iTx7C.png" rel="nofollow noreferrer"><img src="https://i.stack.imgur.com/iTx7C.png" alt="enter image description here"/></a></p>
<hr/>
<p>A <code>base R</code> option would be <code>ave</code>. Note that this version groups only by the cumulative sum of the indicator, not by "StudentID", and computes the maximum before the "Y" rows are removed.</p>
<pre><code>dat$Value <- with(dat, ave(Value, cumsum(Indicator == "Y"), FUN = max))
subset(dat, Indicator != "Y")
#   StudentID Indicator Value
#1        100         N    35
#2        100         N    35
#3        100         N    35
#5        100         N    60
#6        100         N    60
#7        200         N    60
#8        200         N    60
#10       200         N    35
#11       200         N    35
</code></pre>
<h3>Data</h3>
<pre><code># Python
import pandas as pd
dat = pd.DataFrame({'StudentID': [100, 100, 100, 100, 100, 100, 200, 200, 200, 200, 200],
                    'Indicator': ["N", "N", "N", "Y", "N", "N", "N", "N", "Y", "N", "N"],
                    'Value': [30, 35, 28, 20, 29, 60, 40, 35, 20, 24, 35]})

#R
dat <- structure(list(StudentID = c(100L, 100L, 100L, 100L, 100L, 100L,
    200L, 200L, 200L, 200L, 200L),
    Indicator = c("N", "N", "N", "Y", "N", "N", "N", "N", "Y", "N", "N"),
    Value = c(35L, 35L, 35L, 60L, 60L, 60L, 60L, 60L, 35L, 35L, 35L)),
    .Names = c("StudentID", "Indicator", "Value"),
    row.names = c(NA, -11L), class = "data.frame")
</code></pre>
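<p>As a rough, self-contained sketch of the pandas steps described in the answer, the snippet below wraps them in a helper and runs it on the Python sample data. The function name <code>segment_max</code> and its <code>keys</code> parameter are hypothetical, introduced only for illustration; they are not part of the original answer or of pandas.</p>

```python
import pandas as pd


def segment_max(dat, keys=("StudentID",)):
    """Hypothetical helper: replace Value by the max of its segment.

    Segments are delimited by rows where Indicator == "Y"; those rows
    are dropped from the result. `keys` lists extra grouping columns.
    """
    # Running count of "Y" rows gives a segment id for every row.
    group = (dat["Indicator"] == "Y").cumsum().rename("Group")
    # Drop the delimiter rows and copy to avoid modifying the input.
    out = dat[dat["Indicator"] != "Y"].copy()
    # Group by the extra key columns plus the segment id of the kept rows.
    by = [out[k] for k in keys] + [group.loc[out.index]]
    out["Value"] = out.groupby(by)["Value"].transform("max")
    return out


# Sample data copied from the Python "Data" block of the answer.
dat = pd.DataFrame({
    "StudentID": [100, 100, 100, 100, 100, 100, 200, 200, 200, 200, 200],
    "Indicator": ["N", "N", "N", "Y", "N", "N", "N", "N", "Y", "N", "N"],
    "Value": [30, 35, 28, 20, 29, 60, 40, 35, 20, 24, 35],
})

per_student = segment_max(dat)           # group by StudentID and segment
segment_only = segment_max(dat, keys=()) # group by segment alone

print(per_student["Value"].tolist())   # [35, 35, 35, 60, 60, 40, 40, 35, 35]
print(segment_only["Value"].tolist())  # [35, 35, 35, 60, 60, 60, 60, 35, 35]
```

<p>On this data, grouping by "StudentID" as well gives 40 for the two StudentID 200 rows in the middle segment, while grouping by segment alone gives 60 there, which matches the values shown in the R output. Which grouping is wanted depends on whether a segment is meant to span students.</p>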
