Pandas: select the highest day of each week, excluding weekends, unless there is only one record


1 answer

Anonymous, 1 day ago

I wrote a function that selects the valid highest record of a week; it is meant to be applied to each group of a weekly groupby:
<pre><code>def last_valid_report(recs):
    if len(recs) == 1:
        return recs
    recs = recs.copy()
    # recs = recs[recs['dates'].dt.weekday <= 4].nlargest(1, recs['dates'].dt.weekday)  # doesn't work
    recs['weekday'] = recs['dates'].dt.weekday  # because nlargest() needs a column name
    recs = recs[recs['weekday'] <= 4].nlargest(1, 'weekday')
    del recs['weekday']
    return recs
    # could have also done:
    # return recs[recs['weekday'] <= 4].nlargest(1, 'weekday').drop('weekday', axis=1)
</code></pre>
Calling it with the right groupby, I get:
<pre><code>In [155]: df2 = df.groupby(df['dates'].dt.week).apply(last_valid_report)

In [156]: df2
Out[156]:
              dates  nums
dates
45    4  2018-11-09    63
46    8  2018-11-15    90
47    10 2018-11-19    80
48    11 2018-12-01    94
</code></pre>
<hr/>
A few caveats:
<ol>
<li>If I leave out <code>recs.copy()</code>, I get <code>ValueError: Shape of passed values is (3, 12), indices imply (3, 4)</code></li>
<li><a href="https://pandas.pydata.org/pandas-docs/version/0.23/generated/pandas.DataFrame.nlargest.html" rel="nofollow noreferrer">pandas' <code>nlargest()</code></a> only accepts column names, not expressions
<ul>
<li>So I need to create an extra column inside the function and delete it before returning; I could also create it in the original df and drop it after <code>.apply()</code></li>
</ul></li>
<li>I get an extra index level 'dates' from <a href="https://stackoverflow.com/a/12411852/1431750">groupby+apply</a>, which needs to be <a href="https://stackoverflow.com/a/42124685/1431750">explicitly dropped</a>:
<pre><code>In [157]: df2.index = df2.index.droplevel(); df2
Out[157]:
        dates  nums
4  2018-11-09    63
8  2018-11-15    90
10 2018-11-19    80
11 2018-12-01    94
</code></pre></li>
<li>If a week has records only for Saturday and Sunday (2 days), I would need to add a check for whether <code>recs[recs['weekday'] <= 4]</code> is empty and, if so, use <code>.nlargest(1, 'weekday')</code> without the <code>weekday <= 4</code> filter; but that is beside the point of the question</li>
</ol>
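The weekend-only check described in the last caveat could look roughly like the sketch below. It is an illustration, not the answer's original code: the sample data, the `nums` values and the helper name `last_valid_report_with_fallback` are made up. It also assumes a recent pandas, where `Series.dt.week` is no longer available and `Series.dt.isocalendar()` is used instead, and it passes `group_keys=False` so that no extra index level has to be dropped afterwards.

```python
import pandas as pd

# Hypothetical sample data (not the original df from the question).
df = pd.DataFrame({
    "dates": pd.to_datetime([
        "2018-11-05",  # Mon, ISO week 45
        "2018-11-09",  # Fri, ISO week 45
        "2018-11-10",  # Sat, ISO week 45
        "2018-11-17",  # Sat, ISO week 46 (weekend-only week)
        "2018-11-18",  # Sun, ISO week 46
        "2018-12-01",  # Sat, ISO week 48 (single record)
    ]),
    "nums": [10, 20, 30, 40, 50, 60],
})


def last_valid_report_with_fallback(recs: pd.DataFrame) -> pd.DataFrame:
    """Return the latest Mon-Fri row of one week's records.

    If the week holds no Mon-Fri rows at all (weekend-only data, which
    includes a lone weekend record), fall back to the latest row overall.
    """
    weekday = recs["dates"].dt.weekday  # Monday=0 ... Sunday=6
    workdays = recs[weekday <= 4]
    candidates = recs if workdays.empty else workdays
    # Within one ISO week, the latest date is also the highest weekday.
    return candidates.loc[[candidates["dates"].idxmax()]]


# Group by ISO year and ISO week so equal week numbers from different
# years do not collide; group_keys=False keeps the original row index.
iso = df["dates"].dt.isocalendar()
result = df.groupby([iso.year, iso.week], group_keys=False).apply(
    last_valid_report_with_fallback
)
print(result)
```

No temporary `weekday` column is written to the group here, so the `recs.copy()` step is not needed in this variant.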
