Selecting the maximum and minimum values at the same time

   count        date location  type
0    100  2018-01-01    site1  high
1     10  2018-01-01    site2   low
2     11  2018-01-01    site3   low
3    101  2018-01-03    site2  high
4    103  2018-01-03    site2  high
5     15  2018-01-03    site3   low

                count        date location
month-day type
01-01     high    100  2018-01-01    site1
          low      10  2018-01-01    site2
01-03     high    103  2018-01-03    site2
          low      15  2018-01-03    site3

import pandas as pd

df = pd.DataFrame({'date': ['2018-01-01', '2018-01-01', '2018-01-01', '2018-01-03', '2018-01-03', '2018-01-03'],
                   'location': ['site1', 'site2', 'site3', 'site2', 'site2', 'site3'],
                   'type': ['high', 'low', 'low', 'high', 'high', 'low'],
                   'count': [100, 10, 11, 101, 103, 15]})
df['date'] = pd.to_datetime(df['date'])
df['month-day'] = df['date'].apply(lambda x: x.strftime('%m-%d'))
# note: ['month-day']['type'=='high'] evaluates to 'month-day', so 'type' is not actually filtered here
maxCount = df.loc[df.groupby(['month-day']['type'=='high'])['count'].idxmax()]
minCount = df.loc[df.groupby(['month-day']['type'=='low'])['count'].idxmin()]
df = maxCount.merge(minCount, how='outer')
df.set_index(['month-day', 'type'], inplace=True)
df.sort_index(inplace=True)

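3 answers

User

#1 · edited 2024-10-02 22:35:27

What you are trying to do is complicated by the fact that you have already assigned the high and low labels. Do you need to account for them (could a day's maximum be labelled low)? If not, you can do something simple: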

df.groupby(['month-day']).agg({ 'count': ['min', 'max'] })

This gives you:

          count     
            min  max
month-day           
01-01        10  100
01-03        15  103
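
If the two-level column header is inconvenient, one possible variation is pandas named aggregation, which produces flat column names. The sketch below rebuilds the question's data so it runs on its own; the output column names `low` and `high` and the variable `summary` are chosen here for illustration and are not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'date': ['2018-01-01', '2018-01-01', '2018-01-01',
             '2018-01-03', '2018-01-03', '2018-01-03'],
    'location': ['site1', 'site2', 'site3', 'site2', 'site2', 'site3'],
    'type': ['high', 'low', 'low', 'high', 'high', 'low'],
    'count': [100, 10, 11, 101, 103, 15],
})
df['month-day'] = pd.to_datetime(df['date']).dt.strftime('%m-%d')

# Named aggregation: new_column=(source_column, function).
# 'low' and 'high' are illustrative names, not labels taken from the 'type' column.
summary = df.groupby('month-day').agg(
    low=('count', 'min'),
    high=('count', 'max'),
)
print(summary)
```

Like the answer above, this ignores the existing `type` labels and returns only the counts, not the matching `date` or `location`.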

User

#2 · edited 2024-10-02 22:35:27

Your logic isn't entirely clear: should type be taken into account? Based on your attempt, I'll assume it should:

# groupby
group = df.groupby('month-day')['count']

# create your min and max logic for boolean indexing
min_log = ((df['count'] == group.transform('min')) & (df['type'] == 'low'))
max_log = ((df['count'] == group.transform('max')) & (df['type'] == 'high'))

# boolean indexing to filter df
df[ min_log | max_log]

        date location  type  count month-day
0 2018-01-01    site1  high    100     01-01
1 2018-01-01    site2   low     10     01-01
4 2018-01-03    site2  high    103     01-03
5 2018-01-03    site3   low     15     01-03
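
Note that the mask above keeps a row only if it is the overall minimum (or maximum) of the day and also carries the matching label. If the intent is instead "the largest count among the high rows and the smallest among the low rows", a sketch under that assumption could split by `type` first. The variable names `high`, `low`, `idx`, and `result` are illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'date': ['2018-01-01', '2018-01-01', '2018-01-01',
             '2018-01-03', '2018-01-03', '2018-01-03'],
    'location': ['site1', 'site2', 'site3', 'site2', 'site2', 'site3'],
    'type': ['high', 'low', 'low', 'high', 'high', 'low'],
    'count': [100, 10, 11, 101, 103, 15],
})
df['date'] = pd.to_datetime(df['date'])
df['month-day'] = df['date'].dt.strftime('%m-%d')

# Restrict to each label before grouping, so the label is respected.
high = df[df['type'] == 'high']
low = df[df['type'] == 'low']

# Row labels of the per-day maximum among 'high' rows and minimum among 'low' rows.
idx = pd.concat([
    high.groupby('month-day')['count'].idxmax(),
    low.groupby('month-day')['count'].idxmin(),
])

# Select those rows and shape them like the desired output.
result = df.loc[idx].set_index(['month-day', 'type']).sort_index()
print(result)
```

On the sample data both approaches select the same four rows; they differ only when a day's overall extreme carries the other label.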

User

#3 · edited 2024-10-02 22:35:27

You can try agg, stack, loc, and set_index:

s = pd.to_datetime(df.date).dt.strftime('%m-%d')
m = df.groupby(s)['count'].agg(['idxmax', 'idxmin']).stack()
df_out = df.loc[m].set_index([m.index.droplevel(1), 'type'])

Out[127]:
                  date location  count
date  type
01-01 high  2018-01-01    site1    100
      low   2018-01-01    site2     10
01-03 high  2018-01-03    site2    103
      low   2018-01-03    site3     15
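
To see how this one-liner fits together, it may help to look at the intermediate objects. The following is a self-contained walkthrough of the same steps on the question's data; the name `positions` for the unstacked frame is introduced here for illustration.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    'date': ['2018-01-01', '2018-01-01', '2018-01-01',
             '2018-01-03', '2018-01-03', '2018-01-03'],
    'location': ['site1', 'site2', 'site3', 'site2', 'site2', 'site3'],
    'type': ['high', 'low', 'low', 'high', 'high', 'low'],
    'count': [100, 10, 11, 101, 103, 15],
})

# Grouping key: month-day strings; the Series keeps the name 'date'.
s = pd.to_datetime(df['date']).dt.strftime('%m-%d')

# One row per day, with the row labels of that day's max and min count.
positions = df.groupby(s)['count'].agg(['idxmax', 'idxmin'])
print(positions)

# stack() turns the two columns into a second index level,
# leaving a single Series of row labels to select.
m = positions.stack()
print(m)

# Select those rows, then index by (month-day, type).
df_out = df.loc[m].set_index([m.index.droplevel(1), 'type'])
print(df_out)
```

As in the first answer, the rows are chosen by count alone; the `type` level of the index simply reports whatever label the selected row already had.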
