Finding specific values that satisfy a condition in Python

import pandas as pd
import numpy as np

df = pd.DataFrame({
    'date': ['2019-08-06 09:00:00', '2019-08-06 12:00:00', '2019-08-06 18:00:00',
             '2019-08-06 21:00:00', '2019-08-07 09:00:00', '2019-08-07 16:00:00',
             '2019-08-08 17:00:00', '2019-08-09 16:00:00'],
    'type': [0, 1, np.nan, 1, np.nan, np.nan, 0, 0],
    'colour': ['blue', 'red', np.nan, 'blue', np.nan, np.nan, 'blue', 'red'],
    'maxPixel': [255, 7346, 32, 5184, 600, 322, 72, 6000],
    'minPixel': [86, 96, 14, 3540, 528, 300, 12, 4009],
    'colourDate': ['2019-08-06 12:00:00', '2019-08-08 16:00:00', '2019-08-06 23:00:00',
                   '2019-08-06 22:00:00', '2019-08-08 09:00:00', '2019-08-09 16:00:00',
                   '2019-08-08 23:00:00', '2019-08-11 16:00:00'],
})

max_conditions = [(df['type'] == 1) & (df['colour'] == 'blue'),
                  (df['type'] == 1) & (df['colour'] == 'red')]

max_choices = [np.where(df['date'] <= df['colourDate'], max(df['maxPixel']), np.nan),
               np.where(df['date'] <= df['colourDate'], min(df['minPixel']), np.nan)]

df['pixelLimit'] = np.select(max_conditions, max_choices, default=np.nan)

                  date  type colour  maxPixel  minPixel           colourDate  pixelLimit
0  2019-08-06 09:00:00   0.0   blue       255        86  2019-08-06 12:00:00         NaN
1  2019-08-06 12:00:00   1.0    red      7346        96  2019-08-08 16:00:00        12.0
2  2019-08-06 18:00:00   NaN    NaN        32        14  2019-08-06 23:00:00         NaN
3  2019-08-06 21:00:00   1.0   blue      5184      3540  2019-08-06 22:00:00      6000.0
4  2019-08-07 09:00:00   NaN    NaN       600       528  2019-08-08 09:00:00         NaN
5  2019-08-07 16:00:00   NaN    NaN       322       300  2019-08-09 16:00:00         NaN
6  2019-08-08 17:00:00   0.0   blue        72        12  2019-08-08 23:00:00         NaN
7  2019-08-09 16:00:00   0.0    red      6000      4009  2019-08-11 16:00:00         NaN

                  date  type colour  maxPixel  minPixel           colourDate  pixelLimit
0  2019-08-06 09:00:00   0.0   blue       255        86  2019-08-06 12:00:00         NaN
1  2019-08-06 12:00:00   1.0    red      7346        96  2019-08-08 16:00:00        14.0
2  2019-08-06 18:00:00   NaN    NaN        32        14  2019-08-06 23:00:00         NaN
3  2019-08-06 21:00:00   1.0   blue      5184      3540  2019-08-06 22:00:00      5184.0
4  2019-08-07 09:00:00   NaN    NaN       600       528  2019-08-08 09:00:00         NaN
5  2019-08-07 16:00:00   NaN    NaN       322       300  2019-08-09 16:00:00         NaN
6  2019-08-08 17:00:00   0.0   blue        72        12  2019-08-08 23:00:00         NaN
7  2019-08-09 16:00:00   0.0    red      6000      4009  2019-08-11 16:00:00         NaN

1 Answer

Answer 1 · posted 2024-06-28 16:20:25

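For matching problems like this, one option is to do a full (cross) merge, then use a Boolean Series to keep only the rows that satisfy the condition for each source row, and take the max or min among all the possible matches. Because the two cases need slightly different columns and different functions, I split the work into two very similar pieces of code: one for 1/blue and one for 1/red.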
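As a minimal sketch of that pattern on toy data (the frames `left` and `right` and their column names are made up for illustration, and `how='cross'` assumes pandas 1.2 or later), the cross merge pairs every row with every candidate, and overlapping column names pick up the `_x` / `_y` suffixes that the answer's code relies on:

```python
import pandas as pd

# Hypothetical toy frames, not the question's data
left = pd.DataFrame({'id': ['a', 'b'], 't': [1, 4], 'end': [3, 6]})
right = pd.DataFrame({'t': [1, 2, 5], 'value': [10, 20, 30]})

# Cross merge: every row of `left` paired with every row of `right` (2 * 3 rows).
# The shared column name 't' becomes 't_x' (from left) and 't_y' (from right).
pairs = left.merge(right, how='cross')

# Boolean Series: keep only pairs where the candidate time falls in [t_x, end]
in_window = pairs['t_y'].between(pairs['t_x'], pairs['end'])

# Among the surviving candidates, aggregate per source row
best = pairs[in_window].groupby('id')['value'].max()
print(best)
```

The intermediate frame has one row per pair, so memory grows with the product of the two row counts; that is usually fine for small frames like the one in the question.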

First, some housekeeping: convert the date columns to datetime.

import pandas as pd

df['date'] = pd.to_datetime(df['date'])
df['colourDate'] = pd.to_datetime(df['colourDate'])

Calculate the min pixel for 1/red between each row's two times (date and colourDate)

# Subset of rows we need to do this for
dfmin = df[df.type.eq(1) & df.colour.eq('red')].reset_index()

# To each row merge all rows from the original DataFrame
dfmin = dfmin.merge(df[['date', 'minPixel']], how='cross')
# If pd.version < 1.2 instead use: 
#dfmin = dfmin.assign(t=1).merge(df[['date', 'minPixel']].assign(t=1), on='t')

# Only keep rows between the dates, then among those find the min minPixel
smin = (dfmin[dfmin.date_y.between(dfmin.date_x, dfmin.colourDate)]
            .groupby('index')['minPixel_y'].min()
            .rename('pixel_limit'))
#index
#1    14
#Name: pixel_limit, dtype: int64

# Max is basically a mirror
dfmax = df[df.type.eq(1) & df.colour.eq('blue')].reset_index()

dfmax = dfmax.merge(df[['date', 'maxPixel']], how='cross')
#dfmax = dfmax.assign(t=1).merge(df[['date', 'maxPixel']].assign(t=1), on='t')

smax = (dfmax[dfmax.date_y.between(dfmax.date_x, dfmax.colourDate)]
           .groupby('index')['maxPixel_y'].max()
           .rename('pixel_limit'))

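Finally, because the groupby above is keyed on the original index (i.e. 'index'), we can simply assign the result back and it aligns with the original DataFrame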

df['pixel_limit'] = pd.concat([smin, smax])

                 date  type colour  maxPixel  minPixel          colourDate  pixel_limit
0 2019-08-06 09:00:00   0.0   blue       255        86 2019-08-06 12:00:00          NaN
1 2019-08-06 12:00:00   1.0    red      7346        96 2019-08-08 16:00:00         14.0
2 2019-08-06 18:00:00   NaN    NaN        32        14 2019-08-06 23:00:00          NaN
3 2019-08-06 21:00:00   1.0   blue      5184      3540 2019-08-06 22:00:00       5184.0
4 2019-08-07 09:00:00   NaN    NaN       600       528 2019-08-08 09:00:00          NaN
5 2019-08-07 16:00:00   NaN    NaN       322       300 2019-08-09 16:00:00          NaN
6 2019-08-08 17:00:00   0.0   blue        72        12 2019-08-08 23:00:00          NaN
7 2019-08-09 16:00:00   0.0    red      6000      4009 2019-08-11 16:00:00          NaN
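Since the min and max blocks above differ only in the column and the aggregation, they could be folded into one function. The following is a sketch under that assumption; `window_agg` is a hypothetical helper name, not part of pandas or of the original answer, and it assumes pandas 1.2 or later for `how='cross'`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'date': ['2019-08-06 09:00:00', '2019-08-06 12:00:00', '2019-08-06 18:00:00',
             '2019-08-06 21:00:00', '2019-08-07 09:00:00', '2019-08-07 16:00:00',
             '2019-08-08 17:00:00', '2019-08-09 16:00:00'],
    'type': [0, 1, np.nan, 1, np.nan, np.nan, 0, 0],
    'colour': ['blue', 'red', np.nan, 'blue', np.nan, np.nan, 'blue', 'red'],
    'maxPixel': [255, 7346, 32, 5184, 600, 322, 72, 6000],
    'minPixel': [86, 96, 14, 3540, 528, 300, 12, 4009],
    'colourDate': ['2019-08-06 12:00:00', '2019-08-08 16:00:00', '2019-08-06 23:00:00',
                   '2019-08-06 22:00:00', '2019-08-08 09:00:00', '2019-08-09 16:00:00',
                   '2019-08-08 23:00:00', '2019-08-11 16:00:00'],
})
df['date'] = pd.to_datetime(df['date'])
df['colourDate'] = pd.to_datetime(df['colourDate'])


def window_agg(frame, mask, value_col, func):
    """Hypothetical helper: for each row selected by `mask`, aggregate
    `value_col` over all rows whose date lies in that row's
    [date, colourDate] window. Returns a Series keyed by the original index."""
    sub = frame[mask].reset_index()                      # keep original index as a column
    pairs = sub.merge(frame[['date', value_col]], how='cross')
    in_window = pairs['date_y'].between(pairs['date_x'], pairs['colourDate'])
    return pairs[in_window].groupby('index')[value_col + '_y'].agg(func)


red = df['type'].eq(1) & df['colour'].eq('red')
blue = df['type'].eq(1) & df['colour'].eq('blue')

# Assignment aligns on the original index; unmatched rows stay NaN
df['pixel_limit'] = pd.concat([
    window_agg(df, red, 'minPixel', 'min'),
    window_agg(df, blue, 'maxPixel', 'max'),
])
print(df[['type', 'colour', 'pixel_limit']])
```

On the question's data this gives 14 for row 1 and 5184 for row 3, matching the table above.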

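If you need to bring along several other pieces of information from the row that has the min/max pixel, then instead of groupby + min/max we use sort_values followed by groupby + head or tail to pick out the row with the min or max pixel. For the min this looks like the following (with the suffixes slightly renamed):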

# Subset of rows we need to do this for
dfmin = df[df.type.eq(1) & df.colour.eq('red')].reset_index()

# To each row merge all rows from the original DataFrame
dfmin = dfmin.merge(df[['date', 'minPixel']].reset_index(), how='cross', 
                    suffixes=['', '_match'])
# For older pandas < 1.2
#dfmin = (dfmin.assign(t=1)
#              .merge(df[['date', 'minPixel']].reset_index().assign(t=1), 
#                     on='t', suffixes=['', '_match'])) 

# Only keep rows between the dates, then among those find the min minPixel row. 
# A bunch of renaming. 
smin = (dfmin[dfmin.date_match.between(dfmin.date, dfmin.colourDate)]
            .sort_values('minPixel_match', ascending=True)
            .groupby('index').head(1)
            .set_index('index')
            .filter(like='_match')
            .rename(columns={'minPixel_match': 'pixel_limit'}))

The max is then similar, using .tail

dfmax = df[df.type.eq(1) & df.colour.eq('blue')].reset_index()
dfmax = dfmax.merge(df[['date', 'maxPixel']].reset_index(), how='cross', 
                    suffixes=['', '_match'])

smax = (dfmax[dfmax.date_match.between(dfmax.date, dfmax.colourDate)]
            .sort_values('maxPixel_match', ascending=True)
            .groupby('index').tail(1)
            .set_index('index')
            .filter(like='_match')
            .rename(columns={'maxPixel_match': 'pixel_limit'}))

Finally, we concat along axis=1, since we now need to join multiple columns to the original:

result = pd.concat([df, pd.concat([smin, smax])], axis=1)

                  date  type colour  maxPixel  minPixel           colourDate  index_match           date_match  pixel_limit
0  2019-08-06 09:00:00   0.0   blue       255        86  2019-08-06 12:00:00          NaN                  NaN          NaN
1  2019-08-06 12:00:00   1.0    red      7346        96  2019-08-08 16:00:00          2.0  2019-08-06 18:00:00         14.0
2  2019-08-06 18:00:00   NaN    NaN        32        14  2019-08-06 23:00:00          NaN                  NaN          NaN
3  2019-08-06 21:00:00   1.0   blue      5184      3540  2019-08-06 22:00:00          3.0  2019-08-06 21:00:00       5184.0
4  2019-08-07 09:00:00   NaN    NaN       600       528  2019-08-08 09:00:00          NaN                  NaN          NaN
5  2019-08-07 16:00:00   NaN    NaN       322       300  2019-08-09 16:00:00          NaN                  NaN          NaN
6  2019-08-08 17:00:00   0.0   blue        72        12  2019-08-08 23:00:00          NaN                  NaN          NaN
7  2019-08-09 16:00:00   0.0    red      6000      4009  2019-08-11 16:00:00          NaN                  NaN          NaN
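As a side note, the sort-then-head step can also be written with `idxmin` (or `idxmax` for the max case) plus `.loc`. This is a different technique from the one in the answer, shown here only as a sketch on made-up data; the frame `matches` and its values are hypothetical.

```python
import pandas as pd

# Hypothetical candidate matches: `index` identifies the row being matched
matches = pd.DataFrame({
    'index': [1, 1, 1, 3, 3],
    'date_match': pd.to_datetime(['2019-08-06 12:00', '2019-08-06 18:00',
                                  '2019-08-06 21:00', '2019-08-06 21:00',
                                  '2019-08-06 22:00']),
    'pixel': [96, 14, 3540, 5184, 100],
})

# Approach from the answer: sort, then take the first row of each group
by_head = (matches.sort_values('pixel')
                  .groupby('index').head(1)
                  .set_index('index').sort_index())

# Alternative: find the label of each group's minimum, then look those rows up
by_idxmin = (matches.loc[matches.groupby('index')['pixel'].idxmin()]
                    .set_index('index').sort_index())

print(by_head)
```

Both keep the whole matching row, so extra columns such as `date_match` come along. With tied pixel values the two approaches may pick different rows, so the choice matters only if ties are possible in your data.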
