Finding specific values that satisfy a condition in Python

Posted 2024-06-28 16:20:25


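I am trying to create a new column from values that satisfy certain conditions. The code below roughly illustrates the logic, but it does not produce the correct output: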

import pandas as pd
import numpy as np


df = pd.DataFrame({'date': ['2019-08-06 09:00:00', '2019-08-06 12:00:00', '2019-08-06 18:00:00', '2019-08-06 21:00:00', '2019-08-07 09:00:00', '2019-08-07 16:00:00', '2019-08-08 17:00:00' ,'2019-08-09 16:00:00'], 
                'type': [0, 1, np.nan, 1, np.nan, np.nan, 0 ,0], 
                'colour': ['blue', 'red', np.nan, 'blue', np.nan, np.nan, 'blue', 'red'],
                'maxPixel': [255, 7346, 32, 5184, 600, 322, 72, 6000],
                'minPixel': [86, 96, 14, 3540, 528, 300, 12, 4009],
                'colourDate': ['2019-08-06 12:00:00', '2019-08-08 16:00:00', '2019-08-06 23:00:00', '2019-08-06 22:00:00', '2019-08-08 09:00:00', '2019-08-09 16:00:00', '2019-08-08 23:00:00' ,'2019-08-11 16:00:00'] })

max_conditions = [(df['type'] == 1) & (df['colour'] == 'blue'),
                  (df['type'] == 1) & (df['colour'] == 'red')]


max_choices = [np.where(df['date'] <= df['colourDate'], max(df['maxPixel']), np.nan),
                np.where(df['date'] <= df['colourDate'], min(df['minPixel']), np.nan)]


df['pixelLimit'] = np.select(max_conditions, max_choices, default=np.nan)

Incorrect output:

                  date  type colour  maxPixel  minPixel           colourDate  pixelLimit
0  2019-08-06 09:00:00   0.0   blue       255        86  2019-08-06 12:00:00         NaN
1  2019-08-06 12:00:00   1.0    red      7346        96  2019-08-08 16:00:00        12.0
2  2019-08-06 18:00:00   NaN    NaN        32        14  2019-08-06 23:00:00         NaN
3  2019-08-06 21:00:00   1.0   blue      5184      3540  2019-08-06 22:00:00      6000.0
4  2019-08-07 09:00:00   NaN    NaN       600       528  2019-08-08 09:00:00         NaN
5  2019-08-07 16:00:00   NaN    NaN       322       300  2019-08-09 16:00:00         NaN
6  2019-08-08 17:00:00   0.0   blue        72        12  2019-08-08 23:00:00         NaN
7  2019-08-09 16:00:00   0.0    red      6000      4009  2019-08-11 16:00:00         NaN

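Why the output is incorrect:

The value 12.0 in index row 1 of df['pixelLimit'] is incorrect because it comes from index row 6 of df['minPixel'], whose df['date'] datetime is later than the df['colourDate'] datetime in index row 1 (2019-08-08 16:00:00).

The value 6000.0 in index row 3 of df['pixelLimit'] is incorrect because it comes from index row 7 of df['maxPixel'], whose df['date'] datetime is later than the df['colourDate'] datetime in index row 3 (2019-08-06 22:00:00).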

Correct output:

                  date  type colour  maxPixel  minPixel           colourDate  pixelLimit
0  2019-08-06 09:00:00   0.0   blue       255        86  2019-08-06 12:00:00         NaN
1  2019-08-06 12:00:00   1.0    red      7346        96  2019-08-08 16:00:00        14.0
2  2019-08-06 18:00:00   NaN    NaN        32        14  2019-08-06 23:00:00         NaN
3  2019-08-06 21:00:00   1.0   blue      5184      3540  2019-08-06 22:00:00      5184.0
4  2019-08-07 09:00:00   NaN    NaN       600       528  2019-08-08 09:00:00         NaN
5  2019-08-07 16:00:00   NaN    NaN       322       300  2019-08-09 16:00:00         NaN
6  2019-08-08 17:00:00   0.0   blue        72        12  2019-08-08 23:00:00         NaN
7  2019-08-09 16:00:00   0.0    red      6000      4009  2019-08-11 16:00:00         NaN

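Why this output is correct:

The value 14.0 in index row 1 of df['pixelLimit'] is correct because we are looking for the minimum of df['minPixel'] among the rows whose df['date'] datetime is earlier than the df['colourDate'] datetime in index row 1 and later than or equal to the df['date'] datetime in index row 1.

The value 5184.0 in index row 3 of df['pixelLimit'] is correct because we are looking for the maximum of df['maxPixel'] among the rows whose df['date'] datetime is earlier than the df['colourDate'] datetime in index row 3 and later than or equal to the df['date'] datetime in index row 3.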

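Notes:

Perhaps np.select is not suited to this task, and some kind of function would serve it better.

Alternatively, perhaps I need to create some sort of dynamic len to use as the starting point for each row.

Request

Could anyone out there help me amend my code to get the correct output?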
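
For reference, the rule described above can be written directly as a row-by-row function. This is only a minimal sketch, not a recommended solution: the helper name `pixel_limit_for_row` is hypothetical, the window is taken as `date >= row date` and `date < row colourDate` as stated in the explanation, and the approach scans the whole frame once per row, so it will be slow on large data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'date': pd.to_datetime(['2019-08-06 09:00:00', '2019-08-06 12:00:00',
                            '2019-08-06 18:00:00', '2019-08-06 21:00:00',
                            '2019-08-07 09:00:00', '2019-08-07 16:00:00',
                            '2019-08-08 17:00:00', '2019-08-09 16:00:00']),
    'type': [0, 1, np.nan, 1, np.nan, np.nan, 0, 0],
    'colour': ['blue', 'red', np.nan, 'blue', np.nan, np.nan, 'blue', 'red'],
    'maxPixel': [255, 7346, 32, 5184, 600, 322, 72, 6000],
    'minPixel': [86, 96, 14, 3540, 528, 300, 12, 4009],
    'colourDate': pd.to_datetime(['2019-08-06 12:00:00', '2019-08-08 16:00:00',
                                  '2019-08-06 23:00:00', '2019-08-06 22:00:00',
                                  '2019-08-08 09:00:00', '2019-08-09 16:00:00',
                                  '2019-08-08 23:00:00', '2019-08-11 16:00:00']),
})


def pixel_limit_for_row(row, frame):
    """Hypothetical helper: apply the rule from the question to a single row."""
    if row['type'] != 1:
        return np.nan
    # Rows of the full frame whose date lies in [row date, row colourDate)
    window = frame[(frame['date'] >= row['date']) & (frame['date'] < row['colourDate'])]
    if row['colour'] == 'blue':
        return window['maxPixel'].max()
    if row['colour'] == 'red':
        return window['minPixel'].min()
    return np.nan


limits = df.apply(lambda row: pixel_limit_for_row(row, df), axis=1)
```

Unlike `max(df['maxPixel'])` in the original attempt, which is a single value computed over the whole column, the aggregate here is recomputed inside each row's own date window.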


Tags: df, datetime, date, type, np, blue, red, nan
1 Answer
User
#1 · Posted 2024-06-28 16:20:25

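For a matching problem like this, one option is to do a full (cross) merge, then use a Boolean Series to subset to the rows that satisfy the condition for each row, and find the max or min among all possible matches. Because the two cases need slightly different columns and different functions, I split the work into two very similar pieces of code: one handles 1/blue and the other handles 1/red.

First, some housekeeping: convert the date columns to datetime.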

import pandas as pd

df['date'] = pd.to_datetime(df['date'])
df['colourDate'] = pd.to_datetime(df['colourDate'])

Calculate the min pixel for 1/red between the two times of each row.

# Subset of rows we need to do this for
dfmin = df[df.type.eq(1) & df.colour.eq('red')].reset_index()

# To each row merge all rows from the original DataFrame
dfmin = dfmin.merge(df[['date', 'minPixel']], how='cross')
# If pd.version < 1.2 instead use: 
#dfmin = dfmin.assign(t=1).merge(df[['date', 'minPixel']].assign(t=1), on='t')

# Only keep rows between the dates, then among those find the min minPixel
smin = (dfmin[dfmin.date_y.between(dfmin.date_x, dfmin.colourDate)]
            .groupby('index')['minPixel_y'].min()
            .rename('pixel_limit'))
#index
#1    14
#Name: pixel_limit, dtype: int64

# Max is basically a mirror
dfmax = df[df.type.eq(1) & df.colour.eq('blue')].reset_index()

dfmax = dfmax.merge(df[['date', 'maxPixel']], how='cross')
#dfmax = dfmax.assign(t=1).merge(df[['date', 'maxPixel']].assign(t=1), on='t')

smax = (dfmax[dfmax.date_y.between(dfmax.date_x, dfmax.colourDate)]
           .groupby('index')['maxPixel_y'].max()
           .rename('pixel_limit'))
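
One detail worth checking: the question asks for dates strictly earlier than colourDate, while `Series.between` includes both endpoints by default. On the sample data no date coincides with a colourDate, so the result is the same either way. A small sketch of the difference, assuming pandas 1.3 or later (where `inclusive` accepts `'left'`) and using made-up timestamps:

```python
import pandas as pd

dates = pd.Series(pd.to_datetime(['2019-08-06 12:00:00',
                                  '2019-08-06 18:00:00',
                                  '2019-08-06 22:00:00']))
start = pd.Timestamp('2019-08-06 12:00:00')
end = pd.Timestamp('2019-08-06 22:00:00')

# Default: start <= d <= end, so the row equal to `end` is kept
closed = dates.between(start, end)

# Half-open: start <= d < end, so the row equal to `end` is dropped
half_open = dates.between(start, end, inclusive='left')
```

If the strict upper bound matters for your data, passing `inclusive='left'` to the `between` calls above would be one way to get it.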

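Finally, because the groups above are keyed on the original index (i.e. 'index'), we can simply assign the result back and it will align with the original DataFrame.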

df['pixel_limit'] = pd.concat([smin, smax])

                 date  type colour  maxPixel  minPixel          colourDate  pixel_limit
0 2019-08-06 09:00:00   0.0   blue       255        86 2019-08-06 12:00:00          NaN
1 2019-08-06 12:00:00   1.0    red      7346        96 2019-08-08 16:00:00         14.0
2 2019-08-06 18:00:00   NaN    NaN        32        14 2019-08-06 23:00:00          NaN
3 2019-08-06 21:00:00   1.0   blue      5184      3540 2019-08-06 22:00:00       5184.0
4 2019-08-07 09:00:00   NaN    NaN       600       528 2019-08-08 09:00:00          NaN
5 2019-08-07 16:00:00   NaN    NaN       322       300 2019-08-09 16:00:00          NaN
6 2019-08-08 17:00:00   0.0   blue        72        12 2019-08-08 23:00:00          NaN
7 2019-08-09 16:00:00   0.0    red      6000      4009 2019-08-11 16:00:00          NaN
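
Since the min and max blocks differ only in the colour, the column, and the aggregation, they could be folded into one function. The following is a sketch under assumptions: the name `window_extreme` and its parameters are hypothetical, it needs pandas 1.2 or later for `how='cross'`, and it keeps the inclusive `between` used above.

```python
import numpy as np
import pandas as pd


def window_extreme(df, colour, value_col, agg):
    """Hypothetical helper: for type-1 rows of `colour`, aggregate `value_col`
    over every row whose date lies between that row's date and colourDate."""
    subset = df[df['type'].eq(1) & df['colour'].eq(colour)].reset_index()
    # Pair each selected row with every row of the original frame
    merged = subset.merge(df[['date', value_col]], how='cross',
                          suffixes=('', '_match'))
    in_window = merged['date_match'].between(merged['date'], merged['colourDate'])
    # 'index' holds the original row labels, so the result aligns with df
    return merged[in_window].groupby('index')[value_col + '_match'].agg(agg)


df = pd.DataFrame({
    'date': pd.to_datetime(['2019-08-06 09:00:00', '2019-08-06 12:00:00',
                            '2019-08-06 18:00:00', '2019-08-06 21:00:00',
                            '2019-08-07 09:00:00', '2019-08-07 16:00:00',
                            '2019-08-08 17:00:00', '2019-08-09 16:00:00']),
    'type': [0, 1, np.nan, 1, np.nan, np.nan, 0, 0],
    'colour': ['blue', 'red', np.nan, 'blue', np.nan, np.nan, 'blue', 'red'],
    'maxPixel': [255, 7346, 32, 5184, 600, 322, 72, 6000],
    'minPixel': [86, 96, 14, 3540, 528, 300, 12, 4009],
    'colourDate': pd.to_datetime(['2019-08-06 12:00:00', '2019-08-08 16:00:00',
                                  '2019-08-06 23:00:00', '2019-08-06 22:00:00',
                                  '2019-08-08 09:00:00', '2019-08-09 16:00:00',
                                  '2019-08-08 23:00:00', '2019-08-11 16:00:00']),
})

df['pixel_limit'] = pd.concat([
    window_extreme(df, 'red', 'minPixel', 'min'),
    window_extreme(df, 'blue', 'maxPixel', 'max'),
])
```

Note that the cross merge produces one row per pair, so memory grows with the number of selected rows times the number of rows in the frame.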

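If you need to bring along a lot of other information from the row that holds the min/max pixel, then instead of groupby + min/max we sort_values and then use groupby + head or tail to get the min or max pixel row. For the min this would look like the following (with a slight renaming of the suffixes):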

# Subset of rows we need to do this for
dfmin = df[df.type.eq(1) & df.colour.eq('red')].reset_index()

# To each row merge all rows from the original DataFrame
dfmin = dfmin.merge(df[['date', 'minPixel']].reset_index(), how='cross', 
                    suffixes=['', '_match'])
# For older pandas < 1.2
#dfmin = (dfmin.assign(t=1)
#              .merge(df[['date', 'minPixel']].reset_index().assign(t=1), 
#                     on='t', suffixes=['', '_match'])) 

# Only keep rows between the dates, then among those find the min minPixel row. 
# A bunch of renaming. 
smin = (dfmin[dfmin.date_match.between(dfmin.date, dfmin.colourDate)]
            .sort_values('minPixel_match', ascending=True)
            .groupby('index').head(1)
            .set_index('index')
            .filter(like='_match')
            .rename(columns={'minPixel_match': 'pixel_limit'}))

The max is then similar, using .tail:

dfmax = df[df.type.eq(1) & df.colour.eq('blue')].reset_index()
dfmax = dfmax.merge(df[['date', 'maxPixel']].reset_index(), how='cross', 
                    suffixes=['', '_match'])

smax = (dfmax[dfmax.date_match.between(dfmax.date, dfmax.colourDate)]
            .sort_values('maxPixel_match', ascending=True)
            .groupby('index').tail(1)
            .set_index('index')
            .filter(like='_match')
            .rename(columns={'maxPixel_match': 'pixel_limit'}))

Finally, we concat along axis=1, since we now need to join multiple columns to the original:

result = pd.concat([df, pd.concat([smin, smax])], axis=1)

                  date  type colour  maxPixel  minPixel           colourDate  index_match           date_match  pixel_limit
0  2019-08-06 09:00:00   0.0   blue       255        86  2019-08-06 12:00:00          NaN                  NaN          NaN
1  2019-08-06 12:00:00   1.0    red      7346        96  2019-08-08 16:00:00          2.0  2019-08-06 18:00:00         14.0
2  2019-08-06 18:00:00   NaN    NaN        32        14  2019-08-06 23:00:00          NaN                  NaN          NaN
3  2019-08-06 21:00:00   1.0   blue      5184      3540  2019-08-06 22:00:00          3.0  2019-08-06 21:00:00       5184.0
4  2019-08-07 09:00:00   NaN    NaN       600       528  2019-08-08 09:00:00          NaN                  NaN          NaN
5  2019-08-07 16:00:00   NaN    NaN       322       300  2019-08-09 16:00:00          NaN                  NaN          NaN
6  2019-08-08 17:00:00   0.0   blue        72        12  2019-08-08 23:00:00          NaN                  NaN          NaN
7  2019-08-09 16:00:00   0.0    red      6000      4009  2019-08-11 16:00:00          NaN                  NaN          NaN
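
To see why sorting and then taking `head(1)` or `tail(1)` per group returns the whole row holding the minimum or maximum, here is a toy illustration. The frame and its column names are invented for the example and are not part of the data above.

```python
import pandas as pd

toy = pd.DataFrame({
    'group': ['a', 'a', 'b', 'b', 'b'],
    'value': [5, 2, 9, 4, 7],
    'label': ['p', 'q', 'r', 's', 't'],
})

# Sort once; groupby keeps the sorted order of rows within each group
ordered = toy.sort_values('value', ascending=True)

# First row per group = smallest value, last row per group = largest value
lowest = ordered.groupby('group').head(1).set_index('group')
highest = ordered.groupby('group').tail(1).set_index('group')
```

The other columns (`label` here) travel with the selected row, which is what a plain `groupby(...).min()` would not give you. If several rows tie for the extreme value, only one of them is kept.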
