Count the days for each unique column value and assign a rating

Posted 2024-07-05 15:22:18


Input DataFrame (sample)

Date    Location    Value
01-01-2020  Loc1    27.2
02-01-2020  Loc1    41.9
03-01-2020  Loc1    29.8
04-01-2020  Loc1    7.8
05-01-2020  Loc1    44
06-01-2020  Loc1    0.4
07-01-2020  Loc1    0.8
08-01-2020  Loc1    4.1
09-01-2020  Loc1    4
10-01-2020  Loc1    6.2
11-01-2020  Loc1    54.5
12-01-2020  Loc1    24.8
13-01-2020  Loc1    0
.
.
.
.
01-01-2020  Loc2    6
02-01-2020  Loc2    40.2
03-01-2020  Loc2    2.6
04-01-2020  Loc2    10.2
05-01-2020  Loc2    12
06-01-2020  Loc2    3.2
07-01-2020  Loc2    0
08-01-2020  Loc2    2.4
09-01-2020  Loc2    0
10-01-2020  Loc2    1.2
11-01-2020  Loc2    19.2
12-01-2020  Loc2    21.8
13-01-2020  Loc2    13.6
....

I want to add another column, "Rating", filled in using the following logic:

Rating  Condition
1       Less than 150 days of data
2       150 to 200 days
3       200 to 250 days
4       250 to 300 days
5       All 365 days

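Suppose Loc1 has 180 days of data, so every one of its rows gets a rating of 2; likewise, Loc2 has data for all 365 days, so its rating is 5. The output data would therefore look like this: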

Date    Location    Value   Rating
01-01-2021  Loc1    27.2    2
02-01-2021  Loc1    41.9    2
03-01-2021  Loc1    29.8    2
04-01-2021  Loc1    7.8     2
05-01-2021  Loc1    44      2
06-01-2021  Loc1    0.4     2
07-01-2021  Loc1    0.8     2
08-01-2021  Loc1    4.1     2
09-01-2021  Loc1    4       2
10-01-2021  Loc1    6.2     2
11-01-2021  Loc1    54.5    2
12-01-2021  Loc1    24.8    2
13-01-2021  Loc1    0       2
.
.
.
.
01-01-2021  Loc2    6       5
02-01-2021  Loc2    40.2    5
03-01-2021  Loc2    2.6     5
04-01-2021  Loc2    10.2    5
05-01-2021  Loc2    12      5
06-01-2021  Loc2    3.2     5
07-01-2021  Loc2    0       5
08-01-2021  Loc2    2.4     5
09-01-2021  Loc2    0       5
10-01-2021  Loc2    1.2     5
11-01-2021  Loc2    19.2    5
12-01-2021  Loc2    21.8    5
13-01-2021  Loc2    13.6    5
.
.

Note: the Date column is a datetime object.

I want to do this for the entire DataFrame. How can I achieve that?


2 answers

You can do this by using groupby together with transform to count the rows per location, then applying the conditions with np.where:

import numpy as np
days = df.groupby('Location')['Date'].transform('count')  # rows (days) per location
# Assumption: 300 or more days maps to 5; the question leaves 300-364 unspecified
df['Rating'] = np.where(days < 150, 1, np.where(days < 200, 2, np.where(days < 250, 3, np.where(days < 300, 4, 5))))
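
The nested np.where calls can be hard to read. A flatter variant of the same idea uses np.select, which takes the first matching condition in order. The sketch below is only an illustration: make_frame and add_rating are hypothetical helper names, the sample data is synthetic, and mapping every count of 300 or more to rating 5 is an assumption, since the question does not say what 300 to 364 days should get.

```python
import numpy as np
import pandas as pd

def make_frame(days_per_location):
    """Build a hypothetical frame with one row per day for each location."""
    parts = []
    for loc, n in days_per_location.items():
        dates = pd.date_range("2020-01-01", periods=n, freq="D")
        parts.append(pd.DataFrame({"Date": dates, "Location": loc, "Value": 0.0}))
    return pd.concat(parts, ignore_index=True)

def add_rating(df):
    """Return a copy of df with a per-location Rating column."""
    out = df.copy()
    # Number of rows per location, broadcast back onto every row
    days = out.groupby("Location")["Date"].transform("count")
    # Conditions are checked in order; the first one that holds wins
    conditions = [days < 150, days < 200, days < 250, days < 300]
    # Assumption: anything with 300 or more days falls through to 5
    out["Rating"] = np.select(conditions, [1, 2, 3, 4], default=5)
    return out

df = make_frame({"Loc1": 180, "Loc2": 365, "Loc3": 90})
rated = add_rating(df)
summary = rated.groupby("Location")["Rating"].first().to_dict()
print(summary)
```

Because the default is a number rather than an empty string, the Rating column stays an integer column instead of being converted to strings.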

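You can take the difference between the maximum and minimum datetime of each group, convert the timedelta to days, and then bin the result with pd.cut: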

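import pandas as pd

df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)

# Span in days between the first and last date of each location (not a count of rows)
df['Days'] = df.groupby('Location')['Date'].transform(lambda x: x.max() - x.min()).dt.days

# Five left-closed bins give ratings 1-5; assumption: 300 or more days maps to 5
df['Rating'] = pd.cut(df['Days'], bins=[0, 150, 200, 250, 300, 367], right=False, labels=False).add(1)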
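
Note that the max-minus-min span and the number of days that actually have data can disagree when a location has gaps. The following sketch compares the two on synthetic data. The column names RatingBySpan and RatingByCount, the sample dates, and the choice to send 300 or more days to rating 5 are all assumptions made for illustration, not part of the question.

```python
import pandas as pd

# Hypothetical sample: Loc1 has 180 daily rows with a gap in the middle,
# Loc2 has a row for each of 365 consecutive days
loc1_dates = pd.date_range("2021-01-01", periods=90, freq="D").append(
    pd.date_range("2021-07-01", periods=90, freq="D"))
loc2_dates = pd.date_range("2021-01-01", periods=365, freq="D")
df = pd.concat([
    pd.DataFrame({"Date": loc1_dates, "Location": "Loc1"}),
    pd.DataFrame({"Date": loc2_dates, "Location": "Loc2"}),
], ignore_index=True)

grouped = df.groupby("Location")["Date"]
# Calendar span between the first and last observation of each location
span = grouped.transform(lambda x: x.max() - x.min()).dt.days
# Number of distinct dates that actually have a row
count = grouped.transform("nunique")

# Left-closed bins: [0,150) -> 1, [150,200) -> 2, ..., [300,367) -> 5
edges = [0, 150, 200, 250, 300, 367]
df["RatingBySpan"] = pd.cut(span, bins=edges, right=False, labels=False) + 1
df["RatingByCount"] = pd.cut(count, bins=edges, right=False, labels=False) + 1

result = df.groupby("Location")[["RatingBySpan", "RatingByCount"]].first()
print(result)
```

Here Loc1 spans 270 calendar days but only has 180 days of data, so the span-based rating is 4 while the count-based rating is 2. If the rating is meant to reflect "days of data", as the question words it, counting distinct dates is probably the closer match.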
