我试图让一个窗口函数返回并在特定日期前获取前一行,但不确定出了什么问题,但它给我的是前一行,而不是指定的日期行。为了计算这一点,我取当前行的日期,并查找与该周相关的当前星期一,如下所示
def previous_day(date, dayOfWeek):
return date_sub(next_day(date, "monday"), 7)
spark_df = spark_df.withColumn("last_monday", previous_day(spark_df['calendarday'], "monday"))
然后,我计算当前日期与最近的前一个星期一之间的差值,以天为单位
d = F.datediff(spark_df['calendarday'], spark_df['last_monday'])
spark_df = spark_df.withColumn("daysSinceMonday",d)
我可以从daysSinceMonday中看出每行的值是正确的。接下来,我想创建一个窗口,并选择第一行,但通过我设置的d值来设置它们的范围,但由于某些原因,它不起作用
days = lambda i: i * 86400
w = (Window.partitionBy(column_list).orderBy(col('calendarday').cast("timestamp").cast("long")).rangeBetween(-days(d), 0))
spark_df = spark_df.withColumn('PreviousYearUnique', first("indexCP").over(w))
Starting Data Frame
## +---+-----------+-----------+--------+
## | id|calendarday|last_monday| indexCP|
## +---+-----------+-----------+--------+
## | 1|2015-01-05 | 2015-01-05| 0.0076|
## | 1|2015-01-06 | 2015-01-05| 0.0026|
## | 1|2015-01-07 | 2015-01-05| 0.0016|
## | 1|2015-01-08 | 2015-01-05| 0.0006|
## | 2|2015-01-09 | 2015-01-05| 0.0012|
## | 2|2015-01-10 | 2015-01-05| 0.0014|
## | 1|2015-01-12 | 2015-01-12| 0.0026|
## | 1|2015-01-13 | 2015-01-12| 0.0086|
## | 1|2015-01-14 | 2015-01-12| 0.0046|
## | 1|2015-01-15 | 2015-01-12| 0.0021|
## | 2|2015-01-16 | 2015-01-12| 0.0042|
## | 2|2015-01-17 | 2015-01-12| 0.0099|
## +---+-----------+-----------+--------+
New Data Frame Adding Previous last_mondays row indexCP as PreviousYearUnique
## +---+-----------+-----------+--------+--------------------+
## | id|calendarday|last_monday| indexCP| PreviousYearUnique |
## +---+-----------+-----------+--------+--------------------+
## | 1|2015-01-05 | 2015-01-05| 0.0076| 0.0076|
## | 1|2015-01-06 | 2015-01-05| 0.0026| 0.0076|
## | 1|2015-01-07 | 2015-01-05| 0.0016| 0.0076|
## | 1|2015-01-08 | 2015-01-05| 0.0006| 0.0076|
## | 2|2015-01-09 | 2015-01-05| 0.0012| 0.0076|
## | 2|2015-01-10 | 2015-01-05| 0.0014| 0.0076|
## | 1|2015-01-12 | 2015-01-12| 0.0026| 0.0026|
## | 1|2015-01-13 | 2015-01-12| 0.0086| 0.0026|
## | 1|2015-01-14 | 2015-01-12| 0.0046| 0.0026|
## | 1|2015-01-15 | 2015-01-12| 0.0021| 0.0026|
## | 2|2015-01-16 | 2015-01-12| 0.0042| 0.0026|
## | 2|2015-01-17 | 2015-01-12| 0.0099| 0.0026|
## +---+-----------+-----------+--------+--------------------+
你知道怎么回事吗
您可以在
unboundedPreceding
窗口上last_monday
覆盖calendarday
,然后使用first
相关问题 更多 >
编程相关推荐