从最近的行而不是最早的行获取数据

2024-09-24 02:24:20 发布

您现在位置:Python中文网/ 问答频道 /正文

我有一个数据框,如下所示:

ptid,blast_date,test_name,result_date,test_result,date_diff
PT085087309 2013-10-03  Influenza A.H1.respiratory.qualitative  NA  NA  0
PT085087309 2013-10-04  Influenza A.H1.respiratory.qualitative  2013-10-04  not detected    0
PT085087309 2013-10-07  Influenza A.H1.respiratory.qualitative  2013-10-05  not detected    2
PT085087309 2013-10-09  Influenza A.H1.respiratory.qualitative  2013-10-05  not detected    4
PT085087309 2013-10-14  Influenza A.H1.respiratory.qualitative  2013-10-05  not detected    9
PT085087309 2013-10-15  Influenza A.H1.respiratory.qualitative  2013-10-05  not detected    10
PT085087309 2013-10-18  Influenza A.H1.respiratory.qualitative  2013-10-05  not detected    13
PT085087309 2013-10-21  Influenza A.H1.respiratory.qualitative  NA  NA  0
PT085087309 2013-10-23  Influenza A.H1.respiratory.qualitative  NA  NA  0
PT085087309 2013-10-24  Influenza A.H1.respiratory.qualitative  NA  NA  0
PT085087309 2013-10-25  Influenza A.H1.respiratory.qualitative  NA  NA  0
PT085087309 2013-10-27  Influenza A.H1.respiratory.qualitative  NA  NA  0
PT085087309 2013-10-28  Influenza A.H1.respiratory.qualitative  NA  NA  0
PT085087309 2013-10-31  Influenza A.H1.respiratory.qualitative  NA  NA  0
PT085087309 2013-11-01  Influenza A.H1.respiratory.qualitative  NA  NA  0
PT085087309 2013-11-04  Influenza A.H1.respiratory.qualitative  NA  NA  0
PT085087309 2013-11-06  Influenza A.H1.respiratory.qualitative  NA  NA  0
PT085087309 2013-11-08  Influenza A.H1.respiratory.qualitative  NA  NA  0
PT085087309 2013-11-11  Influenza A.H1.respiratory.qualitative  NA  NA  0
PT085087309 2013-11-14  Influenza A.H1.respiratory.qualitative  NA  NA  0
PT085087309 2013-11-15  Influenza A.H1.respiratory.qualitative  2013-11-15  not detected    0
PT085087309 2013-11-18  Influenza A.H1.respiratory.qualitative  2013-11-15  not detected    3
PT085087309 2013-11-19  Influenza A.H1.respiratory.qualitative  2013-11-15  not detected    4
PT085087309 2013-11-21  Influenza A.H1.respiratory.qualitative  2013-11-15  not detected    6
PT085087309 2014-09-29  Influenza A.H1.respiratory.qualitative  2014-09-21  not detected    8
PT085087309 2014-09-30  Influenza A.H1.respiratory.qualitative  2014-09-21  not detected    9
PT085087309 2014-10-01  Influenza A.H1.respiratory.qualitative  2014-09-21  not detected    10
PT085087309 2014-10-02  Influenza A.H1.respiratory.qualitative  2014-09-21  not detected    11
PT085087309 2014-10-03  Influenza A.H1.respiratory.qualitative  2014-09-21  not detected    12
PT085087309 2014-10-04  Influenza A.H1.respiratory.qualitative  2014-09-21  not detected    13
PT085087309 2014-10-06  Influenza A.H1.respiratory.qualitative  NA  NA  0
PT085087309 2014-10-07  Influenza A.H1.respiratory.qualitative  NA  NA  0
PT085087309 2014-10-09  Influenza A.H1.respiratory.qualitative  NA  NA  0

我希望上面的数据框检查以前的行,其中ever result和result_date列为NA,并从最接近“date”的前几行中获取数据,并填充它而不是NA

生成的数据帧:

ptid,blast_date,test_name,result_date,test_result,date_diff
PT085087309 2013-10-03  Influenza A.H1.respiratory.qualitative  NA  NA  0
PT085087309 2013-10-04  Influenza A.H1.respiratory.qualitative  2013-10-04  not detected    0
PT085087309 2013-10-07  Influenza A.H1.respiratory.qualitative  2013-10-05  not detected    2
PT085087309 2013-10-09  Influenza A.H1.respiratory.qualitative  2013-10-05  not detected    4
PT085087309 2013-10-14  Influenza A.H1.respiratory.qualitative  2013-10-05  not detected    9
PT085087309 2013-10-15  Influenza A.H1.respiratory.qualitative  2013-10-05  not detected    10
PT085087309 2013-10-18  Influenza A.H1.respiratory.qualitative  2013-10-05  not detected    13
PT085087309 2013-10-21  Influenza A.H1.respiratory.qualitative  2013-10-05  not detected    0
PT085087309 2013-10-23  Influenza A.H1.respiratory.qualitative  2013-10-05  not detected    0
PT085087309 2013-10-24  Influenza A.H1.respiratory.qualitative  2013-10-05  not detected    0
PT085087309 2013-10-25  Influenza A.H1.respiratory.qualitative  2013-10-05  not detected    0
PT085087309 2013-10-27  Influenza A.H1.respiratory.qualitative  2013-10-05  not detected    0
PT085087309 2013-10-28  Influenza A.H1.respiratory.qualitative  2013-10-05  not detected    0
PT085087309 2013-10-31  Influenza A.H1.respiratory.qualitative  2013-10-05  not detected    0
PT085087309 2013-11-01  Influenza A.H1.respiratory.qualitative  2013-10-05  not detected    0
PT085087309 2013-11-04  Influenza A.H1.respiratory.qualitative  2013-10-05  not detected    0
PT085087309 2013-11-06  Influenza A.H1.respiratory.qualitative  2013-10-05  not detected    0
PT085087309 2013-11-08  Influenza A.H1.respiratory.qualitative  2013-10-05  not detected    0
PT085087309 2013-11-11  Influenza A.H1.respiratory.qualitative  2013-10-05  not detected    0
PT085087309 2013-11-14  Influenza A.H1.respiratory.qualitative  2013-10-05  not detected    0
PT085087309 2013-11-15  Influenza A.H1.respiratory.qualitative  2013-11-15  not detected    0
PT085087309 2013-11-18  Influenza A.H1.respiratory.qualitative  2013-11-15  not detected    3
PT085087309 2013-11-19  Influenza A.H1.respiratory.qualitative  2013-11-15  not detected    4
PT085087309 2013-11-21  Influenza A.H1.respiratory.qualitative  2013-11-15  not detected    6
PT085087309 2014-09-29  Influenza A.H1.respiratory.qualitative  2014-09-21  not detected    8
PT085087309 2014-09-30  Influenza A.H1.respiratory.qualitative  2014-09-21  not detected    9
PT085087309 2014-10-01  Influenza A.H1.respiratory.qualitative  2014-09-21  not detected    10
PT085087309 2014-10-02  Influenza A.H1.respiratory.qualitative  2014-09-21  not detected    11
PT085087309 2014-10-03  Influenza A.H1.respiratory.qualitative  2014-09-21  not detected    12
PT085087309 2014-10-04  Influenza A.H1.respiratory.qualitative  2014-09-21  not detected    13
PT085087309 2014-10-06  Influenza A.H1.respiratory.qualitative  2014-09-21  not detected    0
PT085087309 2014-10-07  Influenza A.H1.respiratory.qualitative  2014-09-21  not detected    0
PT085087309 2014-10-09  Influenza A.H1.respiratory.qualitative  2014-09-21  not detected    0

我有以下代码,但我从最早的日期获得结果数据。你们能告诉我如何更改下面的代码,以便从最近的日期行获得结果数据吗

w=Window().partitionBy("ptid","test_name").orderBy("blast_date")

df_tests_filled = df_all_tests.withColumn("collect", f.collect_list(f.array("result_date","test_result")).over(w))\
  .withColumn("collect", f.expr("""filter(collect,x-> array_contains(x,'NA')!=True)""")[0])\
  .withColumn("result_date", f.when((f.col("result_date")=='NA')&(f.col("collect").isNotNull()),f.col("collect")[0]).otherwise(f.col("result_date")))\
  .withColumn("test_result", f.when((f.col("test_result")=='NA')&(f.col("collect").isNotNull()),f.col("collect")[1]).otherwise(f.col("test_result"))).drop("timestamp","collect")

Tags: 数据testdatenotcolresulth1collect