我有一个数据框,如下所示:
ptid,blast_date,test_name,result_date,test_result,date_diff
PT085087309 2013-10-03 Influenza A.H1.respiratory.qualitative NA NA 0
PT085087309 2013-10-04 Influenza A.H1.respiratory.qualitative 2013-10-04 not detected 0
PT085087309 2013-10-07 Influenza A.H1.respiratory.qualitative 2013-10-05 not detected 2
PT085087309 2013-10-09 Influenza A.H1.respiratory.qualitative 2013-10-05 not detected 4
PT085087309 2013-10-14 Influenza A.H1.respiratory.qualitative 2013-10-05 not detected 9
PT085087309 2013-10-15 Influenza A.H1.respiratory.qualitative 2013-10-05 not detected 10
PT085087309 2013-10-18 Influenza A.H1.respiratory.qualitative 2013-10-05 not detected 13
PT085087309 2013-10-21 Influenza A.H1.respiratory.qualitative NA NA 0
PT085087309 2013-10-23 Influenza A.H1.respiratory.qualitative NA NA 0
PT085087309 2013-10-24 Influenza A.H1.respiratory.qualitative NA NA 0
PT085087309 2013-10-25 Influenza A.H1.respiratory.qualitative NA NA 0
PT085087309 2013-10-27 Influenza A.H1.respiratory.qualitative NA NA 0
PT085087309 2013-10-28 Influenza A.H1.respiratory.qualitative NA NA 0
PT085087309 2013-10-31 Influenza A.H1.respiratory.qualitative NA NA 0
PT085087309 2013-11-01 Influenza A.H1.respiratory.qualitative NA NA 0
PT085087309 2013-11-04 Influenza A.H1.respiratory.qualitative NA NA 0
PT085087309 2013-11-06 Influenza A.H1.respiratory.qualitative NA NA 0
PT085087309 2013-11-08 Influenza A.H1.respiratory.qualitative NA NA 0
PT085087309 2013-11-11 Influenza A.H1.respiratory.qualitative NA NA 0
PT085087309 2013-11-14 Influenza A.H1.respiratory.qualitative NA NA 0
PT085087309 2013-11-15 Influenza A.H1.respiratory.qualitative 2013-11-15 not detected 0
PT085087309 2013-11-18 Influenza A.H1.respiratory.qualitative 2013-11-15 not detected 3
PT085087309 2013-11-19 Influenza A.H1.respiratory.qualitative 2013-11-15 not detected 4
PT085087309 2013-11-21 Influenza A.H1.respiratory.qualitative 2013-11-15 not detected 6
PT085087309 2014-09-29 Influenza A.H1.respiratory.qualitative 2014-09-21 not detected 8
PT085087309 2014-09-30 Influenza A.H1.respiratory.qualitative 2014-09-21 not detected 9
PT085087309 2014-10-01 Influenza A.H1.respiratory.qualitative 2014-09-21 not detected 10
PT085087309 2014-10-02 Influenza A.H1.respiratory.qualitative 2014-09-21 not detected 11
PT085087309 2014-10-03 Influenza A.H1.respiratory.qualitative 2014-09-21 not detected 12
PT085087309 2014-10-04 Influenza A.H1.respiratory.qualitative 2014-09-21 not detected 13
PT085087309 2014-10-06 Influenza A.H1.respiratory.qualitative NA NA 0
PT085087309 2014-10-07 Influenza A.H1.respiratory.qualitative NA NA 0
PT085087309 2014-10-09 Influenza A.H1.respiratory.qualitative NA NA 0
我希望上面的数据框检查以前的行,其中ever result和result_date列为NA,并从最接近“date”的前几行中获取数据,并填充它而不是NA
生成的数据帧:
ptid,blast_date,test_name,result_date,test_result,date_diff
PT085087309 2013-10-03 Influenza A.H1.respiratory.qualitative NA NA 0
PT085087309 2013-10-04 Influenza A.H1.respiratory.qualitative 2013-10-04 not detected 0
PT085087309 2013-10-07 Influenza A.H1.respiratory.qualitative 2013-10-05 not detected 2
PT085087309 2013-10-09 Influenza A.H1.respiratory.qualitative 2013-10-05 not detected 4
PT085087309 2013-10-14 Influenza A.H1.respiratory.qualitative 2013-10-05 not detected 9
PT085087309 2013-10-15 Influenza A.H1.respiratory.qualitative 2013-10-05 not detected 10
PT085087309 2013-10-18 Influenza A.H1.respiratory.qualitative 2013-10-05 not detected 13
PT085087309 2013-10-21 Influenza A.H1.respiratory.qualitative 2013-10-05 not detected 0
PT085087309 2013-10-23 Influenza A.H1.respiratory.qualitative 2013-10-05 not detected 0
PT085087309 2013-10-24 Influenza A.H1.respiratory.qualitative 2013-10-05 not detected 0
PT085087309 2013-10-25 Influenza A.H1.respiratory.qualitative 2013-10-05 not detected 0
PT085087309 2013-10-27 Influenza A.H1.respiratory.qualitative 2013-10-05 not detected 0
PT085087309 2013-10-28 Influenza A.H1.respiratory.qualitative 2013-10-05 not detected 0
PT085087309 2013-10-31 Influenza A.H1.respiratory.qualitative 2013-10-05 not detected 0
PT085087309 2013-11-01 Influenza A.H1.respiratory.qualitative 2013-10-05 not detected 0
PT085087309 2013-11-04 Influenza A.H1.respiratory.qualitative 2013-10-05 not detected 0
PT085087309 2013-11-06 Influenza A.H1.respiratory.qualitative 2013-10-05 not detected 0
PT085087309 2013-11-08 Influenza A.H1.respiratory.qualitative 2013-10-05 not detected 0
PT085087309 2013-11-11 Influenza A.H1.respiratory.qualitative 2013-10-05 not detected 0
PT085087309 2013-11-14 Influenza A.H1.respiratory.qualitative 2013-10-05 not detected 0
PT085087309 2013-11-15 Influenza A.H1.respiratory.qualitative 2013-11-15 not detected 0
PT085087309 2013-11-18 Influenza A.H1.respiratory.qualitative 2013-11-15 not detected 3
PT085087309 2013-11-19 Influenza A.H1.respiratory.qualitative 2013-11-15 not detected 4
PT085087309 2013-11-21 Influenza A.H1.respiratory.qualitative 2013-11-15 not detected 6
PT085087309 2014-09-29 Influenza A.H1.respiratory.qualitative 2014-09-21 not detected 8
PT085087309 2014-09-30 Influenza A.H1.respiratory.qualitative 2014-09-21 not detected 9
PT085087309 2014-10-01 Influenza A.H1.respiratory.qualitative 2014-09-21 not detected 10
PT085087309 2014-10-02 Influenza A.H1.respiratory.qualitative 2014-09-21 not detected 11
PT085087309 2014-10-03 Influenza A.H1.respiratory.qualitative 2014-09-21 not detected 12
PT085087309 2014-10-04 Influenza A.H1.respiratory.qualitative 2014-09-21 not detected 13
PT085087309 2014-10-06 Influenza A.H1.respiratory.qualitative 2014-09-21 not detected 0
PT085087309 2014-10-07 Influenza A.H1.respiratory.qualitative 2014-09-21 not detected 0
PT085087309 2014-10-09 Influenza A.H1.respiratory.qualitative 2014-09-21 not detected 0
我有以下代码,但我从最早的日期获得结果数据。你们能告诉我如何更改下面的代码,以便从最近的日期行获得结果数据吗
w=Window().partitionBy("ptid","test_name").orderBy("blast_date")
df_tests_filled = df_all_tests.withColumn("collect", f.collect_list(f.array("result_date","test_result")).over(w))\
.withColumn("collect", f.expr("""filter(collect,x-> array_contains(x,'NA')!=True)""")[0])\
.withColumn("result_date", f.when((f.col("result_date")=='NA')&(f.col("collect").isNotNull()),f.col("collect")[0]).otherwise(f.col("result_date")))\
.withColumn("test_result", f.when((f.col("test_result")=='NA')&(f.col("collect").isNotNull()),f.col("collect")[1]).otherwise(f.col("test_result"))).drop("timestamp","collect")
我想这就是你想要的:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.fillna.html
相关问题 更多 >
编程相关推荐