AnalysisException: PySpark cannot resolve a variable inside a DataFrame query

Posted 2024-09-28 01:30:51

I have this line in a PySpark script:

df_output = df.select("*",$checkcol) 
df_output.show()

It works when the column expressions are hardcoded, but when they are passed in as a parameter it fails with an error:

pyspark.sql.utils.AnalysisException: 'cannot resolve \'`"*", F.....

Here checkcol is a variable with the following value:

checkcol -

F.when(F.col("colA")=='null',"Yes").otherwise(date_validation_udf("colA")).alias("colA_DateCheck"),
F.when(F.col("colB")=='null',"Yes").otherwise(date_validation_udf("colB")).alias("colB_DateCheck"),
F.when(F.col("colC")=='null',"Yes").otherwise(date_validation_udf("colC")).alias("colC_DateCheck"),
F.when(F.col("colD")=='null',"Yes").otherwise(num_check_udf("colD")).alias("colD_NumCheck"),
F.when(F.col("colE")=='null',"Yes").otherwise(num_check_udf("colE")).alias("colE_NumCheck"),
F.when(F.col("colF")=='null',"Yes").otherwise(num_check_udf("colF")).alias("colF_NumCheck"),
F.when(F.col("colG")=='null',"Yes").otherwise(num_check_udf("colG")).alias("colG_NumCheck")

1 answer
User
#1 · posted 2024-09-28 01:30:51

Try this:

import pyspark.sql.functions as F

# withColumn takes exactly one column name and one Column, so each check
# gets its own call. The parentheses allow the chain to span several lines.
df_output = (
    df.withColumn("colA_DateCheck",
                  F.when(F.col("colA") == 'null', "Yes").otherwise(date_validation_udf("colA")))
      .withColumn("colB_DateCheck",
                  F.when(F.col("colB") == 'null', "Yes").otherwise(date_validation_udf("colB")))
      .withColumn("colC_DateCheck",
                  F.when(F.col("colC") == 'null', "Yes").otherwise(date_validation_udf("colC")))
    # ... repeat for the remaining columns
)

df_output.show()

Edit:

To pass these expressions to select as a single variable, build them as a tuple of Column objects and unpack it:

checkcol = (F.when(F.col("colA") == 'null', "Yes").otherwise(date_validation_udf("colA")).alias("colA_DateCheck"),
            F.when(F.col("colB") == 'null', "Yes").otherwise(date_validation_udf("colB")).alias("colB_DateCheck"),
            F.when(F.col("colC") == 'null', "Yes").otherwise(date_validation_udf("colC")).alias("colC_DateCheck"),
            F.when(F.col("colD") == 'null', "Yes").otherwise(num_check_udf("colD")).alias("colD_NumCheck"),
            F.when(F.col("colE") == 'null', "Yes").otherwise(num_check_udf("colE")).alias("colE_NumCheck"),
            F.when(F.col("colF") == 'null', "Yes").otherwise(num_check_udf("colF")).alias("colF_NumCheck"),
            F.when(F.col("colG") == 'null', "Yes").otherwise(num_check_udf("colG")).alias("colG_NumCheck"))


df_output = df.select('*', *checkcol)
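
If the list of columns to check grows, the tuple above can also be generated instead of typed out. The following is only a sketch under assumptions: the helper names check_aliases and build_checkcol and the DATE_COLS / NUM_COLS lists are hypothetical and do not come from the question, and date_validation_udf and num_check_udf are assumed to be the UDFs already defined in the asker's script.

```python
# Hypothetical column groups, taken from the expressions in the question.
DATE_COLS = ["colA", "colB", "colC"]          # checked with date_validation_udf
NUM_COLS = ["colD", "colE", "colF", "colG"]   # checked with num_check_udf


def check_aliases(date_cols, num_cols):
    """Map each source column name to the alias of its check column."""
    aliases = {c: f"{c}_DateCheck" for c in date_cols}
    aliases.update({c: f"{c}_NumCheck" for c in num_cols})
    return aliases


def build_checkcol(date_cols, num_cols, date_udf, num_udf):
    """Build one Column expression per source column.

    pyspark is imported inside the function so that check_aliases can be
    used (and tested) without a Spark installation.
    """
    import pyspark.sql.functions as F

    aliases = check_aliases(date_cols, num_cols)

    def check(col_name, udf):
        # Same pattern as in the question: "Yes" when the value is the
        # string 'null', otherwise the result of the validation UDF.
        return (F.when(F.col(col_name) == "null", "Yes")
                 .otherwise(udf(col_name))
                 .alias(aliases[col_name]))

    return ([check(c, date_udf) for c in date_cols]
            + [check(c, num_udf) for c in num_cols])
```

With an active Spark session it would be used as df.select("*", *build_checkcol(DATE_COLS, NUM_COLS, date_validation_udf, num_check_udf)). The point is the same as in the edit above: select receives real Column objects, not a string that contains Python source.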
