PySpark error: dataType <class 'pyspark.sql.types.StringType'> should be an instance of <class 'pyspark.sql.types.DataType'>

Posted 2024-06-01 23:22:55


I need to extract some data from a PipelinedRDD, but converting it to a DataFrame fails with the following error:

Traceback (most recent call last):
  File "/home/karan/Desktop/meds.py", line 42, in <module>
    relevantToSymEntered(newrdd)
  File "/home/karan/Desktop/meds.py", line 26, in relevantToSymEntered
    mat = spark.createDataFrame(self,StructType([StructField("Prescribed medicine",StringType), StructField(["Disease","ID","Symptoms Recorded","Severeness"],ArrayType)]))
  File "/home/karan/Downloads/spark-2.4.2-bin-hadoop2.7/python/pyspark/sql/types.py", line 409, in __init__
    "dataType %s should be an instance of %s" % (dataType, DataType)
AssertionError: dataType <class 'pyspark.sql.types.StringType'> should be an instance of <class 'pyspark.sql.types.DataType'>

1. The errors are of different types: one is a TypeError, whereas I am getting an AssertionError.
2. My problem has nothing to do with casting data types.

I have already tried toDF(), but it changes the column names, which is not what I want.

