I can't pass a binning UDF to PyDeequ's Histogram analyzer. Can anyone help?
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import IntegerType
from pydeequ.analyzers import AnalysisRunner, AnalyzerContext, Histogram

spark = SparkSession.builder.appName('SparkByExamples.com').getOrCreate()

columns = ["Seqno", "Name"]
data = [("1", "john jones"),
        ("3", "tracey smith"),
        ("3", "amy sanders"), ("2", "amy sanders"), ("4", "amy sanders"), ("5", "amy sanders")]
df = spark.createDataFrame(data=data, schema=columns)

# Bucket 1 for Seqno values greater than 2, bucket 2 otherwise
def createBucket(x):
    if int(x) > 2:
        return 1
    else:
        return 2

createBucketUdf = udf(lambda z: createBucket(z), IntegerType())

# Passing a plain Python lambda as binningUdf is the call that fails
analysisResult = AnalysisRunner(spark).onData(df) \
    .addAnalyzer(Histogram("SeqNo", binningUdf=lambda z: createBucket(z))) \
    .run()

result_df = AnalyzerContext.successMetricsAsDataFrame(spark, analysisResult)
result_df.show()
The code above raises this error:
AttributeError: 'function' object has no attribute '_get_object_id'
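The error seems to be caused by an incorrect argument type, so I tried passing the lambda as a string instead. But then I got the empty result below: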
+------+--------+----+-----+
|entity|instance|name|value|
+------+--------+----+-----+
+------+--------+----+-----+
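
For comparison, the bucket counts that this binning function would give on the sample data can be worked out in plain Python, with no Spark or PyDeequ involved. This is only a rough sketch of the expected counts, not of how PyDeequ computes or reports its histogram metrics. `create_bucket` mirrors `createBucket` from the question, and `seqnos` and `expected_histogram` are names made up for illustration.

```python
from collections import Counter

# Same Seqno values as the sample data in the question (kept as strings).
seqnos = ["1", "3", "3", "2", "4", "5"]


def create_bucket(x):
    """Mirror of createBucket: bucket 1 if the value is greater than 2, else bucket 2."""
    return 1 if int(x) > 2 else 2


def expected_histogram(values):
    """Hypothetical helper: count how many values fall into each bucket."""
    return dict(Counter(create_bucket(v) for v in values))


if __name__ == "__main__":
    print(expected_histogram(seqnos))
```

On this data, four rows fall into bucket 1 and two rows into bucket 2, so a working histogram over the binned column should not come back empty.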