PySpark: An error occurred while calling o51.showString. No module named XXX

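# cast_to_float.py
from pyspark.sql.types import FloatType
from pyspark.sql.functions import udf


def cast_to_float(y, column_name):
    # Built-in column cast: evaluated in the JVM, no Python module needed on the workers
    return y.withColumn(column_name, y[column_name].cast(FloatType()))


def cast_to_float_1(y, column_name):
    # Python UDF: the workers must be able to import this module to run it
    to_float = udf(cast2float1, FloatType())
    return y.withColumn(column_name, to_float(column_name))


def cast2float1(a):
    return 1.0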

from pyspark.sql import SparkSession
import os
import sys

parentPath = os.path.abspath('..')
if parentPath not in sys.path:
    sys.path.insert(0, parentPath)

from cast_to_float import *

spark = SparkSession.builder.appName("tests").getOrCreate()

df = spark.createDataFrame([
    (1, 1),
    (2, 2),
    (3, 3),
], ["ID", "VALUE"])

df1 = cast_to_float(df, 'ID')
df2 = cast_to_float_1(df, 'ID')

df1.show()
df1.printSchema()
df2.printSchema()
df2.show()

+---+-----+
| ID|VALUE|
+---+-----+
|1.0|    1|
|2.0|    2|
|3.0|    3|
+---+-----+

root
 |-- ID: float (nullable = true)
 |-- VALUE: long (nullable = true)

root
 |-- ID: float (nullable = true)
 |-- VALUE: long (nullable = true)

Py4JJavaError                             Traceback (most recent call last)
<ipython-input-4-86eb5df2f917> in <module>()
     19 df1.printSchema()
     20 df2.printSchema()
---> 21 df2.show()
...
Py4JJavaError: An error occurred while calling o257.showString.
...
ModuleNotFoundError: No module named 'cast_to_float'
...

1 Answer

User

Reply #1 · Posted on 2024-05-17 07:33:43

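As written, the result depends on the working directory from which you invoke the script.

If you run it from the project root, os.path.abspath('..') adds the root's parent directory instead of the directory that contains the module. You should use a path relative to __file__ (see "what does the __file__ variable mean/do?"):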

parentPath = os.path.join(
    os.path.abspath(os.path.dirname(__file__)), 
    os.path.pardir
)
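
To make the difference concrete, here is a minimal sketch comparing the two ways of computing the parent directory. The directory names (project/tests, elsewhere/deep) and the script name run_tests.py are hypothetical and exist only for illustration; nothing here touches Spark.

```python
import os
import tempfile


def parent_from_cwd():
    # Same expression as in the question: resolved against the current working directory.
    return os.path.abspath("..")


def parent_from_file(script_path):
    # Same idea as the answer: resolved against the script's own location.
    # normpath only collapses the trailing ".." so the paths are easy to compare.
    return os.path.normpath(
        os.path.join(os.path.abspath(os.path.dirname(script_path)), os.path.pardir)
    )


original_cwd = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    root = os.path.realpath(tmp)
    tests_dir = os.path.join(root, "project", "tests")   # hypothetical layout
    other_dir = os.path.join(root, "elsewhere", "deep")  # hypothetical layout
    os.makedirs(tests_dir)
    os.makedirs(other_dir)
    # Stand-in for __file__; the file itself does not need to exist.
    script = os.path.join(tests_dir, "run_tests.py")

    results = {}
    try:
        for workdir in (tests_dir, other_dir):
            os.chdir(workdir)
            results[workdir] = (parent_from_cwd(), parent_from_file(script))
    finally:
        os.chdir(original_cwd)

project_dir = os.path.join(root, "project")
for workdir, (from_cwd, from_file) in results.items():
    print(workdir, "->", from_cwd, "|", from_file)
```

The cwd-based value changes with the directory the script is launched from, while the __file__-based value always points at the project directory.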

That said, I would recommend using a proper package structure instead.

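Note:

This only covers local mode and the driver's path. Even in local mode, the workers' paths are not affected by the driver's path.

To handle the executor paths (after the change above you will get executor exceptions instead), you still need to distribute the module to the workers (see "How to use custom classes with Apache Spark (pyspark)?"):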

spark = SparkSession.builder.appName("tests").getOrCreate()
spark.sparkContext.addPyFile("/path/to/cast_to_float.py")
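
SparkContext.addPyFile accepts a .zip archive as well as a single .py file, which can be convenient once there is more than one module to ship. The sketch below only builds such an archive with the standard library; the archive name deps.zip and the temporary directory are assumptions for illustration, and the addPyFile call is left as a comment because it needs a live session.

```python
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for the real cast_to_float.py (only the UDF body from the question).
    module_path = os.path.join(tmp, "cast_to_float.py")
    with open(module_path, "w") as f:
        f.write("def cast2float1(a):\n    return 1.0\n")

    # Bundle the module at the archive root so it stays importable as `cast_to_float`.
    zip_path = os.path.join(tmp, "deps.zip")  # hypothetical archive name
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.write(module_path, arcname="cast_to_float.py")

    with zipfile.ZipFile(zip_path) as zf:
        archived = zf.namelist()

    # With a running session, the archive would then be shipped to the workers:
    # spark.sparkContext.addPyFile(zip_path)

print(archived)
```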
