SparkSession error when using multiprocessing in PySpark

Posted 2024-06-28 19:13:25


My code is as follows:

import configparser
import glob
import multiprocessing
from itertools import repeat

from pyspark.sql import SparkSession

config = configparser.ConfigParser()

def processFiles(prcFile, spark: SparkSession):
    print(prcFile)
    app_id = spark.sparkContext.getConf().get('spark.app.id')
    app_name = spark.sparkContext.getConf().get('spark.app.name')
    print(app_id)
    print(app_name)

def main(configPath, args):
    config.read(configPath)
    spark: SparkSession = SparkSession.builder.appName("multiprocessing").enableHiveSupport().getOrCreate()
    mprc = multiprocessing.Pool(3)
    lst = glob.glob(config.get('DIT_setup_config', 'prcDetails') + 'prc_PrcId_[0-9].json')
    # starmap unpacks each (file, session) tuple into the two parameters of processFiles
    mprc.starmap(processFiles, zip(lst, repeat(spark.newSession())))

Now I want to process the data through a new Spark session (spark.newSession()) for each task, but I get an error saying:

Exception: It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transformation. SparkContext can only be used on the driver, not in code that it run on workers. For more information, see SPARK-5063.
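This is the expected behaviour: `multiprocessing.Pool` runs workers in separate processes and pickles every argument it sends them, but a `SparkSession` wraps a live `SparkContext` (sockets, a JVM gateway, locks) that cannot be serialized, which is what the SPARK-5063 check guards against. The usual workarounds are to pass only the file paths to the pool, or to use `multiprocessing.pool.ThreadPool` so tasks share the driver's session. A small Spark-free sketch of why such objects fail to pickle (the `FakeSession` class is a hypothetical stand-in, not part of PySpark):

```python
import pickle
import threading

# Hypothetical stand-in for SparkSession: it holds a thread lock,
# one of the unpicklable resources a live session carries.
class FakeSession:
    def __init__(self):
        self._lock = threading.Lock()

def picklable(obj):
    """Return True if obj survives pickling, False on TypeError."""
    try:
        pickle.dumps(obj)
        return True
    except TypeError:
        return False

print(picklable("prc_PrcId_1.json"))  # plain arguments pickle fine -> True
print(picklable(FakeSession()))       # the session-like object does not -> False
```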

Any help is appreciated.


Tags: name, id, config, app, get, def, multiprocessing, spark