How to configure PySpark: Kryo serialization failed: buffer overflow

Posted 2024-09-29 23:20:34


Caused by: org.apache.spark.SparkException: Kryo serialization failed: Buffer overflow. Available: 0, required: 536870912. To avoid this, increase spark.kryoserializer.buffer.max value.
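For reference, the required size in that message is exactly 512 MiB, which is the same value being configured as the buffer ceiling below; a one-line check of the arithmetic (plain Python, purely illustrative):

# 536870912 bytes expressed in MiB -- the same 512m that spark.kryoserializer.buffer.max is set to below
required_bytes = 536870912
print(required_bytes / (1024 * 1024))  # 512.0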

I have changed the configuration as follows:

sparkSession = SparkSession.builder \
    .appName("recommend result") \
    .config("spark.executor.memory", "32G") \
    .config("spark.driver.memory", "32G") \
    .config("spark.python.worker.memory", "32G") \
    .config("spark.default.parallelism", "80") \
    .config("spark.executor.cores", "8") \
    .config("spark.sql.shuffle.partitions", "500") \
    .config("spark.sql.crossJoin.enabled", "true") \
    .config("spark.sql.broadcastTimeout", "36000") \
    .config("spark.sql.hive.mergeFiles", "true") \
    .config("spark.speculation", "false") \
    .config("spark.hadoop.hive.exec.dynamic.partition", "true") \
    .config("spark.hadoop.hive.exec.dynamic.partition.mode", "nonstrict") \
    .config("spark.yarn.appMasterEnv.yarn.nodemanager.container-executor.class", "DockerLinuxContainer") \
    .config("spark.executorEnv.yarn.nodemanager.container-executor.class", "DockerLinuxContainer") \
    .config("spark.yarn.appMasterEnv.yarn.nodemanager.docker-container-executor.image-name",
            "bdp-docker.jd.com:5000/wise_mart_bag:latest") \
    .config("spark.executorEnv.yarn.nodemanager.docker-container-executor.image-name",
            "bdp-docker.jd.com:5000/wise_mart_bag:latest") \
    .config("spark.kryoserializer.buffer.max", "512m") \
    .config("spark.kryoserializer.buffer.max.mb", "512")
    .enableHiveSupport() \
    .getOrCreate()
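One way to see whether these settings actually took effect at runtime (a minimal sketch, assuming the session above was created successfully; the fallback strings passed to get() are just placeholders) is to read them back from the live SparkContext right after getOrCreate():

# Read back the effective serializer settings from the running context;
# if getOrCreate() returned a pre-existing session, the values set above may have been ignored.
conf = sparkSession.sparkContext.getConf()
print(conf.get("spark.kryoserializer.buffer.max", "not set"))    # expect "512m"
print(conf.get("spark.serializer", "default (JavaSerializer)"))  # Kryo only if explicitly configured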

My job is submitted with:

spark-submit \
--master yarn-cluster \
--num-executors 100 \
--executor-memory 10G \
--driver-memory 10G \
--conf spark.yarn.appMasterEnv.yarn.nodemanager.container-executor.class=DockerLinuxContainer \
--conf spark.executorEnv.yarn.nodemanager.container-executor.class=DockerLinuxContainer \
--conf spark.yarn.appMasterEnv.yarn.nodemanager.docker-container-executor.image-name=bdp-docker.jd.com:5000/wise_mart_jypt:latest \
--conf spark.executorEnv.yarn.nodemanager.docker-container-executor.image-name=bdp-docker.jd.com:5000/wise_mart_jypt:latest \
--conf spark.kryoserializer.buffer.max=512m \
--conf spark.kryoserializer.buffer.max.mb=512 \
--conf spark.pyspark.python=python3 \
--files $HIVE_CONF_DIR/hive-site.xml fpgrowth_agg_fp_freq_coupon_set.py ${BUFFALO_ENV_BCYCLE}

echo "work done!"

However, the problem still persists. Can anyone tell me what is wrong here? I am not sure whether I made a mistake when setting these properties. Thanks.


Tags: docker, config, sql, container, conf, buffer, max, spark
