Cannot connect to Docker Spark master

Published 2024-09-16 14:52:30


I have a small Spark 2.4.1 cluster running on Docker and can't connect to it with PySpark. The master is reachable (it writes some error logs about the failure when I try to connect).

Docker Compose file:

version: '3.4'
services:
  #####################
  #       SPARK       #
  #####################

  master: #Master node for spark
    image: gettyimages/spark
    command: bin/spark-class org.apache.spark.deploy.master.Master -h master
    hostname: master
    environment:
      MASTER: spark://master:7077
      SPARK_CONF_DIR: /conf
      SPARK_PUBLIC_DNS: localhost
    expose:
      - 7001
      - 7002
      - 7003
      - 7004
      - 7005
      - 7077
      - 6066
    ports:
      - 4040:4040
      - 6066:6066
      - 7077:7077
      - 8080:8080
    volumes:
      - ./conf/master:/conf
      - ./data:/tmp/data      
      - ./win_utils:/usr/hadoop-3.0.0/bin

  worker: #Worker node for spark
    image: gettyimages/spark
    command: bin/spark-class org.apache.spark.deploy.worker.Worker spark://master:7077
    hostname: worker
    environment:
      SPARK_CONF_DIR: /conf
      SPARK_WORKER_CORES: 2
      SPARK_WORKER_MEMORY: 1g
      SPARK_WORKER_PORT: 8882
      SPARK_WORKER_WEBUI_PORT: 8082
      SPARK_PUBLIC_DNS: localhost
    links:
      - master
    expose:
      - 7012
      - 7013
      - 7014
      - 7015
      - 8881
    ports:
      - 8082:8082
    volumes:
      - ./conf/worker:/conf
      - ./data:/tmp/data
      - ./win_utils:/usr/hadoop-3.0.0/bin
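
One thing about the compose file itself: image: gettyimages/spark is not pinned to a tag, so both containers run whatever "latest" pointed to when the image was pulled. Below is a minimal sketch for checking which Spark build the running master container actually reports; the container name is a guess on my part (check docker ps), and it assumes the docker CLI is available and that spark-submit is on the PATH inside the image:

import subprocess

# Hypothetical container name - use whatever "docker ps" shows for the master service
container = "spark_master_1"

result = subprocess.run(
    ["docker", "exec", container, "spark-submit", "--version"],
    capture_output=True,
    text=True,
)
# spark-submit prints its version banner to stderr rather than stdout
print(result.stderr or result.stdout)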

I have PySpark 2.4.1 and try to connect:

from pyspark import SparkContext, SparkConf
appName = "testsparkapp"
# port 7077 of the master container is published to localhost in the compose file
master = "spark://localhost:7077"
conf = SparkConf().setAppName(appName).setMaster(master)
sc = SparkContext(conf=conf)
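
As a client-side sanity check (a minimal sketch, independent of the Docker setup), this is how the driver's own Spark build can be inspected before creating the context; whether SPARK_HOME is set depends on the local environment:

import os
import pyspark

# Version and location of the pyspark package the driver imports
print("pyspark version:", pyspark.__version__)
print("pyspark location:", os.path.dirname(pyspark.__file__))
# If SPARK_HOME is set, the launched JVM may pick up that Spark installation
# instead of the jars bundled with the pip package
print("SPARK_HOME:", os.environ.get("SPARK_HOME"))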

Python error log:

Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.NullPointerException
    at org.apache.spark.storage.BlockManagerMaster.registerBlockManager(BlockManagerMaster.scala:66)
    at org.apache.spark.storage.BlockManager.initialize(BlockManager.scala:464)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:579)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)

Docker container log:

2020-06-19 06:07:15,847 ERROR server.TransportRequestHandler: Error while invoking RpcHandler#receive() for one-way message.

java.io.InvalidClassException: org.apache.spark.deploy.ApplicationDescription; local class incompatible: stream classdesc serialVersionUID = 1574364215946805297, local class serialVersionUID = 6543101073799644159

at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:699)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1885)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1751)

From what I've read this points to a version mismatch, but I'm not sure what exactly is mismatched.

The PySpark and Spark versions match: spark-submit --version returns 2.4.1, and pip freeze shows the same for PySpark.

Does the Spark version installed on my local machine (3.0.0) matter? If so, what is the point of using Docker?
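
Since the container log complains about org.apache.spark.deploy.ApplicationDescription being serialized by a different Spark build, one comparison I can still run locally is the pip-installed pyspark against whichever spark-submit is first on the PATH, since those are not necessarily the same installation (a minimal sketch):

import subprocess
import pyspark

# Version of the pip-installed pyspark package
print("pip pyspark:", pyspark.__version__)
# Version reported by whichever spark-submit is first on the PATH
# (spark-submit writes its version banner to stderr)
proc = subprocess.run(["spark-submit", "--version"], capture_output=True, text=True)
print("spark-submit:", (proc.stderr or proc.stdout).strip())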

Thanks

