I have a small Spark 2.4.1 cluster running on Docker, and I cannot connect to it with PySpark. The host is reachable (it writes some error logs about the failure when a connection is attempted).
Docker Compose file:
version: '3.4'
services:
  #####################
  # SPARK             #
  #####################
  master: # Master node for Spark
    image: gettyimages/spark
    command: bin/spark-class org.apache.spark.deploy.master.Master -h master
    hostname: master
    environment:
      MASTER: spark://master:7077
      SPARK_CONF_DIR: /conf
      SPARK_PUBLIC_DNS: localhost
    expose:
      - 7001
      - 7002
      - 7003
      - 7004
      - 7005
      - 7077
      - 6066
    ports:
      - 4040:4040
      - 6066:6066
      - 7077:7077
      - 8080:8080
    volumes:
      - ./conf/master:/conf
      - ./data:/tmp/data
      - ./win_utils:/usr/hadoop-3.0.0/bin
  worker: # Worker node for Spark
    image: gettyimages/spark
    command: bin/spark-class org.apache.spark.deploy.worker.Worker spark://master:7077
    hostname: worker
    environment:
      SPARK_CONF_DIR: /conf
      SPARK_WORKER_CORES: 2
      SPARK_WORKER_MEMORY: 1g
      SPARK_WORKER_PORT: 8882
      SPARK_WORKER_WEBUI_PORT: 8082
      SPARK_PUBLIC_DNS: localhost
    links:
      - master
    expose:
      - 7012
      - 7013
      - 7014
      - 7015
      - 8881
    ports:
      - 8082:8082
    volumes:
      - ./conf/worker:/conf
      - ./data:/tmp/data
      - ./win_utils:/usr/hadoop-3.0.0/bin
I have PySpark 2.4.1 and try to connect with:
from pyspark import SparkContext, SparkConf
appName = "testsparkapp"
master = "spark://localhost:7077"
conf = SparkConf().setAppName(appName).setMaster(master)
sc = SparkContext(conf=conf)
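The `InvalidClassException` below is Spark's symptom of a driver/cluster version mismatch: the driver serializes messages with one version of `ApplicationDescription` and the master deserializes them with another. A minimal sketch of the compatibility rule (the `compatible` helper is hypothetical, not a Spark API; Spark only guarantees wire compatibility when driver and cluster agree on the major.minor version):

```python
def compatible(driver_version: str, cluster_version: str) -> bool:
    """Spark's RPC serialization is only compatible when the driver and
    the cluster agree on the major.minor version."""
    return driver_version.split(".")[:2] == cluster_version.split(".")[:2]

# The Docker image ships Spark 2.4.1; a Spark 3.0.0 driver on the host
# fails with InvalidClassException, as in the container logs below.
print(compatible("2.4.1", "2.4.1"))  # True  -> connection can work
print(compatible("3.0.0", "2.4.1"))  # False -> serialVersionUID mismatch
```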
Python error log:
Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.NullPointerException
at org.apache.spark.storage.BlockManagerMaster.registerBlockManager(BlockManagerMaster.scala:66)
at org.apache.spark.storage.BlockManager.initialize(BlockManager.scala:464)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:579)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
Docker container log:
2020-06-19 06:07:15,847 ERROR server.TransportRequestHandler: Error while invoking RpcHandler#receive() for one-way message.
java.io.InvalidClassException: org.apache.spark.deploy.ApplicationDescription; local class incompatible: stream classdesc serialVersionUID = 1574364215946805297, local class serialVersionUID = 6543101073799644159
at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:699)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1885)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1751)
From what I've read this is a version mismatch, but I'm not sure what is mismatched.
The PySpark and Spark versions do match: spark-submit --version returns 2.4.1, and so does pip freeze for PySpark.
Does the Spark version installed on my local machine (3.0.0) matter? If it does, what is the point of using Docker?
Thanks.
Well, apparently yes, it does matter: the PySpark version on the driver and the Spark version on the cluster must match. This helped:
Can PySpark work without Spark?
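Assuming the cluster image really runs 2.4.1, the fix is to make the driver side use the same version; a sketch of the environment cleanup (paths and versions are from this setup, not universal):

```shell
# Pin the pip package to the exact version running in the containers
pip install pyspark==2.4.1

# If SPARK_HOME points at the local Spark 3.0.0 install, the driver picks
# up the 3.0.0 jars instead of the pip-installed 2.4.1 ones - unset it
unset SPARK_HOME
```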