How to fix the runtime error: no module named 'graphframes'


I am using the graphframes framework in pyspark (I import the graphframes module). The job ran fine for a while, but after some time I started getting the error "No module named 'graphframes'". Any idea why?

The error occurs intermittently: sometimes the job finishes successfully, sometimes it does not.

Spark version: 2.2.1

graphframes version: 0.6

Error:

19/06/05 02:22:17 ERROR Executor: Exception in task 641.3 in stage 216.0 (TID 123244)
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/appcom/spark-2.2.1/python/lib/pyspark.zip/pyspark/worker.py", line 166, in main
   func, profiler, deserializer, serializer = read_command(pickleSer, infile)
  File "/appcom/spark-2.2.1/python/lib/pyspark.zip/pyspark/worker.py", line 55, in read_command
    command = serializer._read_with_length(file)
  File "/appcom/spark-2.2.1/python/lib/pyspark.zip/pyspark/serializers.py", line 169, in _read_with_length
    return self.loads(obj)
  File "/appcom/spark-2.2.1/python/lib/pyspark.zip/pyspark/serializers.py", line 455, in loads
    return pickle.loads(obj, encoding=encoding)
  File "/data/data08/nm-local-dir/usercache/hduser0011/appcache/application_1547810698423_82435/container_1547810698423_82435_02_000041/ares_detect.zip/ares_detect/task/communication_detect.py", line 11, in <module>
    from graphframes import GraphFrame
ModuleNotFoundError: No module named 'graphframes'

Command:

spark-submit --master yarn-cluster \
        --name ad_com_detect_${app_arr[$i]}_${scenario_arr[$i]}_${txParameter_app_arr[$i]} \
        --executor-cores 4 \
        --num-executors 8 \
        --executor-memory 35g \
        --driver-memory 2g \
        --conf spark.sql.shuffle.partitions=800 \
        --conf spark.default.parallelism=1000 \
        --conf spark.yarn.executor.memoryOverhead=2048 \
        --conf spark.sql.execution.arrow.enabled=true \
        --jars org.scala-lang_scala-reflect-2.10.4.jar,\
org.slf4j_slf4j-api-1.7.7.jar,\
com.typesafe.scala-logging_scala-logging-api_2.10-2.1.2.jar,\
com.typesafe.scala-logging_scala-logging-slf4j_2.10-2.1.2.jar,\
graphframes-0.6.0-spark2.2-s_2.11.jar \
        --py-files ***.zip \
***/***/****.py  &

Does pyspark drop these jars when it runs out of memory?
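For what it's worth, a minimal way to check whether each executor can actually import the module looks roughly like this (a diagnostic sketch, assuming an existing SparkContext named sc; it is not part of the job above):

def can_import_graphframes(_):
    # find_spec returns None when the module is not on this worker's PYTHONPATH
    import importlib.util
    return importlib.util.find_spec("graphframes") is not None

# Run one lightweight task per partition and count how many workers see the module
results = sc.parallelize(range(16), 16).map(can_import_graphframes).collect()
print("tasks that can import graphframes:", sum(results), "of", len(results))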


1 Answer

Try adding the jar via the --packages option:

spark-submit \
     --packages graphframes:graphframes:0.7.0-spark2.4-s_2.11 \
     my_py_script.py

It also works with both options at the same time:

spark-submit \
     --packages graphframes:graphframes:0.7.0-spark2.4-s_2.11 \
     --jars path_to_your_jars/graphframes-0.7.0-spark2.4-s_2.11.jar \
     my_py_script.py

This solved the problem for me.
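Once the package resolves, the import works as usual; a minimal smoke-test sketch with made-up vertex/edge data (the SparkSession setup and column values below are my own, not from the job above):

from pyspark.sql import SparkSession
from graphframes import GraphFrame

spark = SparkSession.builder.appName("graphframes_smoke_test").getOrCreate()

# Vertices need an "id" column; edges need "src" and "dst" columns.
v = spark.createDataFrame([("a", "Alice"), ("b", "Bob")], ["id", "name"])
e = spark.createDataFrame([("a", "b", "follows")], ["src", "dst", "relationship"])

g = GraphFrame(v, e)
g.inDegrees.show()  # triggers a Spark job, so the graphframes jar must be available on the executors too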

In general there are four options for adding files to Spark; they are described in spark-submit --help:

 --jars JARS           Comma-separated list of jars to include on the driver and executor classpaths.

 --packages            Comma-separated list of maven coordinates of jars to include on the driver and executor classpaths. Will search the local maven repo, then maven central and any additional remote repositories given by --repositories.

 --py-files PY_FILES   Comma-separated list of .zip, .egg, or .py files to place on the PYTHONPATH for Python apps.

 --files FILES         Comma-separated list of files to be placed in the working directory of each executor. File paths of these files in executors can be accessed via SparkFiles.get(fileName).
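
For the --files case specifically, reading the shipped file on an executor looks roughly like this (a sketch assuming a hypothetical lookup.json passed via --files and an existing SparkContext sc):

from pyspark import SparkFiles

def read_first_line(_):
    # SparkFiles.get returns the local path of the file that --files shipped to this executor
    with open(SparkFiles.get("lookup.json")) as f:
        return f.readline().strip()

print(sc.parallelize(range(2), 2).map(read_first_line).collect())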
