How to fix the runtime error: no module named 'graphframes'


I am using the graphframes framework in pyspark (I import the graphframes module). The job ran fine for a while, but after some time I started getting the error "No module named 'graphframes'". Any idea why?

The error occurs intermittently: sometimes the job finishes successfully, sometimes it does not.

Spark version: 2.2.1

graphframes version: 0.6

Error:

19/06/05 02:22:17 ERROR Executor: Exception in task 641.3 in stage 216.0 (TID 123244)
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/appcom/spark-2.2.1/python/lib/pyspark.zip/pyspark/worker.py", line 166, in main
   func, profiler, deserializer, serializer = read_command(pickleSer, infile)
  File "/appcom/spark-2.2.1/python/lib/pyspark.zip/pyspark/worker.py", line 55, in read_command
    command = serializer._read_with_length(file)
  File "/appcom/spark-2.2.1/python/lib/pyspark.zip/pyspark/serializers.py", line 169, in _read_with_length
    return self.loads(obj)
  File "/appcom/spark-2.2.1/python/lib/pyspark.zip/pyspark/serializers.py", line 455, in loads
    return pickle.loads(obj, encoding=encoding)
  File "/data/data08/nm-local-dir/usercache/hduser0011/appcache/application_1547810698423_82435/container_1547810698423_82435_02_000041/ares_detect.zip/ares_detect/task/communication_detect.py", line 11, in <module>
    from graphframes import GraphFrame
ModuleNotFoundError: No module named 'graphframes'

Command:

spark-submit --master yarn-cluster \
        --name ad_com_detect_${app_arr[$i]}_${scenario_arr[$i]}_${txParameter_app_arr[$i]} \
        --executor-cores 4 \
        --num-executors 8 \
        --executor-memory 35g \
        --driver-memory 2g \
        --conf spark.sql.shuffle.partitions=800 \
        --conf spark.default.parallelism=1000 \
        --conf spark.yarn.executor.memoryOverhead=2048 \
        --conf spark.sql.execution.arrow.enabled=true \
        --jars org.scala-lang_scala-reflect-2.10.4.jar,\
org.slf4j_slf4j-api-1.7.7.jar,\
com.typesafe.scala-logging_scala-logging-api_2.10-2.1.2.jar,\
com.typesafe.scala-logging_scala-logging-slf4j_2.10-2.1.2.jar,\
graphframes-0.6.0-spark2.2-s_2.11.jar \
        --py-files ***.zip \
***/***/****.py  &

Does pyspark drop these jars when it runs out of memory?
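For what it's worth, a minimal way to check whether each executor can actually import the module looks roughly like this (a diagnostic sketch, assuming an existing SparkContext named sc; it is not part of the job above):

def can_import_graphframes(_):
    # find_spec returns None when the module is not on this worker's PYTHONPATH
    import importlib.util
    return importlib.util.find_spec("graphframes") is not None

# Run one lightweight task per partition and count how many workers see the module
results = sc.parallelize(range(16), 16).map(can_import_graphframes).collect()
print("tasks that can import graphframes:", sum(results), "of", len(results))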


1 Answer

Try adding the jar via the --packages option:

spark-submit \
     --packages graphframes:graphframes:0.7.0-spark2.4-s_2.11 \
     my_py_script.py

It also works with both options at the same time:

spark-submit \
     --packages graphframes:graphframes:0.7.0-spark2.4-s_2.11 \
     --jars path_to_your_jars/graphframes-0.7.0-spark2.4-s_2.11.jar \
     my_py_script.py

This solved the problem for me.
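Once the package resolves, the import works as usual; a minimal smoke-test sketch with made-up vertex/edge data (the SparkSession setup and column values below are my own, not from the job above):

from pyspark.sql import SparkSession
from graphframes import GraphFrame

spark = SparkSession.builder.appName("graphframes_smoke_test").getOrCreate()

# Vertices need an "id" column; edges need "src" and "dst" columns.
v = spark.createDataFrame([("a", "Alice"), ("b", "Bob")], ["id", "name"])
e = spark.createDataFrame([("a", "b", "follows")], ["src", "dst", "relationship"])

g = GraphFrame(v, e)
g.inDegrees.show()  # triggers a Spark job, so the graphframes jar must be available on the executors too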

In general there are four options for adding files to Spark; they are described in spark-submit --help:

 --jars JARS           Comma-separated list of jars to include on the driver and executor classpaths.

 --packages            Comma-separated list of maven coordinates of jars to include on the driver and executor classpaths. Will search the local maven repo, then maven central and any additional remote repositories given by --repositories.

 --py-files PY_FILES   Comma-separated list of .zip, .egg, or .py files to place on the PYTHONPATH for Python apps.

 --files FILES         Comma-separated list of files to be placed in the working directory of each executor. File paths of these files in executors can be accessed via SparkFiles.get(fileName).
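
For the --files case specifically, reading the shipped file on an executor looks roughly like this (a sketch assuming a hypothetical lookup.json passed via --files and an existing SparkContext sc):

from pyspark import SparkFiles

def read_first_line(_):
    # SparkFiles.get returns the local path of the file that --files shipped to this executor
    with open(SparkFiles.get("lookup.json")) as f:
        return f.readline().strip()

print(sc.parallelize(range(2), 2).map(read_first_line).collect())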
