Adding Python libraries to remote cluster machines?

from pyspark import SparkConf
from pyspark import SparkContext

import itertools
import random
import xml.etree.ElementTree as ET

import networkx as nx
from networkx.algorithms.connectivity import local_edge_connectivity
from shapely.geometry import Polygon
from shapely.ops import cascaded_union


def calculate(sc):
    # Word count over the input file
    text_file = sc.textFile("nevergonnagive.txt")
    counts = text_file.flatMap(lambda line: line.split(" ")) \
                      .map(lambda word: (word, 1)) \
                      .reduceByKey(lambda a, b: a + b)
    counts.saveAsTextFile("word_count_OUT")
    return sc


# Build the configuration before creating the context so the settings are applied
conf_spark = SparkConf()
conf_spark.set('spark.executorEnv.PYTHONPATH', '~/local/lib/:/usr/bin/python3.6')
conf_spark.set('spark.executorEnv.LD_LIBRARY_PATH', '~/local/lib/python3.6:/some/path/Python/3.7.2/lib')
sc = SparkContext.getOrCreate(conf=conf_spark)

sc = calculate(sc)
sc.stop()  # SparkContext provides stop(), not close()

$ module show python/3.6.5
------------------------------------------------------------------------------------------------
   /some/path/modulefiles/python/3.6.5.lua:
------------------------------------------------------------------------------------------------
help([[Interpréteur Python Version disponible sous rh7 ]])
whatis("Nom : Python")
whatis("Version : 3.6.5")
whatis("Os : rh7")
whatis("Date d installation : 14/08/2019")
setenv("PYTHON_HOME","/some/path/Python/3.6.5")
prepend_path("PATH","/some/path/Python/3.6.5/bin")
prepend_path("LD_LIBRARY_PATH","/some/path/Python/3.6.5/lib")
prepend_path("MANPATH","/some/path/Python/3.6.5/share/man")
prepend_path("PKG_CONFIG_PATH","/some/path/Python/3.6.5/lib/pkgconfig")
setenv("PIP_CERT","/some/path/certs/ca-bundle.crt")

1 answer

User

#1 · Posted 2024-10-02 12:32:59

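If I understand correctly, you only have access to a single node of the Spark cluster. The proper way to do this is to set up a shared mount that is visible on all executors, copy your venv into that mount, and add the conf spark.pyspark.python=/path/to/venv.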
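A minimal sketch of what the submit command could look like with that setting. The mount point `/shared/venv` and the script name `job.py` are hypothetical. `spark.pyspark.python` is a standard Spark property that names the Python executable to use, so this sketch assumes the value should be the interpreter inside the venv rather than the venv directory itself.

```python
import shlex


def build_submit_command(venv_python, script):
    """Return a spark-submit argument list that points PySpark at venv_python.

    venv_python: path to the Python executable inside the shared venv
                 (hypothetical path, must be readable on every executor).
    script:      the PySpark job to run (hypothetical name).
    """
    return [
        "spark-submit",
        "--conf", f"spark.pyspark.python={venv_python}",
        script,
    ]


if __name__ == "__main__":
    cmd = build_submit_command("/shared/venv/bin/python", "job.py")
    # Print the command instead of running it, so it can be inspected first
    print(shlex.join(cmd))
```

The function only assembles the command; passing the list to `subprocess.run` would launch it on a machine where `spark-submit` is on the PATH.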

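Since you cannot set that up in advance, the only thing you can do (when running on the cluster, not in local mode) is to create a zip file from your site-packages folder (e.g. dep.zip) and then pass it when submitting the job with spark-submit via --py-files dep.zip.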
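The zip approach described above could be sketched roughly as follows. The site-packages location, the archive name `dep` and the script name `job.py` are all hypothetical; only `--py-files` is a real spark-submit option. The archive is built so that packages sit at the top level of the zip, which is what Python needs in order to import from it.

```python
import shutil


def zip_site_packages(site_packages_dir, archive_base):
    """Zip the contents of site_packages_dir and return the archive path.

    archive_base is the output path without the ".zip" suffix.
    Entries are stored relative to site_packages_dir, so each package
    ends up at the root of the archive.
    """
    return shutil.make_archive(archive_base, "zip", root_dir=site_packages_dir)


def build_submit_command(archive_path, script):
    """Return a spark-submit argument list that ships archive_path to executors."""
    return ["spark-submit", "--py-files", archive_path, script]


if __name__ == "__main__":
    # Hypothetical paths: adjust to the local environment before running
    archive = zip_site_packages("/home/me/venv/lib/python3.6/site-packages",
                                "/tmp/dep")
    print(build_submit_command(archive, "job.py"))
```

Writing the archive outside the directory being zipped avoids the archive trying to include itself.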

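I would not recommend this, though: in my experiments, --py-files dep.zip does not work properly when the dependencies include C-compiled libraries (e.g. numpy, pymssql…).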
