PySpark topic modeling job fails; cannot interpret the error log

Posted 2024-06-26 04:18:23

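A few lines of my code are below. I would include more, but I suspect the error comes from my environment rather than from the code. I am following this tutorial almost line for line, except that I use different data and a different version of Spark.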

def topic_render(topic, vocabArray):
    terms = topic[0]
    result = []
    for i in range(0, 5):
        term = vocabArray[terms[i]]
        result.append(term)
    return result

from pyspark.mllib.clustering import LDA
from pyspark.mllib.linalg import Vectors

lda_model = LDA.train(result_tfidf[['index','features']]
            .rdd.mapValues(Vectors.fromML)
            .map(list), k=10, maxIterations=100)
topicIndices = spark.sparkContext.parallelize(lda_model.describeTopics(maxTermsPerTopic=5))
# The line above runs without error.
topics_final = topicIndices.map(lambda topic: topic_render(topic, vocabArray)).collect()
# The script crashes on the line above; the error log is incomprehensible.
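
Because describeTopics returns an ordinary Python list on the driver (which is why it can be handed to parallelize), the index-to-word lookup can also be done without starting a Spark job. The following is a minimal sketch of that alternative, assuming vocabArray is a local Python list; the names vocab_array, topics and max_terms and the sample data are hypothetical stand-ins, not taken from the tutorial. It avoids the failing line but does not by itself explain the log output.

```python
def topic_render(topic, vocab_array, max_terms=5):
    """Map the term indices of one (term_indices, term_weights) pair to words."""
    term_indices = topic[0]
    # Slicing tolerates topics that have fewer than max_terms terms.
    return [vocab_array[i] for i in term_indices[:max_terms]]


# Hypothetical stand-ins for vocabArray and for the list that
# lda_model.describeTopics(maxTermsPerTopic=5) would return.
vocab_array = ["spark", "topic", "model", "word", "data", "python"]
topics = [([0, 2, 4], [0.5, 0.3, 0.2]), ([1, 3, 5], [0.4, 0.4, 0.2])]

# Plain list comprehension on the driver; no RDD and no collect() involved.
topics_final = [topic_render(topic, vocab_array) for topic in topics]
print(topics_final)
```

With only k=10 topics of five terms each, the local version handles a very small amount of data, so distributing the lookup is unlikely to gain anything.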

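Below are some lines of the log output (it is very long, and most of it just repeats this section). It is hard to tell what went wrong. I don't think I need the winutils binary in the Hadoop binary path, or the native Hadoop library, or anything like that, because I see those errors every time I do anything in Spark and they have never caused a problem before.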

19/11/03 16:21:14 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/11/03 16:21:14 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
19/11/03 16:21:15 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
19/11/03 16:21:15 WARN Utils: Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
19/11/03 16:21:21 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
19/11/03 16:21:22 WARN Utils: Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
19/11/03 16:21:22 WARN Utils: Service 'SparkUI' could not bind on port 4042. Attempting port 4043.
19/11/03 16:21:23 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
19/11/03 16:21:23 WARN Utils: Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
19/11/03 16:21:23 WARN Utils: Service 'SparkUI' could not bind on port 4042. Attempting port 4043.
19/11/03 16:21:23 WARN Utils: Service 'SparkUI' could not bind on port 4043. Attempting port 4044.
[Stage 0:>                                                         (0 + 4) / 56]19/11/03 16:21:32 ERROR Shell: Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
        at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:379)
        at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:394)
        at org.apache.hadoop.util.Shell.<clinit>(Shell.java:387)
        at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:80)
        at org.apache.hadoop.security.SecurityUtil.getAuthenticationMethod(SecurityUtil.java:611)
        at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:273)
        at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:261)
        at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:791)
        at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:761)
        at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:634)
        at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2422)
        at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2422)
        at scala.Option.getOrElse(Option.scala:121)
        at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2422)
        at org.apache.spark.SecurityManager.<init>(SecurityManager.scala:79)
        at org.apache.spark.deploy.SparkSubmit.secMgr$lzycompute$1(SparkSubmit.scala:348)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$secMgr$1(SparkSubmit.scala:348)
        at org.apache.spark.deploy.SparkSubmit$$anonfun$prepareSubmitEnvironment$7.apply(SparkSubmit.scala:356)
        at org.apache.spark.deploy.SparkSubmit$$anonfun$prepareSubmitEnvironment$7.apply(SparkSubmit.scala:356)
        at scala.Option.map(Option.scala:146)
        at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:355)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:774)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
19/11/03 16:21:32 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
19/11/03 16:21:33 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
19/11/03 16:21:33 WARN Utils: Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
19/11/03 16:21:33 WARN Utils: Service 'SparkUI' could not bind on port 4042. Attempting port 4043.
19/11/03 16:21:33 WARN Utils: Service 'SparkUI' could not bind on port 4043. Attempting port 4044.
19/11/03 16:21:33 WARN Utils: Service 'SparkUI' could not bind on port 4044. Attempting port 4045.
19/11/03 16:21:39 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
19/11/03 16:21:39 WARN Utils: Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
19/11/03 16:21:39 WARN Utils: Service 'SparkUI' could not bind on port 4042. Attempting port 4043.
19/11/03 16:21:39 WARN Utils: Service 'SparkUI' could not bind on port 4043. Attempting port 4044.
19/11/03 16:21:39 WARN Utils: Service 'SparkUI' could not bind on port 4044. Attempting port 4045.
19/11/03 16:21:39 WARN Utils: Service 'SparkUI' could not bind on port 4045. Attempting port 4046.
[Stage 0:>                                                         (0 + 4) / 56]19/11/03 16:21:47 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
19/11/03 16:21:47 WARN Utils: Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
19/11/03 16:21:47 WARN Utils: Service 'SparkUI' could not bind on port 4042. Attempting port 4043.
19/11/03 16:21:47 WARN Utils: Service 'SparkUI' could not bind on port 4043. Attempting port 4044.
19/11/03 16:21:47 WARN Utils: Service 'SparkUI' could not bind on port 4044. Attempting port 4045.
19/11/03 16:21:47 WARN Utils: Service 'SparkUI' could not bind on port 4045. Attempting port 4046.
19/11/03 16:21:47 WARN Utils: Service 'SparkUI' could not bind on port 4046. Attempting port 4047.
[Stage 0:>                                                         (0 + 4) / 56]
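
Since most of the log repeats itself, one way to make it readable is to collapse lines that differ only in timestamp and port number and count what remains. The sketch below is one possible helper, assuming the "yy/MM/dd HH:mm:ss" prefix seen in the log above; summarize_log and the sample list are hypothetical names introduced here for illustration.

```python
import re
from collections import Counter

# Matches the "yy/MM/dd HH:mm:ss " prefix used in the log above.
TIMESTAMP = re.compile(r"^\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2} ")
# Port numbers vary between otherwise identical SparkUI warnings.
PORT = re.compile(r"port \d+")


def summarize_log(lines):
    """Count distinct messages after stripping timestamps and port numbers."""
    counts = Counter()
    for line in lines:
        message = TIMESTAMP.sub("", line.strip())
        message = PORT.sub("port N", message)
        if message:
            counts[message] += 1
    return counts


# Three lines copied from the log above, used as sample input.
sample = [
    "19/11/03 16:21:14 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.",
    "19/11/03 16:21:15 WARN Utils: Service 'SparkUI' could not bind on port 4041. Attempting port 4042.",
    "19/11/03 16:21:32 ERROR Shell: Failed to locate the winutils binary in the hadoop binary path",
]
summary = summarize_log(sample)
for message, count in summary.most_common():
    print(count, message)
```

Running this over the full log file (for example with open(path) as the lines argument) would show how many distinct messages there really are once the repeated port-binding warnings are grouped together.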
