Running a PySpark program in PyCharm

Posted 2024-10-03 02:37:25


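I am learning Spark and am stuck trying to run the basic word-count example program. Please help me solve this problem.

I am using PyCharm, and my operating system is Windows.

This is the code I am using: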

import os
import sys

# Path for folder containing winutils.exe . Without it I was getting the error java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
os.environ['HADOOP_HOME']="C:\\Users\\ekhaavi\\Documents\\ApacheSpark\\Hadoop"
# Path for spark source folder
os.environ['SPARK_HOME']="C:\\Users\\ekhaavi\\Documents\\ApacheSpark\\spark-1.6.0-bin-hadoop2.6"
# append to PYTHONPATH so that pyspark could be found
sys.path.append("C:\\Users\\ekhaavi\\Documents\\ApacheSpark\\spark-1.6.0-bin-hadoop2.6\\python")
#this is to overcome the py4j exception
sys.path.append("C:\\Users\\ekhaavi\\Documents\\ApacheSpark\\spark-1.6.0-bin-hadoop2.6\\python\\lib\\py4j-0.9-src.zip")

# Now we are ready to import Spark Modules
try:
    from pyspark import SparkContext
    from pyspark import SparkConf

except ImportError as e:
    print ("Error importing Spark Modules", e)
    sys.exit(1)

if __name__ == "__main__":
    sc = SparkContext('local')
    words = sc.parallelize(["scala","java","hadoop","spark","akka"])
    print(words.count())

After running it, I get the following exception:

[exception traceback not preserved]

1 answer
User
#1 · Posted 2024-10-03 02:37:25

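To overcome this problem, download winutils.exe from

https://social.msdn.microsoft.com/forums/azure/en-US/28a57efb-082b-424b-8d9e-731b1fe135de/please-read-if-experiencing-job-failures?forum=hdinsight

After downloading the file, place it in a bin directory, then set the Windows environment variable HADOOP_HOME to the folder that contains that bin directory, so that the file sits at %HADOOP_HOME%\bin\winutils.exe.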
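
Before creating the SparkContext, it can help to confirm that winutils.exe really sits where Spark will look for it. The following is only a minimal sketch: the helper name find_winutils is hypothetical (it is not part of PySpark or Hadoop), and it assumes the layout described above, with winutils.exe inside a bin folder directly under HADOOP_HOME.

```python
import os


def find_winutils(hadoop_home):
    """Return the path to winutils.exe under <hadoop_home>/bin, or None if it is missing.

    Hypothetical helper for illustration; not a PySpark or Hadoop API.
    """
    candidate = os.path.join(hadoop_home, "bin", "winutils.exe")
    return candidate if os.path.isfile(candidate) else None


if __name__ == "__main__":
    # HADOOP_HOME should name the folder that CONTAINS bin, not bin itself.
    hadoop_home = os.environ.get("HADOOP_HOME", "")
    found = find_winutils(hadoop_home) if hadoop_home else None
    if found:
        print("winutils.exe found at", found)
    else:
        print("winutils.exe not found; check HADOOP_HOME:", repr(hadoop_home))
```

If the check reports that the file is missing, the usual causes are HADOOP_HOME pointing at the bin folder itself, or the variable being set after PyCharm was started, in which case restarting PyCharm lets it pick up the new value.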
