I am learning Spark and am stuck running the basic word-count example program. Please help me solve this problem.
I am using PyCharm, and my operating system is Windows.
Here is the code I am using:
import os
import sys
# Path for the folder containing winutils.exe. Without it I was getting the error: java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
os.environ['HADOOP_HOME']="C:\\Users\\ekhaavi\\Documents\\ApacheSpark\\Hadoop"
# Path for spark source folder
os.environ['SPARK_HOME']="C:\\Users\\ekhaavi\\Documents\\ApacheSpark\\spark-1.6.0-bin-hadoop2.6"
# append to PYTHONPATH so that pyspark could be found
sys.path.append("C:\\Users\\ekhaavi\\Documents\\ApacheSpark\\spark-1.6.0-bin-hadoop2.6\\python")
#this is to overcome the py4j exception
sys.path.append("C:\\Users\\ekhaavi\\Documents\\ApacheSpark\\spark-1.6.0-bin-hadoop2.6\\python\\lib\\py4j-0.9-src.zip")
# Now we are ready to import Spark Modules
try:
    from pyspark import SparkContext
    from pyspark import SparkConf
except ImportError as e:
    print("Error importing Spark Modules:", e)
    sys.exit(1)

if __name__ == "__main__":
    sc = SparkContext('local')
    words = sc.parallelize(["scala", "java", "hadoop", "spark", "akka"])
    print(words.count())
After running it, I get an exception.
To overcome this, you must download winutils.exe from
https://social.msdn.microsoft.com/forums/azure/en-US/28a57efb-082b-424b-8d9e-731b1fe135de/...?forum=hdinsight
After downloading the file, put it in a bin directory and define that directory in the HADOOP_HOME Windows environment variable (HADOOP_HOME should point to the folder that contains bin\winutils.exe).
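A quick way to confirm the layout described above is a small stand-alone check. This is a sketch of my own (the helper name `winutils_present` is hypothetical, not part of Spark or Hadoop): it only verifies that winutils.exe sits where Hadoop will look for it, under %HADOOP_HOME%\bin.

```python
import os

def winutils_present(hadoop_home):
    # Hadoop on Windows looks for the executable at
    # %HADOOP_HOME%\bin\winutils.exe; return True if it is there.
    return os.path.isfile(os.path.join(hadoop_home, "bin", "winutils.exe"))

# Example: check the HADOOP_HOME used in the question's code.
print(winutils_present(r"C:\Users\ekhaavi\Documents\ApacheSpark\Hadoop"))
```

If this prints False, Spark will raise the same "Could not locate executable null\bin\winutils.exe" IOException mentioned in the question's comments.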
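Aside: since the goal was the classic word count, the usual Spark pipeline (flatMap to split lines, map each word to (word, 1), reduceByKey to sum) can be emulated in plain Python to check what result to expect before the Spark environment is working. The helper `word_count` below is a hypothetical name of my own, not Spark API:

```python
from collections import Counter

def word_count(lines):
    # Equivalent of flatMap(lambda line: line.split()):
    # flatten every line into individual words.
    words = [w for line in lines for w in line.split()]
    # Equivalent of map(word -> (word, 1)) followed by reduceByKey(+):
    # Counter sums the occurrences of each word.
    return dict(Counter(words))

lines = ["spark hadoop spark", "java scala java"]
print(word_count(lines))  # {'spark': 2, 'hadoop': 1, 'java': 2, 'scala': 1}
```

The real Spark version would replace the list with `sc.textFile(...)` and the comprehension/Counter with the RDD transformations named in the comments.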