<p><strong>2018 version</strong></p>
<p>Installing PySpark on Windows 10
for use with Jupyter Notebook and Anaconda Navigator</p>
<h2>Step 1</h2>
<p><strong>Download the packages</strong></p>
<p>1) spark-2.2.0-bin-hadoop2.7.tgz <a href="https://spark.apache.org/downloads.html" rel="nofollow noreferrer">Download</a></p>
<p>2) Java JDK version 8 <a href="http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html" rel="nofollow noreferrer">Download</a></p>
<p>3) Anaconda 5.2 <a href="https://www.anaconda.com/download/" rel="nofollow noreferrer">Download</a></p>
<p>4) scala-2.12.6.msi <a href="http://scala-2.12.6.msi" rel="nofollow noreferrer">Download</a></p>
<p>5) Hadoop v2.7.1 (winutils) <a href="https://github.com/steveloughran/winutils" rel="nofollow noreferrer">Download</a></p>
<h2>Step 2</h2>
<p>Create a SPARK folder on the <strong>C:/</strong> drive and put everything inside it.
<a href="https://i.stack.imgur.com/WsTrv.png" rel="nofollow noreferrer">It will look like this</a></p>
<p><strong>Note: while installing Scala, set its installation path to a location inside the SPARK folder.</strong></p>
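<p>The Spark download is a <code>.tgz</code> archive, which older Windows 10 builds cannot unpack without extra tools. Since Anaconda already ships Python, one option is to unpack it with the standard <code>tarfile</code> module. This is only a sketch: the helper name <code>extract_archive</code> and the paths in the usage comment are assumptions, not part of the original steps.</p>

```python
import tarfile
from pathlib import Path


def extract_archive(archive, dest):
    """Unpack a gzip-compressed tar archive into dest.

    Returns the sorted top-level names found in the archive so the
    caller can see which folder was created.
    """
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(dest)
        top_level = sorted({m.name.split("/")[0] for m in tar.getmembers()})
    return top_level


# Hypothetical usage (adjust both paths to your machine):
# extract_archive(r"C:\Users\user\Downloads\spark-2.2.0-bin-hadoop2.7.tgz", r"C:\spark")
```

<p>The unpacked folder can then be renamed to match whatever path you later point <code>SPARK_HOME</code> at.</p>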
<h2>Step 3</h2>
<p>Now set the following new Windows environment variables:</p>
<ol>
<li><p><code>HADOOP_HOME=C:\spark\hadoop</code></p></li>
<li><p><code>JAVA_HOME=C:\Program Files\Java\jdk1.8.0_151</code></p></li>
<li><p><code>SCALA_HOME=C:\spark\scala</code></p></li>
<li><p><code>SPARK_HOME=C:\spark\spark</code></p></li>
<li><p><code>PYSPARK_PYTHON=C:\Users\user\Anaconda3\python.exe</code></p></li>
<li><p><code>PYSPARK_DRIVER_PYTHON=C:\Users\user\Anaconda3\Scripts\jupyter.exe</code></p></li>
<li><p><code>PYSPARK_DRIVER_PYTHON_OPTS=notebook</code></p></li>
<li><p><strong>Now select the Path variable for Spark</strong>:</p>
<p>Click Edit, then New</p>
<p>Add "<strong>C:\spark\spark\bin</strong>" to the "Path" variable window</p></li>
</ol>
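<p>A typo in any of these variables usually only shows up later as a confusing launch error, so it can help to check them first. The following is a minimal sketch that reports variables that are unset or that point to a path that does not exist. The function name <code>check_env</code> and the list <code>REQUIRED</code> are illustrative assumptions. Run it in a freshly opened prompt so the new values are visible.</p>

```python
import os
from pathlib import Path

# Variable names taken from the list above.
REQUIRED = [
    "HADOOP_HOME",
    "JAVA_HOME",
    "SCALA_HOME",
    "SPARK_HOME",
    "PYSPARK_PYTHON",
    "PYSPARK_DRIVER_PYTHON",
    "PYSPARK_DRIVER_PYTHON_OPTS",
]

# This variable holds a command-line option, not a filesystem path.
NOT_A_PATH = {"PYSPARK_DRIVER_PYTHON_OPTS"}


def check_env(env=None):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    env = os.environ if env is None else env
    problems = []
    for name in REQUIRED:
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
        elif name not in NOT_A_PATH and not Path(value).exists():
            problems.append(f"{name} points to a missing path: {value}")
    return problems


# Example: print each problem found in the current environment.
# for problem in check_env():
#     print(problem)
```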
<h2>Step 4</h2>
<ul>
<li>Create a folder where you want to store the Jupyter Notebook outputs and files</li>
<li>Then open the Anaconda command prompt and run <strong>cd folder_name</strong></li>
<li>Then type <strong>pyspark</strong></li>
</ul>
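<p>If typing <code>pyspark</code> gives a "not recognized" error, the Path entry is the likely cause. As a rough check, Python's <code>shutil.which</code> can report where a command would be resolved from. The wrapper name <code>find_launcher</code> below is an assumption made for illustration.</p>

```python
import shutil


def find_launcher(name="pyspark", path=None):
    """Return the full path of the command, or None if it cannot be resolved.

    When path is None the current PATH environment variable is searched.
    """
    return shutil.which(name, path=path)


# Example: prints the resolved launcher location, or None if Path is wrong.
# print(find_launcher())
```

<p>A result of <code>None</code> suggests that the Spark <code>bin</code> folder is missing from Path, or that the prompt was opened before the variable was changed.</p>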
<p>Your browser will open Jupyter on localhost.</p>
<h2>Step 5</h2>
<p>Check whether PySpark is working!</p>
<p>Type a simple piece of code and run it:</p>
<pre><code>from pyspark.sql import Row

# Build a single Row; this only succeeds if the pyspark package can be imported
a = Row(name='Vinay', age=22, height=165)
print("a: ", a)
</code></pre>