<p>There are several problems with your question:</p>
<p>First, PySpark is not an add-on package but an essential component of Spark itself; in other words, when you install Spark you also get PySpark by default (you cannot avoid it, even if you wanted to). So, step 2 should be enough (and even before that, PySpark should already be available on your machine, since you are already using Spark).</p>
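<p>Just as an illustration (a minimal sketch, assuming <code>SPARK_HOME</code> points at your existing Spark installation; the exact py4j file name varies per Spark version), here is how the PySpark that already ships inside that installation can be used from a plain Python interpreter:</p>
<pre><code>import glob
import os
import sys

# Assumption: SPARK_HOME points at the Spark distribution you already use via Scala.
spark_home = os.environ["SPARK_HOME"]

# PySpark ships inside the distribution itself, under $SPARK_HOME/python,
# together with a bundled py4j archive in $SPARK_HOME/python/lib.
sys.path.append(os.path.join(spark_home, "python"))
sys.path.extend(glob.glob(os.path.join(spark_home, "python", "lib", "py4j-*.zip")))

from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").appName("check").getOrCreate()
print(spark.version)
spark.stop()
</code></pre>
<p>Running <code>$SPARK_HOME/bin/pyspark</code> achieves the same thing without touching <code>sys.path</code> yourself.</p>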
<p>Step 1 is unnecessary: PySpark from PyPI (i.e. installed with <code>pip</code> or <code>conda</code>) does not contain the full PySpark functionality; it is only intended for use with a Spark installation in an already existing cluster. From the <a href="https://pypi.python.org/pypi/pyspark" rel="nofollow noreferrer">docs</a>:</p>
<blockquote>
<p>The Python packaging for Spark is not intended to replace all of the other use cases. This Python packaged version of Spark is suitable
for interacting with an existing cluster (be it Spark standalone,
YARN, or Mesos) - but does not contain the tools required to setup
your own standalone Spark cluster. You can download the full version
of Spark from the Apache Spark downloads page.</p>
<p><strong>NOTE</strong>: If you are using this with a Spark standalone cluster you must ensure that the version (including minor version) matches or you
may experience odd errors</p>
</blockquote>
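<p>A quick sanity check for that version requirement (a minimal sketch; nothing here is specific to your setup) is to compare what the pip-installed package reports against what your cluster reports:</p>
<pre><code>import pyspark

# Version of the pip/conda-installed package; per the note above, this must
# match the cluster's Spark version, including the minor version.
print(pyspark.__version__)
</code></pre>
<p>You can compare this with the version printed by <code>spark-submit --version</code> (or <code>spark.version</code> inside a session) on the cluster side.</p>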
<p>Given that, as you say, you are already using Spark (via Scala), your question actually seems to be about upgrading. Now, if you use a pre-built Spark distribution, there is really nothing to install: you just download, unzip, and set the relevant environment variables (<code>SPARK_HOME</code> etc.); see my answer on <a href="https://stackoverflow.com/questions/33887227/how-to-upgrade-spark-to-newer-version/33914992#33914992">"upgrading" Spark</a>, which in fact also applies to a first-time "installation".</p>
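<p>If you would rather set those variables from Python than in your shell profile, a rough sketch (the paths below are placeholders for wherever you unpacked the distribution, not your actual layout) would be:</p>
<pre><code>import os

# Placeholder paths: adjust to wherever you downloaded and unzipped the pre-built distribution.
os.environ["SPARK_HOME"] = "/opt/spark-2.2.0-bin-hadoop2.7"
os.environ["PYSPARK_PYTHON"] = "/usr/bin/python3"  # optional: which Python the executors use

# With SPARK_HOME in place, the sys.path additions from the snippet further up
# make the bundled pyspark importable again after an upgrade.
</code></pre>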