spark-submit and a .py file


All,

I am new to big data. I have installed a 3-node Apache Spark cluster on my Dell Alienware desktop running Ubuntu 18.04. I do not have YARN set up, and I have named the nodes as follows:

  1. SparkMaster
  2. SparkWorker1 (Slave)
  3. SparkWorker2 (Slave)

I also installed Anaconda on SparkMaster because I want to work in Jupyter notebooks, but I am not sure whether Anaconda is actually needed. Could I install Jupyter from pip3 instead?

The code below runs fine interactively in a Jupyter notebook, and I wanted to see whether it would also work when submitted as a job:

from pyspark.sql import SQLContext
from pyspark.sql.types import *

# sc is the SparkContext that the notebook session predefines
sqlContext = SQLContext(sc)

# Load the CSV with the spark-csv reader, inferring the schema
df = sqlContext.read.load('/home/grajee/twitter/US_Politicians_Twitter.csv',
                          format='com.databricks.spark.csv',
                          header='true',
                          inferSchema='true')

df.write.csv('/home/grajee/twitter/US_Politicians_loaded.csv')
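
As an aside, I realize this script leans on the sc variable that the notebook predefines. My assumption, which I have not verified, is that a script run through spark-submit has to bootstrap its own session, so a submit-friendly variant would presumably look something like the sketch below; the SparkSession bootstrap and the builtin csv reader are my substitutions for SQLContext(sc) and com.databricks.spark.csv:

# pytest.py - sketch of a spark-submit-friendly variant (assumptions noted above)
from pyspark.sql import SparkSession

# spark-submit does not predefine sc/sqlContext, so create a session here
spark = SparkSession.builder.appName("US_Politicians_CSV").getOrCreate()

# Spark's builtin csv reader in place of the com.databricks.spark.csv package
df = (spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("/home/grajee/twitter/US_Politicians_Twitter.csv"))

df.write.csv("/home/grajee/twitter/US_Politicians_loaded.csv")

spark.stop()

Presumably such a script would also be submitted with the master spelled out, e.g. "spark-submit --master spark://SparkMaster:7077 pytest.py" (7077 being the default standalone master port).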

In any case, I ran "spark-submit pytest.py" on the original script, which resulted in the error below:

(base) grajee@SparkMaster:~/pyscript$ spark-submit pytest.py
20/11/27 16:43:41 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Traceback (most recent call last):
  File "/home/grajee/anaconda3/bin/jupyter", line 11, in <module>
    sys.exit(main())
  File "/home/grajee/anaconda3/lib/python3.8/site-packages/jupyter_core/command.py", line 247, in main
    command = _jupyter_abspath(subcommand)
  File "/home/grajee/anaconda3/lib/python3.8/site-packages/jupyter_core/command.py", line 133, in _jupyter_abspath
    raise Exception(
Exception: Jupyter command `jupyter-/home/grajee/pyscript/pytest.py` not found.
log4j:WARN No appenders could be found for logger (org.apache.spark.util.ShutdownHookManager).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.

(base) grajee@SparkMaster:~/pyscript$ pwd
/home/grajee/pyscript

(base) grajee@SparkMaster:~/pyscript$ ls -l pytest.py
-rw-r--r-- 1 root root 374 Nov 27 14:58 pytest.py
(base) grajee@SparkMaster:~/pyscript$

I have a few questions:

[1] When launching the Jupyter notebook, how do I make sure it runs against the standalone cluster rather than locally? When I mistakenly tried to start another SparkContext, I got the error listed below, which seems to indicate it is running in local mode, whereas I want it to run in standalone mode:

Cannot run multiple SparkContexts at once; existing SparkContext(app=PySparkShell, master=local[*]) created by at /home/grajee/anaconda3/lib/python3.8/site-packages/IPython/utils/py3compat.py:168
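
My working assumption, which I have not confirmed, is that the notebook's auto-created session can be stopped and rebuilt against the standalone master, along these lines (spark://SparkMaster:7077 is the default-port master URL implied by my hostnames):

from pyspark.sql import SparkSession

# Grab the session the notebook created automatically and inspect it
spark = SparkSession.builder.getOrCreate()
print(spark.sparkContext.master)    # currently prints local[*]

# Stop it, then rebuild it pointed at the standalone master
# (spark://SparkMaster:7077 assumes the default standalone port)
spark.stop()
spark = (SparkSession.builder
         .master("spark://SparkMaster:7077")
         .appName("notebook-on-standalone")
         .getOrCreate())
print(spark.sparkContext.master)    # should now print spark://SparkMaster:7077

Is that the right way to do it, or should the master be baked into the notebook's startup instead?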

[2] Why am I getting the exception below?

Exception: Jupyter command jupyter-/home/grajee/pyscript/pytest.py not found.
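
The traceback shows spark-submit handing my script to the jupyter launcher, so my guess, and it is only a guess, is that PYSPARK_DRIVER_PYTHON is set to jupyter somewhere in my environment. A quick check I could run from the same shell:

import os

# If PYSPARK_DRIVER_PYTHON is "jupyter", spark-submit launches jupyter
# with the script name appended, which would explain the
# "Jupyter command ... not found" exception (again, my assumption)
for var in ("PYSPARK_DRIVER_PYTHON", "PYSPARK_DRIVER_PYTHON_OPTS", "PYSPARK_PYTHON"):
    print(var, "=", os.environ.get(var))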

[3] Would it be better to uninstall Anaconda and replace it with "pip3 install jupyter"?

