I am trying to build a Spark DataFrame from a simple Pandas DataFrame. These are the steps I followed:
import pandas as pd
pandas_df = pd.DataFrame({"Letters":["X", "Y", "Z"]})
spark_df = sqlContext.createDataFrame(pandas_df)
spark_df.printSchema()
Everything is fine up to this point. The output is:
root
|-- Letters: string (nullable = true)
The problem appears when I try to display the DataFrame. The result is:
An error occurred while calling o158.collectToPython. : org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 5.0 failed 1 times, most recent failure: Lost task 0.0 in stage 5.0 (TID 5, localhost, executor driver): org.apache.spark.SparkException:
Error from python worker:
Error executing Jupyter command 'pyspark.daemon': [Errno 2] No such file or directory PYTHONPATH was:
/home/roldanx/soft/spark-2.4.0-bin-hadoop2.7/python/lib/pyspark.zip:/home/roldanx/soft/spark-2.4.0-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip:/home/roldanx/soft/spark-2.4.0-bin-hadoop2.7/jars/spark-core_2.11-2.4.0.jar:/home/roldanx/soft/spark-2.4.0-bin-hadoop2.7/python/lib/py4j-0.10.7-src.zip:/home/roldanx/soft/spark-2.4.0-bin-hadoop2.7/python/: org.apache.spark.SparkException: No port number in pyspark.daemon's stdout
These are my Spark specs:
SparkSession - hive
SparkContext
Spark UI
Version: v2.4.0
Master: local[*]
AppName: PySparkShell
And these are my environment variables:
export PYSPARK_PYTHON=jupyter
export PYSPARK_DRIVER_PYTHON_OPTS='lab'
Facts:
As the error mentions, it has to do with running pyspark from Jupyter. Running it with 'PYSPARK_PYTHON=python2.7' or 'PYSPARK_PYTHON=python3.6' works fine.
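A likely explanation: PYSPARK_PYTHON tells Spark which Python binary the worker processes should launch, so setting it to jupyter makes each worker try to run pyspark.daemon as a Jupyter subcommand, which matches the "Error executing Jupyter command 'pyspark.daemon'" message above. A sketch of the conventional setup, assuming you want JupyterLab only as the driver front-end (the python3.6 interpreter name is an assumption, adjust to your install):

export PYSPARK_PYTHON=python3.6          # interpreter the Spark workers run (assumed to be on PATH)
export PYSPARK_DRIVER_PYTHON=jupyter     # Jupyter drives only the session, not the workers
export PYSPARK_DRIVER_PYTHON_OPTS='lab'  # open JupyterLab on startup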
Import and initialize findspark, create a Spark session, then use that session object to convert the pandas DataFrame into a Spark DataFrame, and finally add the new Spark DataFrame to the catalog, as sketched below. Tested and working in Jupyter 5.7.2 and Spyder 3.3.2 with Python 3.6.6.
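A minimal sketch of that approach, assuming a local Spark installation that findspark can locate; the app name and view name are illustrative:

import findspark
findspark.init()  # locate SPARK_HOME and put pyspark on sys.path before importing it

import pandas as pd
from pyspark.sql import SparkSession

# Create (or reuse) a local Spark session
spark = SparkSession.builder \
    .master("local[*]") \
    .appName("pandas_to_spark") \
    .getOrCreate()

pandas_df = pd.DataFrame({"Letters": ["X", "Y", "Z"]})

# Convert the pandas DataFrame into a Spark DataFrame via the session object
spark_df = spark.createDataFrame(pandas_df)

# Add it to the catalog as a temporary view so it can be queried with SQL
spark_df.createOrReplaceTempView("letters")
spark.sql("SELECT * FROM letters").show()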