Saving a PySpark DataFrame to a parquet file raises Py4JJavaE

Posted 2024-09-17 17:33:46

I get an exception when I try to save a PySpark DataFrame.

Here is my code, with a toy example:

from pyspark.context import SparkContext
from pyspark.sql.session import SparkSession
sc = SparkContext('local')
spark = SparkSession(sc)
import pyspark
import pandas as pd

toy_df = '{"userId":{"0":1,"1":1,"10":1,"100":3,"1000":15,"10000":71,"10001":71,"10002":71,"10003":71,"10004":71},"movieId":{"0":31,"1":1029,"10":1371,"100":296,"1000":157,"10000":581,"10001":589,"10002":908,"10003":1171,"10004":1259},"rating":{"0":2.5,"1":3.0,"10":2.5,"100":4.5,"1000":2.0,"10000":4.0,"10001":3.0,"10002":5.0,"10003":5.0,"10004":4.0},"timestamp":{"0":1260748800000,"1":1260748800000,"10":1260748800000,"100":1298851200000,"1000":1052870400000,"10000":974592000000,"10001":974592000000,"10002":974592000000,"10003":974592000000,"10004":974592000000}}'
toy_df = pd.read_json(toy_df)

# Make the pandas dataframe a pyspark dataframe
toy = spark.createDataFrame(toy_df)

# Write the pyspark dataframe to disk
toy.write.save('toy', format='parquet', mode='append')

Error:

Py4JJavaError: An error occurred while calling o152.save.
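
The line above is only the first line of the error; the Java-side cause normally follows it. A minimal sketch for surfacing that cause is below. It assumes the raised exception carries py4j's `java_exception` attribute, and falls back to the plain Python message when it does not. The names `describe_java_error` and `save_parquet_verbose` are hypothetical helpers, not part of PySpark.

```python
def describe_java_error(exc):
    """Return the most detailed text available for a Py4J-style error.

    Assumption: py4j's Py4JJavaError stores the wrapped Java exception
    in the attribute `java_exception`. Any other exception falls back
    to its ordinary string form.
    """
    java_exc = getattr(exc, "java_exception", None)
    if java_exc is not None:
        try:
            # Calls Throwable.toString() on the JVM side through py4j.
            return str(java_exc.toString())
        except Exception:
            # The JVM gateway may already be gone; use the Python message.
            pass
    return str(exc)


def save_parquet_verbose(df, path):
    """Hypothetical wrapper: attempt the same write as in the question
    and print the underlying cause before re-raising."""
    try:
        df.write.save(path, format="parquet", mode="append")
    except Exception as exc:
        print(describe_java_error(exc))
        raise
```

Calling `save_parquet_verbose(toy, 'toy')` in place of the `toy.write.save(...)` line should print the Java exception class and message, which is usually what is needed to narrow the problem down.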

