Error when converting a pandas DataFrame to a Spark DataFrame

Posted on 2024-05-17 17:07:06


Since Spark does not support reading Excel files directly, I first read the Excel file into a pandas DataFrame and then try to convert that pandas DataFrame into a Spark DataFrame, but I get the error below (I am using Spark 1.5.1):

import pandas as pd
from pandas import ExcelFile
from pyspark import SparkContext
from pyspark.sql import SQLContext
from pyspark.sql.types import *

pdf=pd.read_excel('/home/testdata/test.xlsx')
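# sqlContext below is the SQLContext instance that the pyspark 1.5 shell creates automatically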
df = sqlContext.createDataFrame(pdf)

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/spark/spark-hadoop/python/pyspark/sql/context.py", line 406, in createDataFrame
    rdd, schema = self._createFromLocal(data, schema)
  File "/opt/spark/spark-hadoop/python/pyspark/sql/context.py", line 337, in _createFromLocal
    data = [schema.toInternal(row) for row in data]
  File "/opt/spark/spark-hadoop/python/pyspark/sql/types.py", line 541, in toInternal
    return tuple(f.toInternal(v) for f, v in zip(self.fields, obj))
  File "/opt/spark/spark-hadoop/python/pyspark/sql/types.py", line 541, in <genexpr>
    return tuple(f.toInternal(v) for f, v in zip(self.fields, obj))
  File "/opt/spark/spark-hadoop/python/pyspark/sql/types.py", line 435, in toInternal
    return self.dataType.toInternal(obj)
  File "/opt/spark/spark-hadoop/python/pyspark/sql/types.py", line 191, in toInternal
    else time.mktime(dt.timetuple()))
AttributeError: 'datetime.time' object has no attribute 'timetuple'

Does anyone know how to fix this?


1 Answer
User
#1 · Posted on 2024-05-17 17:07:06

My guess is that your problem comes from the datetime data being parsed "incorrectly" when you read it with pandas.
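To see whether that is really the case, here is a quick check (just a sketch, using the path from your question) that prints every column pandas left as bare datetime.time objects, which is exactly the type Spark 1.5.1 cannot turn into a timestamp:

import datetime

import pandas as pd

pdf = pd.read_excel('/home/testdata/test.xlsx')

# A column whose values are bare datetime.time objects is what triggers
# "'datetime.time' object has no attribute 'timetuple'" when Spark tries
# to serialize it as TimestampType.
for col in pdf.columns:
    values = pdf[col].dropna()
    if not values.empty and isinstance(values.iloc[0], datetime.time):
        print("column %r contains datetime.time values" % col)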

The following code "works fine":

import pandas as pd
from pandas import ExcelFile
from pyspark import SparkContext
from pyspark.sql import SQLContext
from pyspark.sql.types import *

pdf = pd.read_excel('test.xlsx', parse_dates=['Created on','Confirmation time'])

sc = SparkContext()
sqlContext = SQLContext(sc)

sqlContext.createDataFrame(data=pdf).collect()

[Row(Customer=1000935702, Country='TW',  ...

Note that you also have another datetime column, 'Confirmation date', which in your sample consists only of NaT, so reading your short example into the RDD causes no problem. But if there happen to be actual values in that column in the full dataset, you will have to take care of that column as well; one possible workaround is sketched below.
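One way to deal with it, shown here only as a sketch (the column name comes from your question, and sqlContext is the one created above), is to keep that column as plain strings, so Spark infers StringType instead of trying to build timestamps out of datetime.time values:

import pandas as pd

pdf = pd.read_excel('test.xlsx', parse_dates=['Created on','Confirmation time'])

# Convert the troublesome column to strings: NaT becomes the text 'NaT'
# and any datetime.time values become their text representation, so
# createDataFrame no longer tries to serialize them as timestamps.
pdf['Confirmation date'] = pdf['Confirmation date'].astype(str)

sqlContext.createDataFrame(data=pdf).collect()

Whether you keep the column as strings or parse it into full timestamps depends on what you need downstream; the point is simply that no datetime.time object should reach createDataFrame.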
