How do I create a nested list in PySpark?

...
0544144,23,86,40.761650,29.940929
0544147,23,104,40.768749,29.968599
0545525,20,86,40.761650,29.940929
0538333,21,184,40.764679,29.929543
05477900,21,204,40.773071,29.975010
0561554,23,47,40.764694,29.927397
...

1 answer

User

#1 · Posted on 2024-09-27 17:31:33

Since the text file is in CSV format, you can easily load it into a DataFrame if you are using Spark 2.x:

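from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, IntegerType, DoubleType

# Reuse the active session if there is one, otherwise start a new one.
spark = SparkSession.builder.getOrCreate()

# One field per CSV column, in file order; True marks the field as nullable.
# Note: IntegerType drops the leading zero of values such as 0544144.
# Use StringType for "tel" if the zero must be preserved.
schema = StructType([
    StructField("tel", IntegerType(), True),
    StructField("time", IntegerType(), True),
    StructField("deltatime", IntegerType(), True),
    StructField("lat", DoubleType(), True),
    StructField("long", DoubleType(), True),
])

# The file has no header row, so column names come from the schema.
data = spark.read.csv("data2.txt", header=False, schema=schema)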

You can then access the data as follows:

[code snippet missing from the source]
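As a rough sketch of what such access might look like (not necessarily what the answer originally showed), the rows of a DataFrame can be pulled to the driver with `collect()` or `take(n)` and turned into a nested list, which is what the question asks for. The helper names `rows_to_nested_list` and `load_nested_list` and the `limit` parameter are hypothetical and introduced only for illustration. The sample uses plain tuples in place of `Row` objects, which works because `Row` is a tuple subclass.

```python
def rows_to_nested_list(rows):
    """Turn an iterable of row-like tuples into a list of lists."""
    return [list(row) for row in rows]


def load_nested_list(path, schema, limit=None):
    """Read a headerless CSV with Spark and return its rows as nested lists.

    collect() pulls every row to the driver, so this only suits small data;
    pass `limit` to cap the number of rows fetched.
    """
    # Imported lazily so rows_to_nested_list can be used without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    df = spark.read.csv(path, header=False, schema=schema)
    rows = df.take(limit) if limit is not None else df.collect()
    return rows_to_nested_list(rows)


# Plain tuples stand in for Row objects here (hypothetical stand-in data
# copied from the first two sample lines of the question).
sample = [
    (544144, 23, 86, 40.761650, 29.940929),
    (544147, 23, 104, 40.768749, 29.968599),
]
nested = rows_to_nested_list(sample)
print(nested[1])  # [544147, 23, 104, 40.768749, 29.968599]
```

With the DataFrame from the answer, the equivalent call would be `rows_to_nested_list(data.collect())`. For large files it is usually better to keep the data in the DataFrame and only collect a small, filtered subset.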

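Note: accessing data[1] to get a row makes no sense in Spark. Because it is a distributed system, the rows are spread across partitions and have no positional index.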
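If a "row number n" is really needed, one possible approach is to impose an explicit ordering first and then fetch only the leading rows. The sketch below assumes a DataFrame-like object with the standard `orderBy`, `limit`, and `collect` methods. The function name `nth_row` and its parameters are hypothetical.

```python
def nth_row(df, n, order_col):
    """Return the row at zero-based position n after sorting by order_col.

    Only n + 1 rows are brought to the driver. Returns None when the
    DataFrame has fewer than n + 1 rows.
    """
    rows = df.orderBy(order_col).limit(n + 1).collect()
    return rows[n] if len(rows) > n else None


# Example call with the DataFrame from the answer (requires a Spark session):
# second = nth_row(data, 1, "tel")
```

The position is only well defined relative to the chosen sort column. Without an `orderBy`, Spark gives no guarantee about which row comes back first.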
