I need to read a log file and convert it into a Spark DataFrame.
Input file contents:
dateCreated : 20200720
customerId : 001
dateCreated : 20200720
customerId : 002
dateCreated : 20200721
customerId : 003
Expected DataFrame:
+-------------+------------+
| dateCreated | customerId |
+-------------+------------+
| 20200720    | 001        |
| 20200720    | 002        |
| 20200721    | 003        |
+-------------+------------+
Spark code:
val spark = org.apache.spark.sql.SparkSession.builder.master("local").getOrCreate()
val inputFile = "C:\\log_data.txt"
val rddFromFile = spark.sparkContext.textFile(inputFile)

// Split each "key : value" line on the first ':' only,
// so values containing ':' are not broken apart.
val rdd = rddFromFile.map(_.split(":", 2))

rdd.foreach(f => println(f(0).trim + "\t" + f(1).trim))
How do I convert the RDD above into the desired DataFrame?
Check the code below.
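A minimal sketch of one way to do it, assuming each record is always a dateCreated line immediately followed by its customerId line, as in the sample file. zipWithIndex assigns a global line number, and integer division by 2 groups each pair of consecutive lines into one record:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.master("local").getOrCreate()
import spark.implicits._

// Parse each "key : value" line into a trimmed (key, value) pair.
val kv = spark.sparkContext.textFile("C:\\log_data.txt")
  .map(_.split(":", 2))
  .filter(_.length == 2)
  .map(a => (a(0).trim, a(1).trim))

// Assumes each record is exactly two lines: dateCreated, then customerId.
// idx / 2 gives every consecutive pair of lines the same record id.
val df = kv.zipWithIndex
  .map { case (pair, idx) => (idx / 2, pair) }
  .groupByKey()
  .map { case (_, fields) =>
    val m = fields.toMap
    (m("dateCreated"), m("customerId"))
  }
  .toDF("dateCreated", "customerId")

df.show()

df.show() should then print the three rows from the sample file. Note that groupByKey shuffles the data, which is fine for a small log file; the row order in the output may differ from the file order.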