Java Spark: FileNotFoundException when writing a file and then reading it back


I am trying to do the following two steps in a single job: 1) write a new file, 2) read the newly created file in Spark as a Dataset.

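    import java.io.FileNotFoundException;
    import java.io.PrintWriter;
    import java.io.UnsupportedEncodingException;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;

    // Step 1: write the new file; try-with-resources closes the writer even on failure
    try (PrintWriter writer = new PrintWriter("/tmp/fileName.txt", "UTF-8")) {
        writer.println("The,first,line");
        writer.println("The,second,line");
    } catch (FileNotFoundException | UnsupportedEncodingException e) {
        e.printStackTrace();
    }
    // Step 2: read the newly created file as a Dataset
    Dataset<Row> data = sparkSession.read().format("com.databricks.spark.csv").load("file:///tmp/fileName.txt");
    data.show();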

This is the error I get:

    java.io.FileNotFoundException: File file:/tmp/fileName.txt does not exist
    It is possible the underlying files have been updated. You can explicitly invalidate the cache in Spark by running 'REFRESH TABLE tableName' command in SQL or by recreating the Dataset/DataFrame involved.
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:157)
    at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:102)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:377)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:231)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:225)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:826)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:826)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:99)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:282)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

I want to do both operations in a single job. It works when I run in local mode, but it fails in standalone mode.
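
One possible explanation, assuming the standalone executors run on machines other than the driver: a `file://` path is resolved against each executor's own local filesystem, so a file that the driver wrote to its own `/tmp` is not present there. The sketch below uses only the JDK and a hypothetical helper, `writeAndReadBack`. It writes the two sample lines, then reads them back in the same JVM, which confirms the write succeeded on the driver and leaves the data in memory. The class name and the temp-file path are illustrative and do not come from the question.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

public class DriverSideCsv {

    // Hypothetical helper: writes the sample lines to `target` and reads them
    // back in the same JVM, i.e. on the machine where the driver runs.
    public static List<String> writeAndReadBack(Path target) throws IOException {
        List<String> lines = Arrays.asList("The,first,line", "The,second,line");
        Files.write(target, lines, StandardCharsets.UTF_8);
        return Files.readAllLines(target, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Illustrative temp file; the question uses /tmp/fileName.txt.
        Path target = Files.createTempFile("fileName", ".txt");
        List<String> lines = writeAndReadBack(target);
        System.out.println(lines.size() + " lines visible on the driver at " + target);
        // Assumption: with a SparkSession in scope (not included here), these
        // in-memory lines could be handed to Spark instead of a file:// path,
        // e.g. sparkSession.createDataset(lines, Encoders.STRING()).
        Files.deleteIfExists(target);
    }
}
```

If that assumption holds, the other common options are to write the file to storage that every node can reach (for example HDFS or a shared mount) or to make sure the same path exists on every worker. Which one fits depends on the cluster setup, which the question does not describe.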
