java找不到值类“org”的序列化程序。阿帕奇。hadoop。hbase。客户结果'
我试图从HBase中读取数据并将其保存为sequenceFile,但是
java.io.IOException: Could not find a serializer for the Value class: 'org.apache.hadoop.hbase.client.Result'. Please ensure that the configuration 'io.serializations' is properly configured, if you're usingcustom serialization.
错误
我看到了两个类似的帖子:
hadoop writables NotSerializableException with Apache Spark API
及
Spark HBase Join Error: object not serializable class: org.apache.hadoop.hbase.client.Result
在这两篇文章之后,我注册了三个Kyro类,但仍然没有运气
以下是我的节目:
String tableName = "validatorTableSample";
System.out.println("Start indexing hbase: " + tableName);
SparkConf sparkConf = new SparkConf().setAppName("HBaseRead");
Class[] classes = {org.apache.hadoop.io.LongWritable.class, org.apache.hadoop.io.Text.class, org.apache.hadoop.hbase.client.Result.class};
sparkConf.registerKryoClasses(classes);
JavaSparkContext sc = new JavaSparkContext(sparkConf);
Configuration conf = HBaseConfiguration.create();
conf.set(TableInputFormat.INPUT_TABLE, tableName);
// conf.setStrings("io.serializations",
// conf.get("io.serializations"),
// MutationSerialization.class.getName(),
// ResultSerialization.class.getName());
conf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer");
JavaPairRDD<ImmutableBytesWritable, Result> hBasePairRDD = sc.newAPIHadoopRDD(
conf,
TableInputFormat.class,
ImmutableBytesWritable.class,
Result.class);
hBasePairRDD.saveAsNewAPIHadoopFile("/tmp/tempOutputPath", ImmutableBytesWritable.class, Result.class, SequenceFileOutputFormat.class);
System.out.println("Finished readFromHbaseAndSaveAsSequenceFile() .........");
以下是stacktrace的错误:
java.io.IOException: Could not find a serializer for the Value class: 'org.apache.hadoop.hbase.client.Result'. Please ensure that the configuration 'io.serializations' is properly configured, if you're usingcustom serialization.
at org.apache.hadoop.io.SequenceFile$Writer.init(SequenceFile.java:1254)
at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:1156)
at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:273)
at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:530)
at org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat.getSequenceWriter(SequenceFileOutputFormat.java:64)
at org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat.getRecordWriter(SequenceFileOutputFormat.java:75)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1112)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1095)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
at org.apache.spark.scheduler.Task.run(Task.scala:86)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
17/05/25 10:58:38 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.io.IOException: Could not find a serializer for the Value class: 'org.apache.hadoop.hbase.client.Result'. Please ensure that the configuration 'io.serializations' is properly configured, if you're usingcustom serialization.
at org.apache.hadoop.io.SequenceFile$Writer.init(SequenceFile.java:1254)
at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:1156)
at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:273)
at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:530)
at org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat.getSequenceWriter(SequenceFileOutputFormat.java:64)
at org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat.getRecordWriter(SequenceFileOutputFormat.java:75)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1112)
at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1$$anonfun$12.apply(PairRDDFunctions.scala:1095)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:70)
at org.apache.spark.scheduler.Task.run(Task.scala:86)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
17/05/25 10:58:38 ERROR TaskSetManager: Task 0 in stage 0.0 failed 1 times; aborting job
# 1 楼答案
# 2 楼答案
以下是使其发挥作用所需要的
因为我们使用HBase存储数据,而这个reducer将其结果输出到HBase表,所以Hadoop告诉我们他不知道如何序列化数据。这就是为什么我们需要帮助它。内部设置设置io。序列化变量