
Java: How to create a SparkSession with Hive support (fails with "Hive classes are not found")?

I get the following error when trying to run this code:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class App {
  public static void main(String[] args) throws Exception {
    SparkSession
      .builder()
      .enableHiveSupport()
      .getOrCreate();        
  }
}

Output:

Exception in thread "main" java.lang.IllegalArgumentException: Unable to instantiate SparkSession with Hive support because Hive classes are not found.
    at org.apache.spark.sql.SparkSession$Builder.enableHiveSupport(SparkSession.scala:778)
    at com.training.hivetest.App.main(App.java:21)

How can I fix this?


6 Answers

  1. # Answer 2

    tl;dr You have to make sure that Spark SQL's spark-hive dependency and all of its transitive dependencies are available at runtime on the CLASSPATH of your Spark SQL application (not just at build time, where they are merely required for compilation).


    In other words, the org.apache.spark.sql.hive.HiveSessionStateBuilder and org.apache.hadoop.hive.conf.HiveConf classes have to be on the CLASSPATH of the Spark application (which has little to do with sbt or maven).

    The former, HiveSessionStateBuilder, is part of the spark-hive dependency (including all of its transitive dependencies).

    The latter, HiveConf, is part of the hive-exec dependency (which is itself a transitive dependency of the above spark-hive dependency).
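
    As a quick way to confirm whether both classes are actually visible at runtime, a minimal sketch like the following (my own addition, not from the original answer; the class name HiveClasspathCheck is made up for illustration) probes the classpath before the session is built:

        // Minimal sketch: probe for the two Hive classes named above before
        // calling enableHiveSupport(), so a missing jar fails with a clear message.
        public class HiveClasspathCheck {
          public static void main(String[] args) {
            String[] required = {
              "org.apache.spark.sql.hive.HiveSessionStateBuilder", // from spark-hive
              "org.apache.hadoop.hive.conf.HiveConf"               // from hive-exec
            };
            for (String className : required) {
              try {
                Class.forName(className);
                System.out.println("Found on classpath: " + className);
              } catch (ClassNotFoundException e) {
                System.err.println("Missing from runtime classpath: " + className);
              }
            }
          }
        }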

  2. # Answer 3

    I looked into the source code and found that, besides HiveSessionState (in spark-hive), another class, HiveConf, is also needed to initiate a SparkSession. HiveConf is not contained in the spark-hive*.jar; you may find it in the hive-related jars and put it on your classpath.
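
    If you want to see which jar (if any) currently supplies HiveConf, a small sketch like this one (my own addition, not from the original answer) prints the location a class was loaded from:

        // Minimal sketch: report which jar a class was loaded from, useful for
        // confirming where HiveConf comes from on the runtime classpath.
        public class WhichJar {
          public static void main(String[] args) throws Exception {
            Class<?> clazz = Class.forName("org.apache.hadoop.hive.conf.HiveConf");
            // getCodeSource() can be null for classes from the bootstrap classloader
            java.security.CodeSource source = clazz.getProtectionDomain().getCodeSource();
            System.out.println(source != null ? source.getLocation() : "bootstrap classloader");
          }
        }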

  3. # Answer 4

    I had the same problem. I was able to resolve it by adding the following dependencies (I compiled this list by referring to the compile dependencies section of the spark-hive_2.11 mvn repository page):

        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-hive_${scala.binary.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.calcite</groupId>
            <artifactId>calcite-avatica</artifactId>
            <version>1.6.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.calcite</groupId>
            <artifactId>calcite-core</artifactId>
            <version>1.12.0</version>
        </dependency>
        <dependency>
            <groupId>org.spark-project.hive</groupId>
            <artifactId>hive-exec</artifactId>
            <version>1.2.1.spark2</version>
        </dependency>
        <dependency>
            <groupId>org.spark-project.hive</groupId>
            <artifactId>hive-metastore</artifactId>
            <version>1.2.1.spark2</version>
        </dependency>
        <dependency>
            <groupId>org.codehaus.jackson</groupId>
            <artifactId>jackson-mapper-asl</artifactId>
            <version>1.9.13</version>
        </dependency>

    where scala.binary.version = 2.11 and spark.version = 2.1.0:

        <properties>
            <scala.binary.version>2.11</scala.binary.version>
            <spark.version>2.1.0</spark.version>
        </properties>
    
  4. # Answer 5

    Add the following dependency to your Maven project:

        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-hive_2.11</artifactId>
            <version>2.0.0</version>
        </dependency>
    
  5. # Answer 6

    Although the top answers are all correct, remember that the error described in the question can still occur even when the jars are declared in your pom.

    To resolve this, make sure that all dependencies use the same version. As a standard practice, maintain global properties for the Spark version and the Scala version, and substitute those values everywhere to avoid conflicts between differing versions.

    For reference:

        <?xml version="1.0" encoding="UTF-8"?>
        <project xmlns="http://maven.apache.org/POM/4.0.0"
                 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                 xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
            <modelVersion>4.0.0</modelVersion>

            <groupId>com.xxx.rehi</groupId>
            <artifactId>Maven9211</artifactId>
            <version>1.0-SNAPSHOT</version>

            <properties>
                <scala.version>2.12</scala.version>
                <spark.version>2.4.4</spark.version>
            </properties>

            <dependencies>
                <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core -->
                <dependency>
                    <groupId>org.apache.spark</groupId>
                    <artifactId>spark-core_${scala.version}</artifactId>
                    <version>${spark.version}</version>
                </dependency>

                <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-sql -->
                <dependency>
                    <groupId>org.apache.spark</groupId>
                    <artifactId>spark-sql_${scala.version}</artifactId>
                    <version>${spark.version}</version>
                </dependency>

                <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-hive -->
                <dependency>
                    <groupId>org.apache.spark</groupId>
                    <artifactId>spark-hive_${scala.version}</artifactId>
                    <version>${spark.version}</version>
                </dependency>
            </dependencies>
        </project>