Java: compiling a Spark Scala program into a jar file using the installed Spark and Maven

Still trying to get familiar with Maven and with compiling source code into jar files for spark-submit. I know how to do this with IntelliJ, but I would like to understand how it actually works. I have an EC2 server with all the latest software, such as Spark and Scala, already installed, and I have the SparkPi example. I now want to compile the Scala source with Maven. My silly questions are: first, can I build the code using the software I have installed instead of retrieving the dependencies from the Maven repository, and how do I start from a basic pom.xml template to which I add the appropriate requirements? I don't fully understand what Maven is actually doing, and how can I test that my source code compiles? As far as I understand, I just need the standard directory structure src/main/scala and can then run mvn package. Also, I want to test with Maven rather than sbt.
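
For reference, the "standard directory structure" is Maven's conventional layout, sketched below with placeholder names; mvn compile on its own is enough to test that the sources compile, while mvn package additionally runs the tests and produces the jar under target/. Note that Maven always resolves build dependencies from a repository (cached locally under ~/.m2/repository) rather than from locally installed software; declaring Spark with provided scope, as shown in the answers, keeps it out of the resulting jar so that the installed Spark supplies it at runtime.

    sparkpi/
    ├── pom.xml
    └── src/
        └── main/
            └── scala/
                └── SparkPi.scala

    # from the directory containing pom.xml:
    mvn compile    # compile only, to verify the sources build
    mvn package    # compile, run tests, produce target/sparkpi-<version>.jar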


2 Answers

  1. # Answer 1

    In addition to @Krishna's answer: if you have a mvn project, use mvn clean package with your pom.xml. Make sure your pom.xml contains the following build section to generate a fat jar (this is my case, i.e. how I built my jar):

    <build>
        <sourceDirectory>src</sourceDirectory>
        <plugins>
            <plugin>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.0</version>
                <configuration>
                    <source>1.7</source>
                    <target>1.7</target>
                </configuration>
            </plugin>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-assembly-plugin</artifactId>
                <version>2.4</version>
                <configuration>
                    <descriptorRefs>
                        <descriptorRef>jar-with-dependencies</descriptorRef>
                    </descriptorRefs>
                </configuration>
                <executions>
                    <execution>
                        <id>assemble-all</id>
                        <phase>package</phase>
                        <goals>
                            <goal>single</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
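
    Note that the build section above only configures the Java compiler and the assembly. To compile Scala sources under src/main/scala you would typically also register the scala-maven-plugin, and declare Spark with provided scope so that the installation on the cluster supplies it at runtime and it stays out of the fat jar. A minimal sketch, assuming Spark 1.6 on Scala 2.10; adjust the versions to your installation:

    <dependencies>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.10</artifactId>
            <version>1.6.0</version>
            <!-- provided: available at compile time, supplied by the
                 installed Spark at runtime, excluded from the fat jar -->
            <scope>provided</scope>
        </dependency>
    </dependencies>

    <!-- goes inside <plugins>, next to the plugins above -->
    <plugin>
        <groupId>net.alchim31.maven</groupId>
        <artifactId>scala-maven-plugin</artifactId>
        <version>3.2.2</version>
        <executions>
            <execution>
                <goals>
                    <goal>compile</goal>
                    <goal>testCompile</goal>
                </goals>
            </execution>
        </executions>
    </plugin>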
    

    For more details: link. If you have an sbt project, use sbt clean assembly to build the fat jar. For this you need the following kind of configuration, e.g. in build.sbt:

    assemblyJarName := "WordCountSimple.jar"
    //
    val meta = """META.INF(.)*""".r
    
    assemblyMergeStrategy in assembly := {
      case PathList("javax", "servlet", xs@_*) => MergeStrategy.first
      case PathList(ps@_*) if ps.last endsWith ".html" => MergeStrategy.first
      case n if n.startsWith("reference.conf") => MergeStrategy.concat
      case n if n.endsWith(".conf") => MergeStrategy.concat
      case meta(_) => MergeStrategy.discard
      case x => MergeStrategy.first
    }
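
    The block above only configures the assembly itself; a complete build.sbt also needs the usual project settings plus the Spark dependency, marked provided so that it is compiled against but not packed into the fat jar. A rough sketch, with placeholder versions to be matched to your cluster:

    name := "WordCountSimple"

    version := "1.0"

    scalaVersion := "2.10.5"

    // provided: compile against Spark, let the cluster's installation
    // supply it at runtime instead of bundling it into the assembly
    libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.0" % "provided"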
    

    Also, in project/plugins.sbt:

    addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.13.0")
    

    For more information, see this and this.

    Either way, the main goal is to get a fat jar containing all dependencies into the target folder. Use that jar to run on the cluster, like this:

    hastimal@nm:/usr/local/spark$ ./bin/spark-submit --class com.hastimal.wordcount --master yarn-cluster \
      --num-executors 15 --executor-memory 52g --executor-cores 7 --driver-memory 52g --driver-cores 7 \
      --conf spark.default.parallelism=105 --conf spark.driver.maxResultSize=4g --conf spark.network.timeout=300 \
      --conf spark.yarn.executor.memoryOverhead=4608 --conf spark.yarn.driver.memoryOverhead=4608 \
      --conf spark.akka.frameSize=1200 --conf spark.io.compression.codec=lz4 --conf spark.rdd.compress=true \
      --conf spark.broadcast.compress=true --conf spark.shuffle.spill.compress=true \
      --conf spark.shuffle.compress=true --conf spark.shuffle.manager=sort \
      /users/hastimal/wordcount.jar inputRDF/data_all.txt /output
    

    Here inputRDF/data_all.txt and /output are the two application arguments. Also, on the tooling side, I use IntelliJ as my IDE for building.

  2. # Answer 2

    Follow these steps:

    # create assembly jar upon code change
    sbt assembly
    
    # transfer the jar to a cluster 
    scp target/scala-2.10/myproject-version-assembly.jar <some location in your cluster>
    
    # fire spark-submit on your cluster
    $SPARK_HOME/bin/spark-submit --class not.memorable.package.application.class --master yarn --num-executors 10 \
       --conf some.crazy.config=xyz --executor-memory lotsG \
      myproject-version-assembly.jar \
      <glorious-application-arguments...>
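
    As a quick sanity check between the assembly and scp steps, you can list the fat jar's contents with the JDK's jar tool to confirm that your classes and dependencies were actually bundled (the grep pattern is a placeholder):

    # confirm the assembly really contains your application classes
    jar tf target/scala-2.10/myproject-version-assembly.jar | grep -i <your-class-name>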