Spark Flume streaming: missing package?

Published 2024-10-07 00:26:06

I am trying to run the Flume streaming example, but I cannot get my jar files to work. The example at https://github.com/spark-packages/dstream-flume/blob/master/examples/src/main/python/streaming/flume_wordcount.py says to use:

bin/spark-submit --jars \
      external/flume-assembly/target/scala-*/spark-streaming-flume-assembly-*.jar 

I don't know what this "external" directory is.
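
As far as I can tell, that external/flume-assembly/target/... path only exists in a Spark source checkout after a build, so a pre-built binary distribution like mine does not have it. If I understand correctly, with a binary distribution you would instead point --jars at the assembly jar downloaded from Maven Central, along these lines (the jar path here is just an illustration):

    $ bin/spark-submit \
          --jars /path/to/spark-streaming-flume-assembly_2.10-1.6.0.jar \
          examples/src/main/python/streaming/flume_wordcount.py localhost 4949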

In the lib directory of my Spark (1.6.0) I put a few jars (I tried 1.6.0 and 1.6.0):

    spark-streaming-flume-sink_2.10-1.6.0.jar
    spark-streaming-flume_2.10-1.6.0.jar
    spark-streaming-flume-assembly_2.10-1.6.0.jar

Then I run:

    $ ./bin/pyspark --master ip:7077 --total-executor-cores 1 --packages com.databricks:spark-csv_2.10:1.4.0 \
      --jars /Users/romain/Informatique/zoo/spark-1.6.0-bin-hadoop2.4/lib/spark-streaming-flume-sink_2.10-1.6.0.jar \
      --jars /Users/romain/Informatique/zoo/spark-1.6.0-bin-hadoop2.4/lib/spark-streaming-flume_2.10-1.6.0.jar \
      --jars /Users/romain/Informatique/zoo/spark-1.6.0-bin-hadoop2.4/lib/spark-streaming-flume-assembly_2.10-1.6.0.jar
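
I am not sure that repeating --jars like this is even valid: the spark-submit documentation shows a single comma-separated list, and repeated occurrences of the flag appear to overwrite one another, so only the last jar would actually be shipped. The single-flag equivalent would be:

    $ ./bin/pyspark --master ip:7077 --total-executor-cores 1 \
          --packages com.databricks:spark-csv_2.10:1.4.0 \
          --jars /Users/romain/Informatique/zoo/spark-1.6.0-bin-hadoop2.4/lib/spark-streaming-flume-sink_2.10-1.6.0.jar,/Users/romain/Informatique/zoo/spark-1.6.0-bin-hadoop2.4/lib/spark-streaming-flume_2.10-1.6.0.jar,/Users/romain/Informatique/zoo/spark-1.6.0-bin-hadoop2.4/lib/spark-streaming-flume-assembly_2.10-1.6.0.jar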

The Python notebook server starts, but when I then try to create a stream object:

from pyspark                 import SparkConf
from pyspark                 import SparkContext
from pyspark.streaming       import StreamingContext
from pyspark.streaming.flume import FlumeUtils

# Stop any contexts left over from a previous notebook run
try:
    sc.stop()
except NameError:
    pass
try:
    ssc.stop()
except NameError:
    pass

conf = SparkConf()
conf.setAppName("Streaming Flume")
conf.set("spark.executor.memory", "1g")
conf.set("spark.driver.memory", "1g")
conf.set("spark.cores.max", "5")
conf.set("spark.driver.extraClassPath",   "/Users/romain/Informatique/zoo/spark-1.6.0-bin-hadoop2.4/lib/")
conf.set("spark.executor.extraClassPath", "/Users/romain/Informatique/zoo/spark-1.6.0-bin-hadoop2.4/lib/")

sc  = SparkContext(conf=conf)
ssc = StreamingContext(sc, 10)                   # 10-second batch interval
FlumeUtils.createStream(ssc, "localhost", 4949)  # port is an int, not a string
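
(For reference, if the import worked, the linked word-count example would continue from the snippet above by attaching an action to the stream and starting the context, roughly like this; each Flume event arrives as a (headers, body) pair:)

    kvs    = FlumeUtils.createStream(ssc, "localhost", 4949)
    lines  = kvs.map(lambda event: event[1])              # keep the event body
    counts = lines.flatMap(lambda line: line.split(" ")) \
                  .map(lambda word: (word, 1)) \
                  .reduceByKey(lambda a, b: a + b)
    counts.pprint()
    ssc.start()
    ssc.awaitTermination()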

It fails with:

________________________________________________________________________________________________

  Spark Streaming's Flume libraries not found in class path. Try one of the following.

  1. Include the Flume library and its dependencies with in the
     spark-submit command as

     $ bin/spark-submit --packages org.apache.spark:spark-streaming-flume:1.6.0 ...

  2. Download the JAR of the artifact from Maven Central http://search.maven.org/,
     Group Id = org.apache.spark, Artifact Id = spark-streaming-flume-assembly, Version = 1.6.0.
     Then, include the jar in the spark-submit command as

     $ bin/spark-submit --jars <spark-streaming-flume-assembly.jar> ...

________________________________________________________________________________________________

I tried adding

--packages org.apache.spark:spark-streaming-flume-sink:1.6.0

at the end of my spark-submit command, and I got another problem:

org.apache.spark#spark-streaming-flume-sink added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
    confs: [default]
:: resolution report :: resolve 2344ms :: artifacts dl 0ms
    :: modules in use:
    ---------------------------------------------------------------------
    |                  |            modules            ||   artifacts   |
    |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
    ---------------------------------------------------------------------
    |      default     |   1   |   0   |   0   |   0   ||   0   |   0   |
    ---------------------------------------------------------------------

:: problems summary ::
:::: WARNINGS
        module not found: org.apache.spark#spark-streaming-flume-sink;1.6.0

    ==== local-m2-cache: tried

      file:/Users/romain/.m2/repository/org/apache/spark/spark-streaming-flume-sink/1.6.0/spark-streaming-flume-sink-1.6.0.pom

      -- artifact org.apache.spark#spark-streaming-flume-sink;1.6.0!spark-streaming-flume-sink.jar:

      file:/Users/romain/.m2/repository/org/apache/spark/spark-streaming-flume-sink/1.6.0/spark-streaming-flume-sink-1.6.0.jar

    ==== local-ivy-cache: tried

      /Users/romain/.ivy2/local/org.apache.spark/spark-streaming-flume-sink/1.6.0/ivys/ivy.xml

    ==== central: tried

      https://repo1.maven.org/maven2/org/apache/spark/spark-streaming-flume-sink/1.6.0/spark-streaming-flume-sink-1.6.0.pom

      -- artifact org.apache.spark#spark-streaming-flume-sink;1.6.0!spark-streaming-flume-sink.jar:

      https://repo1.maven.org/maven2/org/apache/spark/spark-streaming-flume-sink/1.6.0/spark-streaming-flume-sink-1.6.0.jar

    ==== spark-packages: tried

      http://dl.bintray.com/spark-packages/maven/org/apache/spark/spark-streaming-flume-sink/1.6.0/spark-streaming-flume-sink-1.6.0.pom

      -- artifact org.apache.spark#spark-streaming-flume-sink;1.6.0!spark-streaming-flume-sink.jar:

      http://dl.bintray.com/spark-packages/maven/org/apache/spark/spark-streaming-flume-sink/1.6.0/spark-streaming-flume-sink-1.6.0.jar

        ::::::::::::::::::::::::::::::::::::::::::::::

        ::          UNRESOLVED DEPENDENCIES         ::

        ::::::::::::::::::::::::::::::::::::::::::::::

        :: org.apache.spark#spark-streaming-flume-sink;1.6.0: not found

        ::::::::::::::::::::::::::::::::::::::::::::::



:: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
Exception in thread "main" java.lang.RuntimeException: [unresolved dependency: org.apache.spark#spark-streaming-flume-sink;1.6.0: not found]
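
From the resolution report it looks like Ivy is searching for spark-streaming-flume-sink with no Scala-version suffix, and as far as I can tell no such artifact exists on Maven Central; the 1.6.0 artifacts are published as spark-streaming-flume-sink_2.10 and spark-streaming-flume-sink_2.11. If that is the problem, the coordinate would need to be something like:

    $ bin/spark-submit --packages org.apache.spark:spark-streaming-flume-sink_2.10:1.6.0 ...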

I have never used a pom.xml file; maybe I should?

