How to read Spark log files (.lz4 or .snappy)

Posted 2024-10-02 22:28:42

I want to read some logs, but I can't. So far I have tried:

  • hadoop fs -text <file>

but the only output I get is: INFO compress.CodecPool: Got brand-new decompressor [.lz4] (same for .snappy)

  • val rawRdd = spark.sparkContext.sequenceFile[BytesWritable, String](<file>)

which returns <file> is not a SequenceFile

  • val rawRdd = spark.read.textFile(<file>)

in this case I get java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z

  • downloading the file to the local filesystem, decompressing it with lz4 -d <file>, and trying to view the contents, without success (the likely reason is discussed in the sketch after this list)

  • I followed this SO post:

    import snappy

    # NB: open in binary mode ("rb"); reading compressed bytes in text mode mangles them
    with open(snappy_file, "rb") as input_file:
        data = input_file.read()

    decompressor = snappy.hadoop_snappy.StreamDecompressor()
    uncompressed = decompressor.decompress(data)

but when I print(uncompressed), all I get is b''
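
For reference, here is a minimal sketch of another approach (not among the attempts above): let Hadoop's own codec machinery pick the decompressor from the file extension instead of decompressing by hand. Two assumptions worth stating: Hadoop writes .lz4 files with its own block framing, which the standalone lz4 CLI does not understand (which would explain why lz4 -d failed above), and on Hadoop builds before 3.3.1 the lz4/snappy codecs require the native hadoop library, whose absence is exactly what the UnsatisfiedLinkError points to. <file> stays a placeholder, as in the question; the snippet can be pasted into a spark-shell session:

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.Path
    import org.apache.hadoop.io.compress.CompressionCodecFactory
    import scala.io.Source

    val conf = new Configuration()
    val path = new Path("<file>")  // placeholder path, as in the question

    // Resolve the codec from the extension: .lz4 -> Lz4Codec, .snappy -> SnappyCodec
    val codec = new CompressionCodecFactory(conf).getCodec(path)
    require(codec != null, s"no Hadoop codec registered for $path")

    // Read through the codec so Hadoop's block framing is handled for us
    val fs = path.getFileSystem(conf)
    val in = codec.createInputStream(fs.open(path))
    try {
      Source.fromInputStream(in).getLines().take(20).foreach(println)
    } finally {
      in.close()
    }

If the missing native library is the real culprit, pointing spark.driver.extraLibraryPath (or the LD_LIBRARY_PATH environment variable) at the directory containing libhadoop should also make the earlier spark.read.textFile(<file>) attempt work.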


Tags: text, info, hadoop, read, input, data, val, fs