groupBy().count().show() raises java.lang.IllegalStateException in PySpark

Posted 2024-09-27 23:22:57


I am trying to call show() on the result of groupBy() on a DataFrame created from an RDD. It gives the following error:

Py4JJavaError: An error occurred while calling o14287.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 3 in stage 2669.0 failed 1 times, most recent failure: Lost task 3.0 in stage 2669.0 (TID 3896, localhost, executor driver): java.lang.IllegalStateException: Input row doesn't have expected number of values required by the schema. 26 fields are required while 1 values are provided.
    at org.apache.spark.sql.execution.python.EvaluatePython$$anonfun$makeFromJava$15$$anonfun$apply$15.applyOrElse(EvaluatePython.scala:184)
    at org.apache.spark.sql.execution.python.EvaluatePython$.org$apache$spark$sql$execution$python$EvaluatePython$$nullSafeConvert(EvaluatePython.scala:208)
    at org.apache.spark.sql.execution.python.EvaluatePython$$anonfun$makeFromJava$15.apply(EvaluatePython.scala:180)

My PySpark script:

import pyspark
from pyspark.sql import SparkSession

spark=SparkSession.builder.getOrCreate()

s3RDD=spark.sparkContext.textFile("file:///Users/mydir/Documents/Projects/Pyspark/MiscScripts/logfile.gz")

firstLine = s3RDD.first()

# sparkContext.parallelize converts the header string into an RDD
parallelize = spark.sparkContext.parallelize([firstLine])

s3RDD=s3RDD.subtract(parallelize)
s3RDD=s3RDD.map(lambda x: x.split('\t'))

urlsDf=s3RDD.toDF()

#import pyspark.sql.functions as f

urlsDf.groupBy("_8").count().show() 

1 answer
User
#1 · Posted 2024-09-27 23:22:57

Here is my text file:

Version: 1.0

Fields: date time x-edge-location sc-bytes c-ip cs-method cs(Host) cs-uri-stem sc-status cs(Referer) cs(User-Agent) cs-uri-query cs(Cookie) x-edge-result-type x-edge-request-id x-host-header cs-protocol cs-bytes time-taken x-forwarded-for ssl-protocol ssl-cipher x-response-result-type cs-protocol-version fle-status fle-encrypted-fields

2018-04-12 23:55:43 MAA50-C1 89352 39.44.14.521 GET mycdn.com/mydir/new my dir url 200-Mozilla/5.0%2520(Windows%2520NT%25206.1;%2520WOW64;%2520rv:40.0)%2520Gecko/20100101%2520Firefox/40.0 id=FOxPJf3rutG1qhi-Miss czeom7p2yw7byn5veotj8gs2gpdtwkxzdudijhfwbfpuscxk4j9a==mydomain.comhttp 370 3.169 10.130.24.151,%2010.140.65.140--Miss http/1.1--
2018-04-12 23:55:51 MAA50-C1 81103 39.44.14.521 GET mycdn.com/mydir/mydir-new-test1200-Mozilla/5.0%2520(Windows%2520NT%25206.1;%2520WOW64;%2520rv:40.0)%2520Gecko/20100101%2520Firefox/40.0 id=QOP645KHxGQcgXW-Miss 1wKt5erjuDVQNa7X-D vKQeli3X1ZvE5g32D0H7vgLnq_aiVuNqDA==mydomain.comhttp 349 1.245 10.130.24.151,%2010.140.65.140--Miss http/1.1--
2018-04-12 23:55:59 MAA50-C1 0 39.44.14.521 GET mycdn.com/mydir/mydir-new-test1000-Mozilla/5.0%2520(Windows%2520NT%25206.1;%2520WOW64;%2520rv:40.0)%2520Gecko/20100101%2520Firefox/40.0 id=OCjtSXeh7QwqLtE-Error 8c9OnlJYo_2jI6mBCMFNbtxv7NSV00NjjANS2r7ODqhAlkV3Ew-4aA==mydomain.comhttp 371 19.992 10.130.24.151,%2010.140.65.140--Error http/1.1--
2018-04-12 23:55:45 BOM52 64704 103.18.142.29 GET mycdn.com/mydir/mydir-new-test1 200-Mozilla/5.0%2520(Macintosh;%2520Intel%2520Mac%2520OS%2520X%252010_9_5)%2520AppleWebKit/537.36%2520(KHTML,%2520like%2520Gecko)%2520Chrome/42.0.2311.90%2520Safari/537.36--RefreshHit UcaCxr82_Wgm-VZETV0PXHCVOMAJO46JATYF8MBAZ0VPNGMFKGN-A== mydomain.comhttp 312 0.022----RefreshHit http/1.1--
2018-04-12 23:56:38 新2 71625 13.228.207.150 GET mycdn.com/200-Mozilla/5.0%2520(Windows;%2520U;%2520Windows%2520NT%25206.0;%2520en-US;%2520rv:1.9.1.6)%2520Gecko/20091201%2520Firefox/3.5.6%2520GTB5--Miss 5fGTvqY4zU-2DWBMPEvOOtaskdX-ypiweu8rlr4fkfdrwletkyila==mydomain.comhttp 233 2.959----Miss HTTP/1.1--
2018-04-12 23:55:41 MAA50-C1 67805 39.44.14.521 GET mycdn.com/mydir/mydir-new-test1200-Mozilla/5.0%2520(Windows%2520NT%25206.1;%2520WOW64;%2520rv:40.0)%2520Gecko/20100101%2520Firefox/40.0 id=eytpvato7qqq0qiw-Miss ZPtOvMKzHCvdS-HbAMsSTU5FfYzSmP8xnxM7KAHseJaZFMd6CykwwQ==mydomain.comhttp 338 1.828 10.130.24.151,%2010.140.65.140--Miss http/1.1--
2018-04-12 23:55:52 MAA50-C1 62402 39.44.14.521 GET mycdn.com/mydir/mydir-new-test1200-Mozilla/5.0%2520(Windows%2520NT%25206.1;%2520WOW64;%2520rv:40.0)%2520Gecko/20100101%2520Firefox/40.0 id=uGcBwdJhQC2V5sx-Miss 4ddtwo63b8obw5jq29idv5mdctjvvlq0r5pvbbv6ypqsnitxwsuaw==mydomain.comhttp 356 2.675 10.130.24.151,%2010.140.65.140--Miss http/1.1--
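A possible explanation, judging only from the error message and the file above: the file starts with two header lines, but the script subtracts just the first one (the result of first()). If the remaining header line contains no tab characters, split('\t') turns it into a single value, while toDF() has inferred 26 columns from a data row, which would match "26 fields are required while 1 values are provided". The sketch below is one way to guard against that, not a confirmed fix. The names EXPECTED_FIELDS, parse_log_line and clean_rows are hypothetical, and the assumption that every valid row has exactly 26 tab-separated values comes from the error message alone.

```python
# Hypothetical helper names; the count of 26 is taken from the error message.
EXPECTED_FIELDS = 26


def parse_log_line(line, expected=EXPECTED_FIELDS):
    """Split a tab-separated log line.

    Returns the list of fields, or None when the line does not have the
    expected number of fields (for example a header line without tabs).
    """
    parts = line.split("\t")
    return parts if len(parts) == expected else None


def clean_rows(lines, expected=EXPECTED_FIELDS):
    """Keep only the lines that split into exactly `expected` fields."""
    parsed = (parse_log_line(line, expected) for line in lines)
    return [row for row in parsed if row is not None]


# Plain-Python demonstration with made-up lines (no Spark required).
sample = [
    "Version: 1.0",                           # header, no tabs -> 1 value
    "Fields: date time x-edge-location",      # header, no tabs -> 1 value
    "\t".join(f"v{i}" for i in range(26)),    # well-formed data row
]
rows = clean_rows(sample)

# With the RDD from the question, the same function could be applied as:
#   rows_rdd = s3RDD.map(parse_log_line).filter(lambda r: r is not None)
#   urlsDf = rows_rdd.toDF()
#   urlsDf.groupBy("_8").count().show()
```

Filtering on the field count rather than on the header text means any malformed line is dropped as well, which avoids the same exception if the file contains other short rows. It also makes the subtract() step unnecessary under the stated assumption.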
