Expected output: [((1,2), (3,4), 5)]
rdd = sc.parallelize([1,2,3,4,5])
rdd.map(lambda x: ((x[0],x[1]),(x[2],x[3]),x[4])).collect()
But I get this error:
TypeError: 'int' object is not subscriptable
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.handlePythonException(PythonRunner.scala:456)
at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRunner.scala:592)
at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRunner.scala:575)
at org.apache.spark.api.python.BasePythonRunner$ReaderIterator.hasNext(PythonRunner.scala:410)
at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
at scala.collection.Iterator$class.foreach(Iterator.scala:891)
Please correct the code. I am using Python with Spark.
From Mohamed Ali Jamaoui's comment:
"If you want one list per row, pass a list when constructing the RDD, e.g. rdd = sc.parallelize([[1,2,3,4,5]])"