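<p>Hi, to select a specific column from an RDD in Python (PySpark), split each line and map a function over the RDD that returns only the field you want:</p>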
<h2>Sample data (tab-separated)</h2>
<p><a href="https://i.stack.imgur.com/yKxcI.png" rel="nofollow noreferrer"><img src="https://i.stack.imgur.com/yKxcI.png" alt="Sample tab-separated data"/></a></p>
<pre><code>from pyspark.conf import SparkConf
from pyspark.context import SparkContext

# create the Spark context (runs locally on all available cores)
conf = SparkConf().setAppName("SelectingColumn").setMaster("local[*]")
sc = SparkContext(conf=conf)

# load the tab-separated file as an RDD of lines (minPartitions=1)
raw_data = sc.textFile("C:\\Users...\\SampleCsv.txt", 1)

# custom function that returns only the column b value of a line
def parse_data(line):
    fields = line.split("\t")
    # indices are 0-based: 0 for column 1, 1 for column 2, and so on
    return fields[1]

columnBdata = raw_data.map(parse_data)
print(columnBdata.take(4))  # prints the first 4 values of column b
</code></pre>
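<p><strong>Output: ['b', '2', '7', '12']</strong></p>
<p>The first value, <code>'b'</code>, is the column name from the header row, because the header line is not skipped.</p>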
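<p>Since <code>RDD.map</code> simply applies an ordinary Python function to each line, the parsing logic can be generalized and checked without a Spark cluster. The sketch below is only an illustration under stated assumptions: the helper <code>select_columns</code> and the sample lines are hypothetical (the real file is only shown in the image), and a plain list comprehension stands in for <code>map</code>. It selects several columns by 0-based index and drops the header row.</p>

<pre><code># Hypothetical sample lines standing in for the tab-separated file;
# the real file's contents are only shown in the image above.
SAMPLE_LINES = [
    "a\tb\tc\td\te",
    "1\t2\t3\t4\t5",
    "6\t7\t8\t9\t10",
    "11\t12\t13\t14\t15",
]


def select_columns(line, indices, sep="\t"):
    """Hypothetical helper: return the fields at the given 0-based indices."""
    fields = line.split(sep)
    return tuple(fields[i] for i in indices)


def drop_header(lines):
    """Remove every line equal to the first one (mirrors rdd.first() + filter)."""
    header = lines[0]
    return [line for line in lines if line != header]


# Plain-Python stand-in for rdd.map(...): the function runs once per line.
column_b = [select_columns(line, [1])[0] for line in SAMPLE_LINES]
columns_b_and_d = [select_columns(line, [1, 3]) for line in drop_header(SAMPLE_LINES)]

print(column_b)         # ['b', '2', '7', '12']
print(columns_b_and_d)  # [('2', '4'), ('7', '9'), ('12', '14')]

# With Spark (not run here), the equivalent would be roughly:
#   header = raw_data.first()
#   rows = raw_data.filter(lambda line: line != header)
#   rows.map(lambda line: select_columns(line, [1, 3])).take(3)
</code></pre>

<p>On a real RDD the same helper is passed to <code>map</code> unchanged; keeping the per-line logic in a plain function makes it easy to unit-test before running a Spark job.</p>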