<p>To select a specific column from an RDD in Python, you can do the following.</p>
<h2>Sample data (tab-separated)</h2>
<p><a href="https://i.stack.imgur.com/yKxcI.png" rel="nofollow noreferrer"><img src="https://i.stack.imgur.com/yKxcI.png" alt="enter image description here"/></a></p>
<pre><code>from pyspark.conf import SparkConf
from pyspark.context import SparkContext
# creating spark context
conf = SparkConf().setAppName("SelectingColumn").setMaster("local[*]")
spark = SparkContext(conf = conf)
# load the tab-separated file as an RDD of lines (1 = minimum number of partitions)
raw_data = spark.textFile("C:\\Users...\\SampleCsv.txt", 1)
# custom method to return column b data only
def parse_data(line):
    fields = line.split("\t")
    # indexes are 0-based: use 0 for column 1, 1 for column 2, and so on
    return fields[1]
columnBdata = raw_data.map(parse_data)
print(columnBdata.take(4)) # yields column b data only
</code></pre>
<p><strong>Output: ['b', '2', '7', '12']</strong></p>
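<p>If some rows might have fewer fields than expected, <code>fields[1]</code> raises an <code>IndexError</code>. Below is a minimal sketch of a more defensive parser. The helper name <code>select_column</code> and the sample rows are hypothetical (the rows are only chosen to be consistent with the output above), and it runs on a plain Python list so it can be tried without Spark. With an RDD, the same function could be passed to <code>raw_data.map(...)</code>.</p>

```python
from functools import partial


def select_column(line, index, sep="\t"):
    """Return the field at 0-based position `index`, or None if the row is too short.

    Hypothetical helper, not part of PySpark.
    """
    fields = line.rstrip("\n").split(sep)
    return fields[index] if index < len(fields) else None


# Assumed sample rows (tab-separated), made up to match the output shown above.
sample_lines = [
    "a\tb\tc\td\te",
    "1\t2\t3\t4\t5",
    "6\t7\t8\t9\t10",
    "11\t12\t13\t14\t15",
]

# Plain-Python stand-in for: raw_data.map(partial(select_column, index=1))
column_b = list(map(partial(select_column, index=1), sample_lines))
print(column_b)
```

<p>Returning <code>None</code> for short rows is just one possible choice. You could filter those values out afterwards, or raise an error instead if malformed rows should stop the job.</p>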