How to compute the total sales price for each specific row in a PySpark RDD

Published 2024-05-21 15:39:17


I have a dataset like this:

1|goldenrod lavender spring chocolate lace|Manufacturer#1|Brand#13|PROMO BURNISHED COPPER|7|JUMBO PKG|901.00|ly. slyly ironi|

2|blush thistle blue yellow saddle|Manufacturer#1|Brand#13|LARGE BRUSHED BRASS|1|LG CASE|902.00|lar accounts amo|

3|spring green yellow purple cornsilk|Manufacturer#4|Brand#42|STANDARD POLISHED BRASS|21|WRAP CASE|903.00|egular deposits hag|

4|cornflower chocolate smoke green pink|Manufacturer#3|Brand#34|SMALL PLATED BRASS|14|MED DRUM|904.00|p furiously r|

I want to compute the total sales price per brand. For example, Brand#13: 901.00 + 902.00 = 1803.00.

Here is my code:

from operator import add
import operator
from pyspark.sql import SQLContext
from pyspark.sql import Window
import pyspark.sql.functions
from pyspark import SparkContext, SparkConf
import pyspark

conf = SparkConf().setAppName("part").setMaster("local[*]")
sc = SparkContext(conf = conf)

def Func(lines):
    lines = lines.split("|")
    return lines[2], lines[3]

def Funcc(lines):
    lines = lines.split("|")
    return lines[3], lines[7]



text = sc.textFile("part.tbl")
text1 = text.map(Func)
text2 = text.map(Funcc)

sort1 = text1.distinct().sortBy(lambda x:x[0], ascending=True).sortBy(lambda y:y[1], ascending = True)
sort2 = text2.sortBy(lambda x:x[0], ascending=True)

original_text = sort1.collect()
count_by_key = sort2.countByKey()
summe = sort2.reduceByKey(add).collect()


print("Manufacturer and Brands:")
for line in original_text:
    print(line)

print("Number of Items of each Brand")
print(count_by_key)
print(summe)

I am not allowed to use DataFrames. I tried:

summe = sort2.collect()
summe1 = sum(summe[1])

But the code doesn't work. The error is:

    summe1 = sum(summe[1])
    TypeError: unsupported operand type(s) for +: 'int' and 'str'
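The TypeError comes from the parsing step: `Funcc` returns the price field as a string, so `reduceByKey(add)` concatenates strings rather than adding numbers, and `sum()` over mixed types fails. A minimal sketch of the fix (hypothetical function name `parse_brand_price`; in Spark it would plug into `sc.textFile("part.tbl").map(parse_brand_price).reduceByKey(add)`) is to cast the price to `float` during parsing. The per-brand sum is demonstrated here on the two sample Brand#13 rows with plain Python so it runs without a SparkContext:

```python
from operator import add  # add(a, b) == a + b; usable with reduceByKey

def parse_brand_price(line):
    # fields[3] is the brand, fields[7] the sales price.
    # Casting the price to float is the fix: without it,
    # reduceByKey(add) concatenates "901.00" + "902.00".
    fields = line.split("|")
    return fields[3], float(fields[7])

# Sample rows from the dataset above:
rows = [
    "1|goldenrod lavender spring chocolate lace|Manufacturer#1|Brand#13|PROMO BURNISHED COPPER|7|JUMBO PKG|901.00|ly. slyly ironi|",
    "2|blush thistle blue yellow saddle|Manufacturer#1|Brand#13|LARGE BRUSHED BRASS|1|LG CASE|902.00|lar accounts amo|",
]

# Equivalent of .map(parse_brand_price).reduceByKey(add), done locally:
totals = {}
for brand, price in map(parse_brand_price, rows):
    totals[brand] = totals.get(brand, 0.0) + price

print(totals)  # {'Brand#13': 1803.0}
```

With the float cast in place, the original `sort2.reduceByKey(add).collect()` line returns numeric totals per brand directly, and the failing `sum(summe[1])` workaround is no longer needed.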


Tags: lambda, text, from, import, sql, conf, pyspark, lines