Spark & Python 2.7 - Complex data structure GroupByKey

def get_partition(x):
    j = [(x[1][i]).total_seconds() for i in range(len(x[1]))]
    return (x[0], j)

newTimeDeltaRdd2 = newtimeDeltaRdd.map(lambda x: ((x[1].month, x[1].day), x[0]))
totals = newTimeDeltaRdd2.map(lambda x: (get_partition(x)))
totalsrdd = totals.groupByKey().map(lambda x: (x[0], list(x[1])))

2 answers

User

Reply 1 · edited 2024-09-27 23:28:30

A quick and dirty solution that will give you the behavior you describe.

I would still consider using a dictionary.

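import numpy as np

result = []
# An RDD is not directly iterable; collect() brings the rows to the driver.
for key, lists in totalsrdd.collect():
    total = np.zeros(36)
    for ls in lists:
        total = np.add(total, ls)
    # Tuples are immutable, so build new (key, averages) pairs instead of assigning to entry[1].
    result.append((key, np.divide(total, len(lists) * 36)))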
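
The dictionary idea mentioned above could look roughly like the following. This is only a sketch, assuming the grouped data has already been collected to the driver as a list of (key, list-of-lists) pairs. The names `rows`, `WIDTH` and `averages_by_key` are illustrative and not from the original post, and the sample uses 3 values per list instead of 36 to stay readable.

```python
# Hypothetical sample standing in for totalsrdd.collect();
# in the question each inner list holds 36 values.
WIDTH = 3

rows = [
    ((2, 16), [[1.0, 2.0, 3.0], [2.0, 2.0, 3.0]]),
    ((2, 17), [[1.0, 2.0, 3.0]]),
]

averages_by_key = {}
for key, lists in rows:
    # Column-wise sums across all lists that share this key.
    column_sums = [sum(column) for column in zip(*lists)]
    # Same normalization as the loop above: number of lists * width.
    denominator = float(len(lists) * WIDTH)
    averages_by_key[key] = [s / denominator for s in column_sums]
```

A dictionary keyed by (month, day) makes single-key lookups such as `averages_by_key[(2, 16)]` direct, at the cost of holding all results in driver memory.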

User

Reply 2 · edited 2024-09-27 23:28:30

Here is a possible solution for obtaining newrdd:

totalsrdd = [((2, 16),[[1,2,3,...,36],[2,2,3,...,36]]),((2,17),[[1,2,3,...,36]]),...]

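newrdd = []
for key, _list in totalsrdd:
    averages = []
    for i in range(36):
        # Assumes the intended denominator is 36 * len(_list), as in the first answer.
        # Parentheses are required: a / b * c evaluates as (a / b) * c.
        # float() avoids Python 2.7 integer division on integer inputs.
        averages.append(sum([_l[i] for _l in _list]) / float(36 * len(_list)))
    newrdd.append((key, averages))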
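
If the result should stay an RDD rather than a local list, the same per-key computation can probably be moved into a plain function and applied with `mapValues`, which exists on PySpark pair RDDs. The sketch below is an assumption about how that might look: `average_columns` and its `width` parameter are hypothetical names, the denominator follows the first answer (36 times the number of lists), and the Spark call is left as a comment because it needs a running SparkContext.

```python
def average_columns(lists, width=36):
    """Return per-position sums divided by (width * number of lists)."""
    denominator = float(width * len(lists))
    return [sum(l[i] for l in lists) / denominator for i in range(width)]

# Assumed usage on the grouped RDD from the question (not executed here):
# newrdd = totalsrdd.mapValues(average_columns)

# Local check on a small hypothetical sample, using width=3 instead of 36.
sample = [[1, 2, 3], [2, 2, 3]]
local_result = average_columns(sample, width=3)
```

Keeping the arithmetic in a standalone function means it can be checked locally on small lists before it is run on the cluster.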
