Getting a flat RDD from an RDD with nested elements in PySpark

import functools

a = sc.parallelize(((1, 2), 3, (4, 5)))
b = sc.parallelize((2, (4, 6, 7), 8))

def maxReduce(tup):
    return int(functools.reduce(lambda a, b: a if a > b else b, tup))

maxFunc = lambda x: maxReduce(x) if type(x) == tuple else x

a.union(b).map(lambda x: maxFunc(x)).reduce(lambda a, b: a if a > b else b)
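The title asks for a flat RDD, not only the maximum. Here is a minimal sketch that assumes the nested elements are tuples (or other iterables) of numbers. A recursive generator, called `flatten` here (a hypothetical helper, not from the question), yields the leaves of each element at any depth, so passing it to `RDD.flatMap` would produce a flat RDD. To keep the example runnable without a SparkContext, a plain list stands in for `a.union(b)` and a list comprehension plays the role of `flatMap`.

```python
from collections.abc import Iterable


def flatten(ele):
    """Recursively yield the scalar leaves of an arbitrarily nested element.

    Strings and bytes are treated as leaves so they are not split into
    characters (an assumption; the data in the question is purely numeric).
    """
    if isinstance(ele, Iterable) and not isinstance(ele, (str, bytes)):
        for x in ele:
            yield from flatten(x)
    else:
        yield ele


# Plain-Python stand-in for the union of the two RDDs in the question
# (elements of a, then elements of b; no SparkContext needed).
data = [(1, 2), 3, (4, 5), 2, (4, 6, 7), 8]

# Equivalent of rdd.flatMap(flatten): apply flatten to every element
# and concatenate the results into one flat list.
flat = [leaf for ele in data for leaf in flatten(ele)]
print(flat)       # [1, 2, 3, 4, 5, 2, 4, 6, 7, 8]
print(max(flat))  # 8
```

On a real RDD you would apply the same function as `a.union(b).flatMap(flatten)`. Calling `.max()` on that flat RDD would then give the overall maximum without a custom reduce function.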

1 answer

User

#1 · Posted on 2024-10-01 02:24:51

This sounds like a good use case for a recursive function:

# collections.Iterable was removed in Python 3.10; import it from collections.abc
from collections.abc import Iterable

a = sc.parallelize(((1, 2), 3, (4, (6, 7, (8, 9, (11), 10)), 5, 12)))
b = sc.parallelize((1, 2, (3, 4)))

def maxIterOrNum(ele):
    """
    Find the maximum value in a (possibly nested) iterable; otherwise return the value itself.
    :param ele: An iterable of numeric values or a numeric value
    :return: a numeric value
    """
    res = -float('inf')
    if isinstance(ele, Iterable):
        for x in ele:
            res = max(res, maxIterOrNum(x))
        return res
    else:
        return ele

a.union(b).reduce(lambda x, y: max(maxIterOrNum(x), maxIterOrNum(y)))
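You can check the reduction above locally without Spark. This sketch copies `maxIterOrNum` with the `collections.abc` import. It then uses `functools.reduce` over plain lists as a stand-in for `a.union(b).reduce(...)`, where the lists mirror the data passed to `sc.parallelize` in the answer.

```python
import functools
from collections.abc import Iterable


def maxIterOrNum(ele):
    """Same logic as the answer: max over a (nested) iterable, or the value itself."""
    res = -float('inf')
    if isinstance(ele, Iterable):
        for x in ele:
            res = max(res, maxIterOrNum(x))
        return res
    else:
        return ele


# Plain lists standing in for the elements of a and b from the answer.
a = [(1, 2), 3, (4, (6, 7, (8, 9, (11), 10)), 5, 12)]
b = [1, 2, (3, 4)]

# functools.reduce mimics RDD.reduce on a single partition. Spark also
# merges per-partition results with the same function, which is safe here
# because the lambda returns a plain number that maxIterOrNum passes through.
result = functools.reduce(lambda x, y: max(maxIterOrNum(x), maxIterOrNum(y)), a + b)
print(result)  # 12
```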
