Pandas Groupby Agg Function Does Not Reduce

def MakeList(x):
    """
    This function is used to aggregate data that needs to be kept distinct
    within multi-day observations for later use and transformation. It makes
    a list of the data. If the list has length 1, there is only one line/day
    observation in that group, so the single element of the list is returned.
    If the list is longer than one, there are multiple line/day observations
    and the list itself is returned.
    """
    L = x.tolist()
    if len(L) > 1:
        return L
    else:
        return L[0]

import pandas as pd

DF = pd.DataFrame({'date': ['2013-04-02', '2013-04-02', '2013-04-02', '2013-04-02', '2013-04-02',
                            '2013-04-02', '2013-04-02', '2013-04-02', '2013-04-02', '2013-04-02'],
                   'line_code': ['401101', '401101', '401102', '401103', '401104',
                                 '401105', '401105', '401106', '401106', '401107'],
                   's.m.v.': [7.760, 25.564, 25.564, 9.550, 4.870,
                              7.760, 25.564, 5.282, 25.564, 5.282]})

DFGrouped = DF.groupby(['date', 'line_code'], as_index=False)
DF_Agg = DFGrouped.agg({'s.m.v.': MakeList})

DF_Agg = DFGrouped.agg({'s.m.v.' : test_func})

Int64Index([0, 1], dtype='int64')
Int64Index([2], dtype='int64')
Int64Index([3], dtype='int64')
Int64Index([4], dtype='int64')
Int64Index([5, 6], dtype='int64')
Int64Index([7, 8], dtype='int64')
Int64Index([9], dtype='int64')

2 Answers

User

#1 · Edited 2024-06-28 18:57:33

I can't really explain why, but in my experience a list does not work very well inside a pandas.DataFrame.

I usually use a tuple instead. This will work:

def MakeList(x):
    T = tuple(x)
    if len(T) > 1:
        return T
    else:
        return T[0]

DF_Agg = DFGrouped.agg({'s.m.v.' : MakeList})

         date line_code           s.m.v.
0  2013-04-02    401101   (7.76, 25.564)
1  2013-04-02    401102           25.564
2  2013-04-02    401103             9.55
3  2013-04-02    401104             4.87
4  2013-04-02    401105   (7.76, 25.564)
5  2013-04-02    401106  (5.282, 25.564)
6  2013-04-02    401107            5.282
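
If real lists are still needed downstream, one possible follow-up is to convert the tuple cells back after the aggregation has finished. The sketch below is only an illustration: make_tuple is a renamed copy of the tuple-based MakeList above, and the conversion step is an assumption about what the asker wants, not part of the original answer.

```python
import pandas as pd

DF = pd.DataFrame({
    'date': ['2013-04-02'] * 10,
    'line_code': ['401101', '401101', '401102', '401103', '401104',
                  '401105', '401105', '401106', '401106', '401107'],
    's.m.v.': [7.760, 25.564, 25.564, 9.550, 4.870,
               7.760, 25.564, 5.282, 25.564, 5.282],
})

def make_tuple(x):
    # Same logic as the tuple-based MakeList above (hypothetical name).
    T = tuple(x)
    return T if len(T) > 1 else T[0]

DF_Agg = DF.groupby(['date', 'line_code'], as_index=False).agg({'s.m.v.': make_tuple})

# Turn tuple cells back into lists; scalar cells are left untouched.
DF_Agg['s.m.v.'] = DF_Agg['s.m.v.'].apply(
    lambda v: list(v) if isinstance(v, tuple) else v
)
print(DF_Agg)
```

The conversion happens outside agg(), so the aggregator itself never returns a list.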

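User

#2 · Edited 2024-06-28 18:57:33

This is a bug in DataFrame. If the aggregator returns a list for the first group, it fails with the error you mention; if it returns a non-list (non-Series) for the first group, it works fine. The broken code is in groupby.py: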

def _aggregate_series_pure_python(self, obj, func):

    group_index, _, ngroups = self.group_info

    counts = np.zeros(ngroups, dtype=int)
    result = None

    splitter = get_splitter(obj, group_index, ngroups, axis=self.axis)

    for label, group in splitter:
        res = func(group)
        if result is None:
            if (isinstance(res, (Series, Index, np.ndarray)) or
                    isinstance(res, list)):
                raise ValueError('Function does not reduce')
            result = np.empty(ngroups, dtype='O')

        counts[label] = group.shape[0]
        result[label] = res

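Note the if result is None and isinstance(res, list) checks. Your options are:

Fake out groupby().agg() so that it does not see a list for the first group, or
Do the aggregation yourself, using code like the above but without the erroneous test.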
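
The second option could look roughly like the sketch below, which loops over the groups directly instead of calling agg(), so no first-group type check is involved. The names make_list and rows are hypothetical, and this is only one way to do the aggregation by hand, not the code the answer had in mind.

```python
import pandas as pd

DF = pd.DataFrame({
    'date': ['2013-04-02'] * 10,
    'line_code': ['401101', '401101', '401102', '401103', '401104',
                  '401105', '401105', '401106', '401106', '401107'],
    's.m.v.': [7.760, 25.564, 25.564, 9.550, 4.870,
               7.760, 25.564, 5.282, 25.564, 5.282],
})

def make_list(x):
    # List for multi-row groups, bare value for single-row groups.
    L = x.tolist()
    return L if len(L) > 1 else L[0]

# Build one output row per (date, line_code) group by hand.
rows = []
for (date, line_code), group in DF.groupby(['date', 'line_code']):
    rows.append({'date': date,
                 'line_code': line_code,
                 's.m.v.': make_list(group['s.m.v.'])})

DF_Agg = pd.DataFrame(rows, columns=['date', 'line_code', 's.m.v.'])
print(DF_Agg)
```

This gives up whatever fast paths agg() may use, which should not matter for a frame of this size.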
