Dask apply with the right meta throws an error

Posted 2024-10-02 12:28:08

I have started using dask, but I am running into an error that makes no sense to me.

I am trying to run the following code:

import dask.dataframe as dd
import pandas as pd
from pandas import Timestamp

testpd = pd.DataFrame(
    {
        "SKU_ID": {0: 1, 1: 1, 2: 1, 3: 1, 4: 1, 5: 2, 6: 2},
        "STR_ID": {0: 1, 1: 1, 2: 1, 3: 1, 4: 1, 5: 64, 6: 64},
        "DATE": {
            0: Timestamp("2018-01-01 00:00:00"),
            1: Timestamp("2018-01-02 00:00:00"),
            2: Timestamp("2018-01-03 00:00:00"),
            3: Timestamp("2018-01-04 00:00:00"),
            4: Timestamp("2018-01-05 00:00:00"),
            5: Timestamp("2020-02-22 00:00:00"),
            6: Timestamp("2020-02-23 00:00:00"),
        },
        "ORD_UNITS": {0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0},
    }
)

testdd = dd.from_pandas(testpd, npartitions=2,)

def func(x):
    return pd.Series(x["DATE"] == x["DATE"].min(), name="result")

With pandas, this works perfectly: testpd.groupby(["SKU_ID", "STR_ID"]).apply(func)

But with dask, I get:

testdd.groupby(["SKU_ID", "STR_ID"]).apply(
    func, meta=pd.Series([], dtype=bool, name="result")
).compute()

AttributeError: 'Series' object has no attribute 'columns'
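
For reference, the per-row flag that func computes (whether a row's DATE is the earliest in its SKU_ID/STR_ID group) can also be written in plain pandas without apply. The sketch below is only an illustration on a small hypothetical frame with the same column names; it is not a confirmed fix for the dask error, and unlike the apply version it keeps the original row index instead of producing a grouped MultiIndex.

```python
import pandas as pd

# Hypothetical small frame mirroring the columns used in the question.
df = pd.DataFrame(
    {
        "SKU_ID": [1, 1, 1, 2, 2],
        "STR_ID": [1, 1, 1, 64, 64],
        "DATE": pd.to_datetime(
            ["2018-01-01", "2018-01-02", "2018-01-03", "2020-02-22", "2020-02-23"]
        ),
    }
)

# Broadcast each group's earliest DATE back onto its rows, then compare row by row.
group_min = df.groupby(["SKU_ID", "STR_ID"])["DATE"].transform("min")
result = (df["DATE"] == group_min).rename("result")

print(result.tolist())  # [True, False, False, True, False]
```

The result is a boolean Series named "result" aligned with the rows of df, so it can be assigned back as a column directly.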

