I have started using dask, but I am running into an error that makes no sense to me.
I am trying to run the following code:
import dask.dataframe as dd
import pandas as pd
from pandas import Timestamp

# Small pandas frame: two (SKU_ID, STR_ID) groups with daily dates.
testpd = pd.DataFrame(
    {
        "SKU_ID": {0: 1, 1: 1, 2: 1, 3: 1, 4: 1, 5: 2, 6: 2},
        "STR_ID": {0: 1, 1: 1, 2: 1, 3: 1, 4: 1, 5: 64, 6: 64},
        "DATE": {
            0: Timestamp("2018-01-01 00:00:00"),
            1: Timestamp("2018-01-02 00:00:00"),
            2: Timestamp("2018-01-03 00:00:00"),
            3: Timestamp("2018-01-04 00:00:00"),
            4: Timestamp("2018-01-05 00:00:00"),
            5: Timestamp("2020-02-22 00:00:00"),
            6: Timestamp("2020-02-23 00:00:00"),
        },
        "ORD_UNITS": {0: 0, 1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0},
    }
)

# The same data as a dask DataFrame split into two partitions.
testdd = dd.from_pandas(testpd, npartitions=2)


def func(x):
    # Flag the rows whose DATE equals the earliest DATE in the group.
    return pd.Series(x["DATE"] == x["DATE"].min(), name="result")
With pandas, this works perfectly:

testpd.groupby(["SKU_ID", "STR_ID"]).apply(func)

But with dask, I get an error:

testdd.groupby(["SKU_ID", "STR_ID"]).apply(
    func, meta=pd.Series([], dtype=bool, name="result")
).compute()

AttributeError: 'Series' object has no attribute 'columns'
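
For reference, here is a minimal pandas-only sketch of what func is meant to compute, written with groupby(...).transform("min") instead of apply. It is not a confirmed fix for the dask error above. The frame df and the names group_min and is_first are illustrative and not part of the original code; df repeats the question's data without the ORD_UNITS column. dask exposes a groupby transform as well, but whether it avoids the AttributeError in a given dask version is an assumption that has not been checked here.

```python
import pandas as pd

# Same groups and dates as in the question (ORD_UNITS omitted, it is unused).
df = pd.DataFrame(
    {
        "SKU_ID": [1, 1, 1, 1, 1, 2, 2],
        "STR_ID": [1, 1, 1, 1, 1, 64, 64],
        "DATE": pd.to_datetime(
            [
                "2018-01-01",
                "2018-01-02",
                "2018-01-03",
                "2018-01-04",
                "2018-01-05",
                "2020-02-22",
                "2020-02-23",
            ]
        ),
    }
)

# Broadcast each group's earliest DATE back onto the rows of that group.
group_min = df.groupby(["SKU_ID", "STR_ID"])["DATE"].transform("min")

# True where a row holds the earliest DATE of its (SKU_ID, STR_ID) group.
is_first = (df["DATE"] == group_min).rename("result")

print(is_first.tolist())
# [True, False, False, False, False, True, False]
```

Unlike the apply version, the result keeps the original row index, so it can be assigned straight back as a column, for example df["result"] = is_first.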