Dask concatenate raises a memory error, while pandas concatenate does not on the same files

import numpy as np
import pandas as pd
import dask.dataframe as dd

# file2A ... file2F are path templates defined elsewhere in the asker's script.
combA = np.load(file2A.format(0), allow_pickle=True)
combB = np.load(file2B.format(0), allow_pickle=True)
combC = np.load(file2C.format(0), allow_pickle=True)
combD = np.load(file2D.format(0), allow_pickle=True)
combE = np.load(file2E.format(0), allow_pickle=True)
combF = np.load(file2F.format(0), allow_pickle=True)

# Each array is fully loaded into memory before being wrapped in a Dask DataFrame.
dfAllA = dd.from_pandas(pd.DataFrame(combA), npartitions=10)
dfAllB = dd.from_pandas(pd.DataFrame(combB), npartitions=10)
dfAllC = dd.from_pandas(pd.DataFrame(combC), npartitions=10)
dfAllD = dd.from_pandas(pd.DataFrame(combD), npartitions=10)
dfAllE = dd.from_pandas(pd.DataFrame(combE), npartitions=10)
dfAllF = dd.from_pandas(pd.DataFrame(combF), npartitions=10)

dfAllT = dd.concat([dfAllA, dfAllB, dfAllC, dfAllD, dfAllE, dfAllF], interleave_partitions=True)

1 Answer

User

#1 · Posted 2024-05-20 13:43:18

You are currently loading all of your data into RAM and only then handing it to Dask. If the data already fills your RAM at that first load, Dask cannot help you much.

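Instead, it is better to tell Dask how to load the data and let it do the loading at the right time. This documentation may point you in the right direction: https://docs.dask.org/en/latest/delayed-collections.html

Here is an older example: https://gist.github.com/mrocklin/e7b7b3a65f2835cda813096332ec73ca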
