I have a large CSV file; suppose it looks like this:
ID,PostCode,Value
H1A0A1-00,H1A0A1,0
H1A0A1-01,H1A0A1,0
H1A0A1-02,H1A0A1,0
H1A0A1-03,H1A0A1,0
H1A0A1-04,H1A0A1,1
H1A0A1-05,H1A0A1,0
H1A1G7-0,H1A1G7,0
H1A1G7-1,H1A1G7,0
H1A1G7-2,H1A1G7,0
H1A1N6-00,H1A1N6,0
H1A1N6-01,H1A1N6,0
H1A1N6-02,H1A1N6,0
H1A1N6-03,H1A1N6,0
H1A1N6-04,H1A1N6,0
H1A1N6-05,H1A1N6,0
...
I want to split it by PostCode and save all rows that share the same postcode to their own CSV file. I tried:
postals = data['PostCode'].unique()
for p in postals:
    # select every row for this postcode and write it to <postcode>.csv
    df = data[data['PostCode'] == p]
    df.to_csv(directory + '/output/demographics/' + p + '.csv', header=False, index=False)
Is there a way to use Dask to take advantage of multiprocessing? Thanks.
If you are willing to save to parquet instead, it is pretty easy.

Parquet
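Writing to parquet partitioned on PostCode saves the data for each postcode in a folder named
PostCode=xxxxxxx
and each folder contains up to as many files as the dask.dataframe has partitions (one file per partition that holds rows for that postcode).

CSV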
Here I suggest using a custom function, write_file.
You should check how it performs and, if needed, experiment with the scheduler.