Python: dask vs pandas, read error

Posted 2024-09-27 09:36:57

I get an error when reading a file with dask, although the same file reads fine with pandas:

import dask.dataframe as dd
import pandas as pd
pdf = pd.read_csv("./tous_les_docs.csv")
pdf.shape
(20140796, 7)

whereas dask gives me an error:

^{pr2}$

Answer: adding "blocksize=None" makes it work:

df = dd.read_csv("./tous_les_docs.csv", blocksize=None)

1 answer
User
#1 · posted 2024-09-27 09:36:57

The documentation says this can happen:

It should also be noted that this function may fail if a CSV file includes quoted strings that contain the line terminator. To get around this you can specify blocksize=None to not split files into multiple partitions, at the cost of reduced parallelism.

http://docs.dask.org/en/latest/dataframe-api.html#dask.dataframe.read_csv

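Dask appears to split the file into chunks at line terminators, without scanning the whole file from the start to check whether a given line terminator falls inside a quoted string.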
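
To make that failure mode concrete, here is a minimal sketch using only the standard library. It is not dask's actual implementation: the sample data and the helper names parse and naive_split are made up for illustration, and the split rule (cut at the first newline at or after a given offset) is an assumption that only mimics the idea of splitting on line terminators without tracking quote state.

```python
import csv
import io

# Made-up sample: the first data row has a quoted field containing a newline.
data = 'id,text\n1,"first line\nsecond line"\n2,plain\n'


def parse(text):
    """Parse CSV text with the csv module, which honours quoting."""
    return list(csv.reader(io.StringIO(text, newline="")))


def naive_split(text, offset):
    """Hypothetical splitter: cut after the first newline at or after
    `offset`, with no knowledge of whether that newline is inside quotes."""
    cut = text.index("\n", offset) + 1
    return text[:cut], text[cut:]


# Parsing the whole text in one piece keeps the quoted newline inside the field.
whole = parse(data)

# Splitting first and parsing each piece independently breaks the record,
# because the cut lands on the newline inside the quoted field.
head, tail = naive_split(data, offset=10)
head_rows = parse(head)
tail_rows = parse(tail)

print("whole file :", whole)
print("first chunk:", head_rows)
print("second chunk:", tail_rows)
```

Reading the file as a single partition, which is what blocksize=None asks dask to do, corresponds to the "whole file" case here: nothing is cut, so the quoted newline is never mistaken for a record boundary.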
