我正在读一个1gb的CSV文件,分为10000行。该文件有1106012行171列,其他较小的文件没有显示任何错误,并成功结束,但当我读取这个1GB文件时,它每次都显示错误,正好在第二行1106011,即文件的最后一行,我可以手动删除该行,但这不是解决方案,因为我有数百个相同大小的其他文件,我无法手动修复所有行。有人能帮我吗。在
def extract_csv_to_sql(input_file_name, header_row, size_of_chunk, eachRow):
df = pd.read_csv(input_file_name,
header=None,
nrows=size_of_chunk,
skiprows=eachRow,
low_memory=False,
error_bad_lines=False,
sep=',')
# engine='python'
# quoting=csv.QUOTE_NONE
# encoding='utf-8'
df.columns = header_row
df = df.drop_duplicates(keep='first')
df = df.apply(lambda x: x.astype(str).str.lower())
return df
然后我在一个循环中调用这个函数,运行得很好。在
^{pr2}$我读了这个Pandas ParserError EOF character when reading multiple csv files to HDF5,这个read_csv() & EOF character in string cause parsing issue,这个https://github.com/pandas-dev/pandas/issues/11654等等,并试图包含read_cv参数,比如
engine='python'
quoting=csv.QUOTE_NONE // Hangs and even the python shell, don't know why
encoding='utf-8'
但是没有一个有效,它仍然抛出以下错误
错误:
Traceback (most recent call last):
File "C:\Users\WCan\Desktop\wcan_new_python\pandas_test_3.py", line 115, in <module>
huge_chunk_return = extract_csv_to_sql(huge_input_filename, huge_header_row, the_size_of_chunk_H, each_Row_H)
File "C:\Users\WCan\Desktop\wcan_new_python\pandas_test_3.py", line 24, in extract_csv_to_sql
sep=',')
File "C:\Users\WCan\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\io\parsers.py", line 655, in parser_f
return _read(filepath_or_buffer, kwds)
File "C:\Users\WCan\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\io\parsers.py", line 411, in _read
data = parser.read(nrows)
File "C:\Users\WCan\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\io\parsers.py", line 1005, in read
ret = self._engine.read(nrows)
File "C:\Users\WCan\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\io\parsers.py", line 1748, in read
data = self._reader.read(nrows)
File "pandas\_libs\parsers.pyx", line 893, in pandas._libs.parsers.TextReader.read (pandas\_libs\parsers.c:10885)
File "pandas\_libs\parsers.pyx", line 966, in pandas._libs.parsers.TextReader._read_rows (pandas\_libs\parsers.c:11884)
File "pandas\_libs\parsers.pyx", line 953, in pandas._libs.parsers.TextReader._tokenize_rows (pandas\_libs\parsers.c:11755)
File "pandas\_libs\parsers.pyx", line 2184, in pandas._libs.parsers.raise_parser_error (pandas\_libs\parsers.c:28765)
pandas.errors.ParserError: Error tokenizing data. C error: EOF inside string starting at line 1106011
>>>
如果您在linux下,请尝试删除所有不可打印的字符。 尝试在此操作后加载文件。在
相关问题 更多 >
编程相关推荐