Python Pandas: does read_csv(encoding='utf16') only work with engine='python'?

Posted 2024-10-03 17:21:05


I am using Pandas 0.18.1 with Python 2.7.10 on OS X El Capitan 10.11.2, and I cannot read a UTF-16 file with read_csv() unless I set engine='python'.

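The documentation says the Python parser is more feature-complete, so perhaps Pandas tries the C parser by default and the C parser does not yet support UTF-16. Can anyone confirm that this is the case, or is something else going on here?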

Here is a minimal reproduction:

alanwagner : ~ ∴ pip2.7 freeze | grep pandas
pandas==0.18.1
alanwagner : ~ ∴ cat test.csv 
col1,col2
val1,val2
alanwagner : ~ ∴ python
Python 2.7.10 (default, Oct 23 2015, 18:05:06) 
[GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.0.59.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> pd.read_csv('test.csv', encoding='utf8').to_csv('test-utf16.csv', encoding='utf16', index=False)
>>> 
alanwagner : ~ ∴ cat test-utf16.csv 
??col1,col2
val1,val2
alanwagner : ~ ∴ python
Python 2.7.10 (default, Oct 23 2015, 18:05:06) 
[GCC 4.2.1 Compatible Apple LLVM 7.0.0 (clang-700.0.59.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import pandas as pd
>>> pd.read_csv('test-utf16.csv', encoding='utf16')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Python/2.7/site-packages/pandas/io/parsers.py", line 562, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/Library/Python/2.7/site-packages/pandas/io/parsers.py", line 315, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/Library/Python/2.7/site-packages/pandas/io/parsers.py", line 645, in __init__
    self._make_engine(self.engine)
  File "/Library/Python/2.7/site-packages/pandas/io/parsers.py", line 799, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/Library/Python/2.7/site-packages/pandas/io/parsers.py", line 1213, in __init__
    self._reader = _parser.TextReader(src, **kwds)
  File "pandas/parser.pyx", line 520, in pandas.parser.TextReader.__cinit__ (pandas/parser.c:5129)
  File "pandas/parser.pyx", line 701, in pandas.parser.TextReader._get_header (pandas/parser.c:7665)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/encodings/utf_16.py", line 16, in decode
    return codecs.utf_16_decode(input, errors, True)
UnicodeDecodeError: 'utf16' codec can't decode byte 0x63 in position 2: truncated data
>>> pd.read_csv('test-utf16.csv', encoding='utf16', engine='python')
   col1  col2
0  val1  val2
>>> 

I worked around the problem by converting the file from UTF-16 to UTF-8 before loading it into a Pandas DataFrame.
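The conversion workaround can be sketched as follows. This is a minimal illustration, not the code I actually ran: the helper name convert_utf16_to_utf8 and the output file name are hypothetical, and it reads the whole file into memory, which is fine for small files only.

```python
import io


def convert_utf16_to_utf8(src_path, dst_path):
    """Re-encode a UTF-16 text file as UTF-8 (hypothetical helper)."""
    # The 'utf-16' codec uses the byte order mark to pick the byte order
    # and drops it from the decoded text.
    # newline='' leaves line endings untouched in both directions.
    with io.open(src_path, 'r', encoding='utf-16', newline='') as src:
        text = src.read()
    with io.open(dst_path, 'w', encoding='utf-8', newline='') as dst:
        dst.write(text)
    return dst_path
```

After the conversion, the UTF-8 copy can be loaded with the default engine, for example pd.read_csv(convert_utf16_to_utf8('test-utf16.csv', 'test-utf8.csv'), encoding='utf8').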


1 Answer
User
Answer 1 · Posted 2024-10-03 17:21:05

Yes, that's right! You can try:

pd.read_csv('test-utf16.csv', encoding='utf-16')

I haven't worked out why, but this should let you read the file without setting the engine.
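If the difference really is down to the spelling of the encoding name (an assumption; the question and this answer suggest it but do not confirm it), one way to avoid depending on the alias is to pass Python's canonical codec name. The helper name canonical_encoding below is hypothetical; codecs.lookup is the standard-library call that resolves aliases.

```python
import codecs


def canonical_encoding(name):
    """Return Python's canonical name for an encoding alias (hypothetical helper)."""
    # codecs.lookup resolves aliases such as 'utf16' or 'UTF_16'
    # to the same codec, whose .name is the canonical spelling.
    return codecs.lookup(name).name


print(canonical_encoding('utf16'))  # utf-16
```

The result can then be passed on, for example pd.read_csv('test-utf16.csv', encoding=canonical_encoding('utf16')).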
