Error reading a Spanish-language SPSS file with Pandas (Python)

df=pd.read_spss('C:/Users/bonif/Documents/CSALUD01.sav')
  File "C:\Users\bonif\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\io\spss.py", line 44, in read_spss
    df, _ = pyreadstat.read_sav(
  File "pyreadstat\pyreadstat.pyx", line 342, in pyreadstat.pyreadstat.read_sav
  File "pyreadstat\_readstat_parser.pyx", line 1034, in pyreadstat._readstat_parser.run_conversion
  File "pyreadstat\_readstat_parser.pyx", line 845, in pyreadstat._readstat_parser.run_readstat_parser
  File "pyreadstat\_readstat_parser.pyx", line 775, in pyreadstat._readstat_parser.check_exit_status
pyreadstat._readstat_parser.ReadstatError: Unable to convert string to the requested encoding (invalid byte sequence)

2 answers

User

Answer 1 · edited 2024-09-27 07:29:03

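Pandas calls pyreadstat to read SPSS files, as the traceback through pandas/io/spss.py shows.

You may have better luck using pyreadstat directly, since it has an option to set the encoding.

From the docs at https://github.com/Roche/pyreadstat#other-options: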

You can set the encoding of the original file manually. The encoding must be a iconv-compatible encoding. This is absolutely necessary if you are handling old xport files with non-ascii characters. Those files do not have stamped the encoding in the file itself, therefore the encoding must be set manually.

import pyreadstat
df, meta = pyreadstat.read_sav(path, encoding=my_encoding)

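It is also possible that you do not have iconv installed at all (pyreadstat relies on iconv for encoding conversion), but I doubt it; you would probably get a different error in that case.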

User

Answer 2 · edited 2024-09-27 07:29:03

As ti7 suggested, use pyreadstat directly and specify the encoding. In this case, latin1 does the job:

>>> import pyreadstat
# This raises an error
>>> df, meta = pyreadstat.read_sav("CSALUD01.sav")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pyreadstat/pyreadstat.pyx", line 342, in pyreadstat.pyreadstat.read_sav
  File "pyreadstat/_readstat_parser.pyx", line 1034, in pyreadstat._readstat_parser.run_conversion
  File "pyreadstat/_readstat_parser.pyx", line 845, in pyreadstat._readstat_parser.run_readstat_parser
  File "pyreadstat/_readstat_parser.pyx", line 775, in pyreadstat._readstat_parser.check_exit_status
pyreadstat._readstat_parser.ReadstatError: Unable to convert string to the requested encoding (invalid byte sequence)

# This is fine
>>> df, meta = pyreadstat.read_sav("CSALUD01.sav", encoding="latin1")
>>>
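If you do not know the right encoding up front, one option is to try a few candidates in order. The following is only a rough sketch of that idea, not something pyreadstat provides: the helper name `read_sav_with_fallback`, the candidate list, and the injectable `reader` parameter are all my own assumptions. The only real pyreadstat call is `pyreadstat.read_sav(path, encoding=...)`, as used above.

```python
def read_sav_with_fallback(path,
                           encodings=(None, "UTF-8", "CP1252", "LATIN1"),
                           reader=None):
    """Try each candidate encoding until the file reads without error.

    Hypothetical helper. `encodings` is an assumed list of iconv-style
    names; None means "use whatever encoding the file declares".
    Returns (df, meta, encoding_that_worked).
    """
    if reader is None:
        # Imported lazily so the helper can be exercised with a stand-in reader.
        import pyreadstat
        reader = pyreadstat.read_sav

    failures = []
    for enc in encodings:
        try:
            df, meta = reader(path, encoding=enc)
        except Exception as exc:
            # The traceback above shows pyreadstat raising ReadstatError here;
            # a broad catch keeps this sketch independent of that class.
            failures.append((enc, exc))
            continue
        return df, meta, enc

    raise ValueError(f"no candidate encoding worked for {path!r}: {failures!r}")
```

Calling `read_sav_with_fallback("CSALUD01.sav")` would then return the DataFrame, the metadata, and the encoding that succeeded. Keep in mind that latin1 maps every possible byte to some character, so it will not fail on any input. A read that succeeds with it is not proof the text is correct, so it is worth checking a few accented values (á, ñ, and so on) in the result.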
