Error reading a Spanish-language SPSS file with Pandas (Python)

Published 2024-09-27 07:29:03


Good morning,

I am trying to work with an SPSS file (.sav) in Python.

Here is my code:

import pandas as pd

df = pd.read_spss('C:/Users/bonif/Documents/CSALUD01.sav')

df.head()

I get this error:

df=pd.read_spss('C:/Users/bonif/Documents/CSALUD01.sav')
  File "C:\Users\bonif\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\io\spss.py", line 44, in read_spss
    df, _ = pyreadstat.read_sav(
  File "pyreadstat\pyreadstat.pyx", line 342, in pyreadstat.pyreadstat.read_sav
  File "pyreadstat\_readstat_parser.pyx", line 1034, in pyreadstat._readstat_parser.run_conversion
  File "pyreadstat\_readstat_parser.pyx", line 845, in pyreadstat._readstat_parser.run_readstat_parser
  File "pyreadstat\_readstat_parser.pyx", line 775, in pyreadstat._readstat_parser.check_exit_status
pyreadstat._readstat_parser.ReadstatError: Unable to convert string to the requested encoding (invalid byte sequence)

I found that the error is probably caused by some words containing the letter "ñ", or words with accented characters such as "á". How can I fix this?
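The suspicion about "ñ" and "á" can be checked without SPSS at all. A minimal sketch, assuming (not confirmed from the file itself) that the strings were stored in a single-byte encoding such as Latin-1: there "ñ" and "ó" become the lone bytes 0xF1 and 0xF3, which are not valid UTF-8, so decoding them as UTF-8 fails. ReadStat's "invalid byte sequence" message is most likely the same kind of failure, raised when it converts the file's strings to UTF-8. The sample string is made up for illustration.

```python
# Spanish text as it might be stored in an older, single-byte SPSS file
# (assumption: Latin-1 / ISO-8859-1). The sample text is hypothetical.
raw = "Año de diagnóstico".encode("latin-1")

# 'ñ' and 'ó' are single bytes (0xF1 and 0xF3) in this encoding.
print(raw)  # b'A\xf1o de diagn\xf3stico'

try:
    # 0xF1 is not a valid standalone byte in UTF-8, so this raises.
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print("UTF-8 decoding failed:", exc.reason)

# Decoding with the encoding the bytes were actually written in works.
text = raw.decode("latin-1")
print(text)
```
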

The database is in this Google Drive folder: https://drive.google.com/drive/folders/1P8v5NWE-GdAEJRZdmrp5KiL-DODClmfU?usp=sharing

Thanks, everyone.


2 Answers

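Pandas reads SPSS files through pyreadstat (as the traceback shows, pandas/io/spss.py calls pyreadstat.read_sav), so you may have better luck using pyreadstat directly, since it has an option to set the encoding.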

From the documentation at https://github.com/Roche/pyreadstat#other-options:

You can set the encoding of the original file manually. The encoding must be a iconv-compatible encoding. This is absolutely necessary if you are handling old xport files with non-ascii characters. Those files do not have stamped the encoding in the file itself, therefore the encoding must be set manually.

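import pyreadstat
path, my_encoding = "CSALUD01.sav", "latin1"  # placeholders: your .sav file and an iconv-compatible encoding name
df, meta = pyreadstat.read_sav(path, encoding=my_encoding)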

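It is also possible that you don't have iconv installed at all (pyreadstat relies on it for encoding conversion), but I doubt it: in that case you would probably get a different error.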
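If you don't know the file's encoding in advance, a small fallback loop can try a few candidates in turn. This is a sketch, not part of pyreadstat: the helper name `read_with_fallback`, the candidate list, and its order (strict UTF-8 first, Latin-1 last because it never rejects a byte) are my assumptions, and "cp1252" is assumed to be a name your iconv build accepts. The reader is passed in as a parameter so the logic can be tried without a .sav file; the demo at the bottom uses `bytes.decode` as a stand-in.

```python
from typing import Any, Callable, Iterable, Tuple, Type

# Hypothetical candidate list for Spanish-language files (assumption):
# strict UTF-8 first, Windows-1252 next, Latin-1 last since it accepts any byte.
DEFAULT_CANDIDATES = ("utf-8", "cp1252", "latin1")


def read_with_fallback(
    reader: Callable[..., Any],
    path: Any,
    encodings: Iterable[str] = DEFAULT_CANDIDATES,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
) -> Tuple[Any, str]:
    """Call reader(path, encoding=enc) for each candidate until one succeeds.

    Returns (reader result, encoding that worked). Only exceptions listed in
    retry_on move on to the next candidate; anything else propagates.
    """
    failures = []
    for enc in encodings:
        try:
            return reader(path, encoding=enc), enc
        except retry_on as exc:
            failures.append(f"{enc}: {exc}")
    raise ValueError("No candidate encoding worked:\n" + "\n".join(failures))


# Demonstration with bytes.decode standing in for pyreadstat.read_sav:
# b"A\xf1o" is "Año" in a single-byte encoding and is invalid as UTF-8.
result, used = read_with_fallback(
    lambda data, encoding: data.decode(encoding),
    b"A\xf1o",
    retry_on=(UnicodeDecodeError,),
)
print(result, used)  # Año cp1252
```

With pyreadstat the call would look like `(df, meta), used = read_with_fallback(pyreadstat.read_sav, "CSALUD01.sav", retry_on=(ReadstatError,))`, importing `ReadstatError` from `pyreadstat._readstat_parser` (the module path shown in the traceback). Restricting `retry_on` keeps unrelated problems, such as a wrong path, from being silently retried.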

As ti7 suggested, with pyreadstat you need to specify the encoding; in this case, latin1 does the job:

>>> import pyreadstat
# This raises an error
>>> df, meta = pyreadstat.read_sav("CSALUD01.sav")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pyreadstat/pyreadstat.pyx", line 342, in pyreadstat.pyreadstat.read_sav
  File "pyreadstat/_readstat_parser.pyx", line 1034, in pyreadstat._readstat_parser.run_conversion
  File "pyreadstat/_readstat_parser.pyx", line 845, in pyreadstat._readstat_parser.run_readstat_parser
  File "pyreadstat/_readstat_parser.pyx", line 775, in pyreadstat._readstat_parser.check_exit_status
pyreadstat._readstat_parser.ReadstatError: Unable to convert string to the requested encoding (invalid byte sequence)

# This is fine
>>> df, meta = pyreadstat.read_sav("CSALUD01.sav", encoding="latin1")
>>> 
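One caveat, based on general properties of these encodings rather than anything checked against this particular file: Latin-1 assigns a character to every one of the 256 byte values, so decoding with it can never fail. A successful read therefore shows the file is not valid UTF-8, but not that Latin-1 is exactly right. Files produced on Windows are often cp1252, which agrees with Latin-1 on letters like "ñ" and "á" and differs only in the 0x80 to 0x9F range. The sketch below illustrates this with plain Python codecs.

```python
# Latin-1 maps every byte 0x00-0xFF to a character, so decoding never raises.
all_bytes = bytes(range(256))
decoded = all_bytes.decode("latin-1")
print(len(decoded))  # 256

# cp1252 and Latin-1 agree on Spanish letters such as ñ (0xF1) and á (0xE1)...
for b in (b"\xf1", b"\xe1"):
    print(b, b.decode("latin-1"), b.decode("cp1252"))

# ...but differ in 0x80-0x9F: 0x80 is the euro sign in cp1252 and an
# invisible control character in Latin-1.
print(repr(b"\x80".decode("cp1252")))   # '€'
print(repr(b"\x80".decode("latin-1")))  # '\x80'
```

So if Spanish letters look right after reading with latin1 but symbols such as curly quotes or "€" come out as odd control characters, encoding="cp1252" may be the better choice.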

