Pandas read_fwf doesn't seem to respect the encoding parameter

Posted 2024-10-01 13:34:39


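I wanted to try the memory_map parameter to see whether it improves the file's load time. (I don't really know what the parameter does, but I figured I'd give it a try.)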
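For context, the pandas read_csv documentation describes memory_map roughly as: if a file path is given, map the file object directly onto memory and access the data from there. The sketch below shows what memory-mapping means at the Python level using the standard-library mmap module. It is an illustration, not pandas's internal code, and the file name and contents are hypothetical. The mapping exposes the file's raw bytes, so turning them into text is still a separate decoding step that needs the right codec.

```python
import mmap
import os
import tempfile
from typing import Tuple


def mapped_bytes(path: str) -> bytes:
    """Memory-map a file read-only and return a copy of its raw bytes."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file (the file must not be empty).
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return mm[:]


def demo() -> Tuple[bytes, str]:
    """Write a hypothetical windows-1252 file, map it, then decode it explicitly."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.txt")
        with open(path, "w", encoding="windows-1252", newline="") as f:
            f.write("It\u2019s")  # U+2019 is stored as the single byte 0x92
        raw = mapped_bytes(path)
    # The mapping knows nothing about encodings; decoding is a separate step.
    return raw, raw.decode("windows-1252")


if __name__ == "__main__":
    raw, text = demo()
    print(raw)   # b'It\x92s'
    print(text)  # It’s
```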

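When I try to load the file, I get the error UnicodeDecodeError: 'utf-8' codec can't decode byte 0x92 in position 295: invalid start byte. I tried setting the encoding parameter (see below), but it doesn't seem to have any effect.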
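The byte named in the error hints at the file's real encoding. A small illustration (the sample bytes are hypothetical, not taken from MOVEOUTA.ALL.OUT1.txt): 0x92 cannot start a UTF-8 sequence, while windows-1252 maps it to U+2019, the right single quotation mark (’) that word processors often substitute for an apostrophe. The helper try_decode is made up for this sketch.

```python
def try_decode(raw: bytes, encoding: str) -> str:
    """Decode raw bytes, returning the text or the decoder's error message."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError as exc:
        return f"error: {exc}"


if __name__ == "__main__":
    sample = b"It\x92s"  # hypothetical bytes as written by a windows-1252 editor
    print(try_decode(sample, "utf-8"))
    # error: 'utf-8' codec can't decode byte 0x92 in position 2: invalid start byte
    print(try_decode(sample, "windows-1252"))
    # It’s
```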

Here is the code:

import pandas as pd
fwf_widths  = [6,2,6,2,14,1,40,1,10,1,10,1,1,3,3,1,1,1,2,1,5,1,10,
               1,10,1,30,2,30,1,18,1,2,1,5,1,2,1,2,3,3,1,1,1,2,1,
               5,1,10,1,10,1,30,2,30,1,18,1,2,1,5,1,2,1,2,3,1,1,1,
               1,1,1,1,2,1,1,1,1,1,1,1,]
pd.read_fwf("MOVEOUTA.ALL.OUT1.txt",
            usecols=range(0,80, 2), 
            widths=fwf_widths,
            encoding='windows-1252',
            memory_map=True)

Am I doing something wrong, or should I open an issue with pandas? (I'm on version 1.01.)

Edit:

I also tried the following, but I still get the same error:

with open("MOVEOUTA.ALL.OUT1.txt", mode='r',encoding='windows-1252', ) as f:
    df = pd.read_fwf(f,
                     usecols=range(0,80, 2), 
                     widths=fwf_widths,
                     memory_map=True)

1 answer
Anonymous user
Answer #1 · posted 2024-10-01 13:34:39

I'm not sure whether pandas.read_fwf accepts an encoding parameter. Its documented signature is:

pandas.read_fwf(filepath_or_buffer, colspecs='infer', widths=None, infer_nrows=100, **kwds)

Read a table of fixed-width formatted lines into DataFrame.

Also supports optionally iterating or breaking of the file into chunks.

Additional help can be found in the online docs for IO Tools.
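For what it's worth, the read_fwf documentation says its extra **kwds are passed on to TextFileReader, the same reader read_csv uses, and encoding is one of the options that reader accepts. A minimal sketch under that reading, loading a small hypothetical windows-1252 file with encoding= and with memory_map left at its default. The sample contents, widths, and column names are made up for illustration.

```python
import os
import tempfile

import pandas as pd

# Hypothetical fixed-width sample: 3-char id + 10-char name. In windows-1252
# the U+2019 in the first name is the single byte 0x92.
SAMPLE = "001O\u2019Brien   \n002Smith     \n"
WIDTHS = [3, 10]


def load_demo() -> pd.DataFrame:
    """Write the sample as windows-1252, then read it with read_fwf(encoding=...)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.txt")
        with open(path, "w", encoding="windows-1252", newline="") as f:
            f.write(SAMPLE)
        # encoding travels through **kwds; memory_map stays at its default (False).
        return pd.read_fwf(
            path,
            widths=WIDTHS,
            header=None,
            names=["id", "name"],
            dtype=str,
            encoding="windows-1252",
        )


if __name__ == "__main__":
    print(load_demo())
```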

The following snippet should do the job (it passes a StringIO instance as the filepath_or_buffer argument):

import pandas as pd
from io import StringIO

# Let Python decode the file with its real encoding
with open("MOVEOUTA.ALL.OUT1.txt", mode='r', encoding='windows-1252') as f:
    content = f.read()

fwf_widths = [6,2,6,2,14,1,40,1,10,1,10,1,1,3,3,1,1,1,2,1,5,1,10,
              1,10,1,30,2,30,1,18,1,2,1,5,1,2,1,2,3,3,1,1,1,2,1,
              5,1,10,1,10,1,30,2,30,1,18,1,2,1,5,1,2,1,2,3,1,1,1,
              1,1,1,1,2,1,1,1,1,1,1,1]
# pandas now receives already-decoded text, so no encoding= is needed.
# memory_map is dropped: it only applies to a file path, not an in-memory buffer.
df = pd.read_fwf(StringIO(content),
                 usecols=range(0, 80, 2),  # ??? this param not tested
                 widths=fwf_widths)
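To try the StringIO approach without the original file, here is a self-contained sketch. The sample contents, widths, and column names are hypothetical; the idea is that Python decodes the file with open(..., encoding="windows-1252"), so pandas only ever sees decoded text and neither encoding= nor memory_map is involved.

```python
import io
import os
import tempfile

import pandas as pd

# Hypothetical fixed-width sample: 3-char id + 10-char name.
SAMPLE = "001O\u2019Brien   \n002Smith     \n"
WIDTHS = [3, 10]


def load_via_stringio(path: str) -> pd.DataFrame:
    # Step 1: let Python decode the file with its real encoding.
    with open(path, mode="r", encoding="windows-1252") as f:
        content = f.read()
    # Step 2: pandas only sees decoded text, so no encoding= is passed,
    # and memory_map is omitted because there is no on-disk file to map.
    return pd.read_fwf(
        io.StringIO(content),
        widths=WIDTHS,
        header=None,
        names=["id", "name"],
        dtype=str,
    )


def demo() -> pd.DataFrame:
    """Write the sample as windows-1252 to a temp file and load it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.txt")
        with open(path, "w", encoding="windows-1252", newline="") as f:
            f.write(SAMPLE)
        return load_via_stringio(path)


if __name__ == "__main__":
    print(demo())
```

Decoding up front costs one extra copy of the file in memory, which is fine for moderately sized files.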
