Pandas: Expected 10 fields in line 153, saw 11. How do I add one more column?

Posted 2024-09-29 23:32:32

I have an info.txt file that looks like this:

B 19960331 00100000 00000000000000 00000000000000 00000000000000 00000000 00000000000000 00000000000000 00000000000000
B 19960430 00099100 00000000000000 00000000000000 00000000000000 00000000 00000000000000 00000000000000 00000000000000
B 19960531 00098500 00000000000000 00000000000000 00000000000000 00000000 00000000000000 00000000000000 00000000000000
B 19980331 00107241 00107241000000 00107241000000 00107241000000 00100000 00100000000000 00100000000000 00100000000000    00000100

You can see that the first 3 rows have 10 columns but the 4th row has 11, so when I read this file:

    import pandas as pd
    import numpy as np
    # Raw string so that "\U" in the Windows path is not parsed as a unicode escape
    df = pd.read_csv(r'C:\Users\Petter\Desktop\info.txt', sep=r"\s+", header=None, dtype=str, engine="python")
    df

I get this, along with an error:

    0   1   2   3   4   5   6   7   8   9
0   B   19960331    00100000    00000000000000  00000000000000  00000000000000  00000000    00000000000000  00000000000000  00000000000000
1   B   19960430    00099100    00000000000000  00000000000000  00000000000000  00000000    00000000000000  00000000000000  00000000000000
2   B   19960531    00098500    00000000000000  00000000000000  00000000000000  00000000    00000000000000  00000000000000  00000000000000

Skipping line 4: Expected 10 fields in line 4, saw 11. Error could possibly be due to quotes being ignored when a multi-char delimiter is used.

Ideally, it should automatically add one more column to the df. The output should look like this:

    0   1   2   3   4   5   6   7   8   9  10
0   B   19960331    00100000    00000000000000  00000000000000  00000000000000  00000000    00000000000000  00000000000000  00000000000000
1   B   19960430    00099100    00000000000000  00000000000000  00000000000000  00000000    00000000000000  00000000000000  00000000000000
2   B   19960531    00098500    00000000000000  00000000000000  00000000000000  00000000    00000000000000  00000000000000  00000000000000

I tried:

df = pd.DataFrame(pd.np.empty((0, 11))) 

But it didn't work.


Tags: file, csv, import, info, numpy, txt, pandas, df
2 answers

You can use the error_bad_lines parameter to avoid this error (the offending row is skipped rather than given an extra column):

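import pandas as pd
import numpy as np
# Raw string keeps "\U" in the Windows path from being read as a unicode escape.
# error_bad_lines was deprecated in pandas 1.3 and removed in 2.0; use on_bad_lines="skip" there.
df = pd.read_csv(r"C:\Users\Petter\Desktop\info.txt", header=None, delimiter=r"\s+", error_bad_lines=False)
df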
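
For anyone on a newer pandas (1.3 or later), here is a minimal sketch of the same skip approach using on_bad_lines="skip". The SAMPLE string is a stand-in I made up from the rows in the question so the snippet runs without info.txt, and the variable names are my own. As far as I can tell, it drops the 11-field row instead of widening the frame, so it may not be what the asker wants.

```python
import io

import pandas as pd

# Assumption: an in-memory copy of the four rows from the question,
# used here in place of the real info.txt file.
SAMPLE = """\
B 19960331 00100000 00000000000000 00000000000000 00000000000000 00000000 00000000000000 00000000000000 00000000000000
B 19960430 00099100 00000000000000 00000000000000 00000000000000 00000000 00000000000000 00000000000000 00000000000000
B 19960531 00098500 00000000000000 00000000000000 00000000000000 00000000 00000000000000 00000000000000 00000000000000
B 19980331 00107241 00107241000000 00107241000000 00107241000000 00100000 00100000000000 00100000000000 00100000000000    00000100
"""

# on_bad_lines="skip" silently drops rows whose field count exceeds
# the width inferred from the first row (10 fields here).
skipped = pd.read_csv(
    io.StringIO(SAMPLE),
    sep=r"\s+",
    header=None,
    dtype=str,
    on_bad_lines="skip",
)

print(skipped.shape)  # (3, 10)
```

The 1998 row is gone from the result, which is the trade-off of skipping.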

This works and may suit your needs:

df = pd.read_csv(... names=range(11))
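
To make the names= idea concrete, here is a small self-contained sketch. SAMPLE is a made-up in-memory copy of the question's four rows (not the real info.txt), and the pre-scan that computes width is my own addition for the case where the widest row is not known in advance; the answer above simply hard-codes 11.

```python
import io

import pandas as pd

# Assumption: an in-memory copy of the four rows from the question,
# used here in place of the real info.txt file.
SAMPLE = """\
B 19960331 00100000 00000000000000 00000000000000 00000000000000 00000000 00000000000000 00000000000000 00000000000000
B 19960430 00099100 00000000000000 00000000000000 00000000000000 00000000 00000000000000 00000000000000 00000000000000
B 19960531 00098500 00000000000000 00000000000000 00000000000000 00000000 00000000000000 00000000000000 00000000000000
B 19980331 00107241 00107241000000 00107241000000 00107241000000 00100000 00100000000000 00100000000000 00100000000000    00000100
"""

# Hypothetical pre-scan: count whitespace-separated fields per line
# and keep the largest count, so the column list covers the widest row.
width = max(len(line.split()) for line in SAMPLE.splitlines() if line.strip())

# Passing explicit column names tells pandas how many columns to expect;
# rows with fewer fields are padded with NaN on the right.
df = pd.read_csv(
    io.StringIO(SAMPLE),
    sep=r"\s+",
    header=None,
    names=range(width),
    dtype=str,
)

print(df.shape)        # (4, 11)
print(df.iloc[3, 10])  # 00000100
```

The first three rows end up with NaN in column 10, and only the fourth row carries a value there. For a large file, the pre-scan means reading the file twice, so hard-coding the width is cheaper when it is known.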
