I have an info.txt file that looks like this:
B 19960331 00100000 00000000000000 00000000000000 00000000000000 00000000 00000000000000 00000000000000 00000000000000
B 19960430 00099100 00000000000000 00000000000000 00000000000000 00000000 00000000000000 00000000000000 00000000000000
B 19960531 00098500 00000000000000 00000000000000 00000000000000 00000000 00000000000000 00000000000000 00000000000000
B 19980331 00107241 00107241000000 00107241000000 00107241000000 00100000 00100000000000 00100000000000 00100000000000 00000100
As you can see, the first 3 rows have 10 columns, but the 4th row has 11 columns, so when I read this file:
import pandas as pd
import numpy as np
df = pd.read_csv(r'C:\Users\Petter\Desktop\info.txt', sep=r"\s+", header=None, dtype=str, engine="python")
df
I get this, along with an error:
0 1 2 3 4 5 6 7 8 9
0 B 19960331 00100000 00000000000000 00000000000000 00000000000000 00000000 00000000000000 00000000000000 00000000000000
1 B 19960430 00099100 00000000000000 00000000000000 00000000000000 00000000 00000000000000 00000000000000 00000000000000
2 B 19960531 00098500 00000000000000 00000000000000 00000000000000 00000000 00000000000000 00000000000000 00000000000000
Skipping line 4: Expected 10 fields in line 4, saw 11. Error could possibly be due to quotes being ignored when a multi-char delimiter is used.
Ideally, it should automatically add one more column to df. The output should look like this:
0 1 2 3 4 5 6 7 8 9 10
0 B 19960331 00100000 00000000000000 00000000000000 00000000000000 00000000 00000000000000 00000000000000 00000000000000
1 B 19960430 00099100 00000000000000 00000000000000 00000000000000 00000000 00000000000000 00000000000000 00000000000000
2 B 19960531 00098500 00000000000000 00000000000000 00000000000000 00000000 00000000000000 00000000000000 00000000000000
I tried:
df = pd.DataFrame(pd.np.empty((0, 11)))
But it did not work.
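You can use the error_bad_lines parameter to avoid this error (in pandas 1.3 and later it was replaced by on_bad_lines). Note that this skips the longer row rather than adding a column for it.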
This works and may suit your needs:
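The code that went with this answer is not shown here, so what follows is only a minimal sketch of one common approach, not necessarily what the answerer posted. It assumes that passing explicit column names to `pd.read_csv` is acceptable for your case: when `names` has as many entries as the widest row, pandas pads the shorter rows with NaN instead of rejecting the longer one. The helper name `read_ragged` and the `SAMPLE` string are made up for illustration.

```python
import io

import pandas as pd

# The four sample rows from the question; the last one has an extra field.
SAMPLE = """\
B 19960331 00100000 00000000000000 00000000000000 00000000000000 00000000 00000000000000 00000000000000 00000000000000
B 19960430 00099100 00000000000000 00000000000000 00000000000000 00000000 00000000000000 00000000000000 00000000000000
B 19960531 00098500 00000000000000 00000000000000 00000000000000 00000000 00000000000000 00000000000000 00000000000000
B 19980331 00107241 00107241000000 00107241000000 00107241000000 00100000 00100000000000 00100000000000 00100000000000 00000100
"""


def read_ragged(text):
    """Read whitespace-separated text whose rows may differ in field count.

    Hypothetical helper, not part of pandas.
    """
    # First pass: find the widest row so that every field gets a column.
    n_cols = max(len(line.split()) for line in text.splitlines() if line.strip())
    # Second pass: with explicit names, short rows are padded with NaN.
    return pd.read_csv(
        io.StringIO(text),
        sep=r"\s+",
        header=None,
        names=list(range(n_cols)),
        dtype=str,  # keep leading zeros such as "00100000"
    )


df = read_ragged(SAMPLE)
print(df.shape)  # (4, 11)
```

For the file in the question, you could read its contents with `open(path).read()` and pass the resulting string to `read_ragged`. If you already know the widest row has 11 fields, passing `names=list(range(11))` directly to `pd.read_csv` avoids the first pass.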