pandas read_csv on a *.dat file delimited by a cedilla (Ç) does not split the data into DataFrame columns

1Ç70Ç23929Ç4341Ç1111Ç0Ç0Ç1ÇAAÇ012ÇFillerÇ
1Ç75Ç45555Ç4324Ç2222Ç0Ç0Ç1ÇAAÇ011ÇFillerÇ
1Ç76Ç23957Ç4334Ç3333Ç0Ç0Ç1ÇAAÇ011ÇFillerÇ
1Ç72Ç47776Ç4344Ç4444Ç0Ç0Ç1ÇABÇ014ÇFillerÇ
1Ç73Ç88880Ç4354Ç4444Ç0Ç0Ç1ÇCDÇ011ÇFillerÇ
1Ç74Ç99991Ç4364Ç5555Ç0Ç0Ç1ÇEEÇ014ÇFillerÇ

                                         0
0  1Ç70Ç23929Ç4341Ç1111Ç0Ç0Ç1ÇAAÇ012ÇFi...
1  1Ç75Ç45555Ç4324Ç2222Ç0Ç0Ç1ÇAAÇ011ÇFi...
2  1Ç76Ç23957Ç4334Ç3333Ç0Ç0Ç1ÇAAÇ011ÇFi...
3  1Ç72Ç47776Ç4344Ç4444Ç0Ç0Ç1ÇABÇ014ÇFi...
4  1Ç73Ç88880Ç4354Ç4444Ç0Ç0Ç1ÇCDÇ011ÇFi...

                                    0       1
0  1Ç70Ç23929Ç4341Ç1111Ç0Ç0Ç1ÇAAÇ012Ç  illerÇ
1  1Ç75Ç45555Ç4324Ç2222Ç0Ç0Ç1ÇAAÇ011Ç  illerÇ
2  1Ç76Ç23957Ç4334Ç3333Ç0Ç0Ç1ÇAAÇ011Ç  illerÇ
3  1Ç72Ç47776Ç4344Ç4444Ç0Ç0Ç1ÇABÇ014Ç  illerÇ
4  1Ç73Ç88880Ç4354Ç4444Ç0Ç0Ç1ÇCDÇ011Ç  illerÇ

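from io import StringIO
from os.path import join

import pandas

# file location
dataPath = "C:/Users/Documents/Pytest/"
itextfile = join(dataPath, 'sample.dat')

# read the whole file as text (no encoding given, so the platform default is used)
fb = open(itextfile, 'r')
data = fb.read()
print(data)

tf = pandas.read_csv(StringIO(data), sep='Ã‡', header=None)
# tf = pandas.read_csv(StringIO(data), sep='\Ç', header=None)
print(tf)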

1Ã‡71Ã‡23929Ã‡44Ã‡5685Ã‡0Ã‡0Ã‡1Ã‡aaÃ‡012Ã‡FillerÃ‡
1Ã‡72Ã‡23953Ã‡40Ã‡3319Ã‡0Ã‡0Ã‡1Ã‡bbÃ‡011Ã‡FillerÃ‡
1Ã‡73Ã‡23957Ã‡43Ã‡7323Ã‡0Ã‡0Ã‡1Ã‡ccÃ‡011Ã‡FillerÃ‡
1Ã‡74Ã‡24006Ã‡41Ã‡6938Ã‡0Ã‡0Ã‡1Ã‡bbÃ‡014Ã‡FillerÃ‡
1Ã‡75Ã‡24140Ã‡45Ã‡0518Ã‡0Ã‡0Ã‡1Ã‡ddÃ‡011Ã‡FillerÃ‡

Output

   0   1      2   3     4  5  6  7   8   9      10   11
0  1  71  23929  44  5685  0  0  1  aa  12  Filler  NaN
1  1  72  23953  40  3319  0  0  1  bb  11  Filler  NaN
2  1  73  23957  43  7323  0  0  1  cc  11  Filler  NaN

2 answers

User

#1 · edited 2024-05-02 17:11:20

Try passing sep='\Ç', as this worked for me:

In [35]:
import pandas as pd
import io
t="""1Ç70Ç23929Ç4341Ç1111Ç0Ç0Ç1ÇAAÇ012ÇFillerÇ
1Ç75Ç45555Ç4324Ç2222Ç0Ç0Ç1ÇAAÇ011ÇFillerÇ
1Ç76Ç23957Ç4334Ç3333Ç0Ç0Ç1ÇAAÇ011ÇFillerÇ
1Ç72Ç47776Ç4344Ç4444Ç0Ç0Ç1ÇABÇ014ÇFillerÇ
1Ç73Ç88880Ç4354Ç4444Ç0Ç0Ç1ÇCDÇ011ÇFillerÇ
1Ç74Ç99991Ç4364Ç5555Ç0Ç0Ç1ÇEEÇ014ÇFillerÇ"""
pd.read_csv(io.StringIO(t), sep='\Ç', header=None)

Out[35]:

   0   1      2     3     4   5   6   7   8   9       10  11
0   1  70  23929  4341  1111   0   0   1  AA  12  Filler NaN
1   1  75  45555  4324  2222   0   0   1  AA  11  Filler NaN
2   1  76  23957  4334  3333   0   0   1  AA  11  Filler NaN
3   1  72  47776  4344  4444   0   0   1  AB  14  Filler NaN
4   1  73  88880  4354  4444   0   0   1  CD  11  Filler NaN
5   1  74  99991  4364  5555   0   0   1  EE  14  Filler NaN
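
A note on why this probably works: '\Ç' is a two-character string (a backslash followed by Ç), and the pandas documentation says that separators longer than one character are interpreted as regular expressions and force the Python parsing engine. The sketch below gets the same result by asking for that engine explicitly and passing the plain one-character separator. It assumes Python 3 with the data already decoded to text; the `dropna` step is my own addition to discard the empty column produced by the trailing Ç on each line.

```python
import io

import pandas as pd

t = """1Ç70Ç23929Ç4341Ç1111Ç0Ç0Ç1ÇAAÇ012ÇFillerÇ
1Ç75Ç45555Ç4324Ç2222Ç0Ç0Ç1ÇAAÇ011ÇFillerÇ
1Ç76Ç23957Ç4334Ç3333Ç0Ç0Ç1ÇAAÇ011ÇFillerÇ
1Ç72Ç47776Ç4344Ç4444Ç0Ç0Ç1ÇABÇ014ÇFillerÇ
1Ç73Ç88880Ç4354Ç4444Ç0Ç0Ç1ÇCDÇ011ÇFillerÇ
1Ç74Ç99991Ç4364Ç5555Ç0Ç0Ç1ÇEEÇ014ÇFillerÇ"""

# Plain one-character separator; the Python engine is requested explicitly
# instead of relying on the backslash trick to select it.
df = pd.read_csv(io.StringIO(t), sep="Ç", header=None, engine="python")

# Every line ends with a separator, so the last parsed column is entirely NaN.
# Dropping all-NaN columns leaves the 11 real fields.
df = df.dropna(axis=1, how="all")
print(df)
```

If the leading zeros in the tenth field (012, 011, ...) matter, passing `dtype=str` to `read_csv` keeps every field as text instead of converting it to a number.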

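User

#2 · edited 2024-05-02 17:11:20

As a standard practice, you may want to open your documents with the codecs package. This lets you specify the encoding (which in most cases is UTF-16), and the codecs package seems to be quite good at decoding things like line endings and the encoding. See:

Reading tab-delimited file with Pandas - works on Windows, but not on Mac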

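import codecs
import pandas

# Open for reading and decode as UTF-16. The original call used mode 'rU'
# ("universal" newlines); Python 3.11 and later reject the 'U' flag, so 'r' is used.
doc = codecs.open('document', 'r', 'UTF-16')

# Totrows and Skiprows are placeholders for your own row count and header row.
df = pandas.read_csv(doc, sep='Ç', nrows=Totrows, header=Skiprows)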
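
For the file in the question specifically, the 'Ã‡' in the second listing is consistent with a UTF-8 file being decoded as Windows-1252 (presumably the platform default used by open(itextfile, 'r')), rather than with UTF-16. On Python 3 both the built-in open and read_csv accept an encoding argument directly, so codecs is not strictly needed. The following is a minimal sketch under those assumptions; the temporary file, the two-line sample, and the names `garbled` and `df_enc` are made up for illustration.

```python
import os
import tempfile

import pandas as pd

# Hypothetical two-line sample, written as UTF-8 so the sketch is self-contained.
sample = (
    "1Ç70Ç23929Ç4341Ç1111Ç0Ç0Ç1ÇAAÇ012ÇFillerÇ\n"
    "1Ç75Ç45555Ç4324Ç2222Ç0Ç0Ç1ÇAAÇ011ÇFillerÇ\n"
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.dat")
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(sample)

    # Decoding the UTF-8 bytes with the wrong codec turns each 'Ç' into 'Ã‡',
    # which is why only sep='Ã‡' matched in the question.
    with open(path, "rb") as fh:
        garbled = fh.read().decode("cp1252")

    # Naming the real encoding lets the plain 'Ç' separator match.
    df_enc = pd.read_csv(
        path, sep="Ç", header=None, engine="python", encoding="utf-8"
    )

print(garbled.splitlines()[0])
print(df_enc)
```

If the real file turns out to use a different encoding, only the `encoding` argument should need to change.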
