Pandas: how to read a CSV file with a multi-character delimiter?

Posted 2024-09-29 02:19:59


I want to read the following CSV file with pandas.read_csv, but it does not parse correctly:

                                                                Mat  Pur Mat    Mat  Proc ABC   TimePrice            Crncy Supplier      
Plant Material Number   Material Description                    Grp  Grp Status Type Type Class daysper each         Key   Consignment   
-----------------------------------------------------------------------------------------------------------------------------------------
0009  076/JJJJJJJ331    DUMMY UNIT/Dummy Unit 265x225x15        ZEEJJMA9   P5   JERI   F         99          99.9900 SEK               0
0009  1/JJJJJJJJJ/1R3   EQUIPPED MAGAZINE/SUP 6601; Equipped magZEEJJMA9   P8   JERI   F         99       9,999.9900 SEK               0
0009  1/JJJJJJJJJ/4     EQUIPPED MAGAZINE/SUP 6601; Equipped magZEEJJMA9   P5   JERI   F         99         999.9900 SEK               0
0009  1/JJJJJJJJJ/1     BASIC EQUIP.MAGAZINE/Remote IRU Enclosur305  MA9   P5   JERI   F         99       9,999.9900 SEK               0
0009  1/JJJJJJ04        EQUIPPED CABINET/BYB 504 Multi-Pack Kit ZEEJJMA9   P5   JERI   F         99      99,999.9900 SEK               0
0009  1/JJJJJJJJ/6      CABLE BUSHING/O-Ring id 21, th 2 for M25ZEEJJMA9   P5   JCOM   F         99           9.9900 SEK               0
0009  1/JJJJJJJJJ       PACKAGE/Pallet 800*114*600              ZEEJJMA9   P5   JVER   F         99         999.9900 SEK               0
0009  1/JJJJJJJJJ       PACKING MATERIAL/Pallet 1200*800*160    ZEEJJMA9   P5   JCOM   F        999         999.9900 SEK               0
0009  1/JJJJJJJJ/06     BAG/PåSE/MINIGRIP/300*250 MM            ZEEJJMA9   P5   JCOM   F          9           9.9900 SEK               0
0009  1/JJJJJJJJ        BAG/Antistatic zip lock bag 75x100      ZEEJJMA9   P5   JCOM   F          9           9.9900 SEK               0

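I tried the code below, but ran into these problems:

  • The Material Description values contain spaces
  • The header is hard to parse
  • In rows 2, 3, and others there is no space between Material Description and Mat Grp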
import pandas as pd

df = pd.read_csv(file_path, delim_whitespace=True, skiprows=4, header=None, error_bad_lines=False, engine="python")
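The problems listed above can be reproduced without pandas. The following is a minimal sketch, assuming that splitting on runs of whitespace (plain str.split()) approximates what delim_whitespace=True does. The three rows are copied from the sample; the variable names are only illustrative.

```python
# Three rows copied verbatim from the sample file above.
rows = [
    "0009  076/JJJJJJJ331    DUMMY UNIT/Dummy Unit 265x225x15        ZEEJJMA9   P5   JERI   F         99          99.9900 SEK               0",
    "0009  1/JJJJJJJJJ       PACKAGE/Pallet 800*114*600              ZEEJJMA9   P5   JVER   F         99         999.9900 SEK               0",
    "0009  1/JJJJJJJJJ/1R3   EQUIPPED MAGAZINE/SUP 6601; Equipped magZEEJJMA9   P8   JERI   F         99       9,999.9900 SEK               0",
]

# str.split() with no argument splits on runs of whitespace.
tokens = [row.split() for row in rows]
field_counts = [len(t) for t in tokens]

# Spaces inside the description make the field count vary from row to row.
print(field_counts)

# A full-width description fuses with the next column into a single token.
print("magZEEJJMA9" in tokens[2])
```

Because the number of fields differs between rows and neighbouring columns can merge into one token, no whitespace-based delimiter can split this file reliably. The columns have to be cut by character position instead.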

1 answer
User
#1 · Posted 2024-09-29 02:19:59

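I believe you are looking for the pandas read_fwf function, which reads fixed-width formatted files. Unfortunately, you have to specify the column boundaries manually here, since adjacent columns are not always separated by a space. Here is an example for the first few columns: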

s = '''
0009  076/JJJJJJJ331    DUMMY UNIT/Dummy Unit 265x225x15        ZEEJJMA9   P5   JERI   F         99          99.9900 SEK               0
0009  1/JJJJJJJJJ/1R3   EQUIPPED MAGAZINE/SUP 6601; Equipped magZEEJJMA9   P8   JERI   F         99       9,999.9900 SEK               0
0009  1/JJJJJJJJJ/4     EQUIPPED MAGAZINE/SUP 6601; Equipped magZEEJJMA9   P5   JERI   F         99         999.9900 SEK               0
0009  1/JJJJJJJJJ/1     BASIC EQUIP.MAGAZINE/Remote IRU Enclosur305  MA9   P5   JERI   F         99       9,999.9900 SEK               0
0009  1/JJJJJJ04        EQUIPPED CABINET/BYB 504 Multi-Pack Kit ZEEJJMA9   P5   JERI   F         99      99,999.9900 SEK               0
0009  1/JJJJJJJJ/6      CABLE BUSHING/O-Ring id 21, th 2 for M25ZEEJJMA9   P5   JCOM   F         99           9.9900 SEK               0
0009  1/JJJJJJJJJ       PACKAGE/Pallet 800*114*600              ZEEJJMA9   P5   JVER   F         99         999.9900 SEK               0
0009  1/JJJJJJJJJ       PACKING MATERIAL/Pallet 1200*800*160    ZEEJJMA9   P5   JCOM   F        999         999.9900 SEK               0
0009  1/JJJJJJJJ/06     BAG/PåSE/MINIGRIP/300*250 MM            ZEEJJMA9   P5   JCOM   F          9           9.9900 SEK               0
0009  1/JJJJJJJJ        BAG/Antistatic zip lock bag 75x100      ZEEJJMA9   P5   JCOM   F          9           9.9900 SEK               0
'''

from io import StringIO
import pandas as pd

# Each colspec is a half-open (start, end) character range for one column.
df = pd.read_fwf(StringIO(s), colspecs=[(0,5), (6,20), (24,64), (64,72)])

Here is the resulting DataFrame:

   Unnamed: 0      Unnamed: 1                                Unnamed: 2  \
0           9  076/JJJJJJJ331          DUMMY UNIT/Dummy Unit 265x225x15   
1           9  1/JJJJJJJJJ/1R  EQUIPPED MAGAZINE/SUP 6601; Equipped mag   
2           9   1/JJJJJJJJJ/4  EQUIPPED MAGAZINE/SUP 6601; Equipped mag   
3           9   1/JJJJJJJJJ/1  BASIC EQUIP.MAGAZINE/Remote IRU Enclosur   
4           9      1/JJJJJJ04   EQUIPPED CABINET/BYB 504 Multi-Pack Kit   
5           9    1/JJJJJJJJ/6  CABLE BUSHING/O-Ring id 21, th 2 for M25   
6           9     1/JJJJJJJJJ                PACKAGE/Pallet 800*114*600   
7           9     1/JJJJJJJJJ      PACKING MATERIAL/Pallet 1200*800*160   
8           9   1/JJJJJJJJ/06              BAG/PåSE/MINIGRIP/300*250 MM   
9           9      1/JJJJJJJJ        BAG/Antistatic zip lock bag 75x100   

  Unnamed: 3  
0   ZEEJJMA9  
1   ZEEJJMA9  
2   ZEEJJMA9  
3   305  MA9  
4   ZEEJJMA9  
5   ZEEJJMA9  
6   ZEEJJMA9  
7   ZEEJJMA9  
8   ZEEJJMA9  
9   ZEEJJMA9  
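Two details in the output above are worth noting: the range (6,20) cuts off the last character of 1/JJJJJJJJJ/1R3, and the plant code 0009 is read as the integer 9. The sketch below is one possible refinement, not a definitive layout. It widens the second range to (6, 24), passes header=None with explicit names, and reads everything as strings. The column names are made up for illustration. Splitting the last field into Mat Grp (64, 69) and Pur Grp (69, 72) is an assumption inferred from the header and from the row containing "305  MA9".

```python
from io import StringIO

import pandas as pd

# Three rows copied from the sample; there is no leading blank line.
sample = (
    "0009  076/JJJJJJJ331    DUMMY UNIT/Dummy Unit 265x225x15        ZEEJJMA9   P5   JERI   F         99          99.9900 SEK               0\n"
    "0009  1/JJJJJJJJJ/1R3   EQUIPPED MAGAZINE/SUP 6601; Equipped magZEEJJMA9   P8   JERI   F         99       9,999.9900 SEK               0\n"
    "0009  1/JJJJJJJJJ/1     BASIC EQUIP.MAGAZINE/Remote IRU Enclosur305  MA9   P5   JERI   F         99       9,999.9900 SEK               0\n"
)

# Half-open (start, end) character ranges, one per column.
# The last two ranges are an assumed split of the original (64, 72) field.
colspecs = [(0, 4), (6, 24), (24, 64), (64, 69), (69, 72)]

# Illustrative column names, not taken from the file itself.
names = ["plant", "material_number", "material_description", "mat_grp", "pur_grp"]

df = pd.read_fwf(
    StringIO(sample),
    colspecs=colspecs,
    header=None,   # the sample rows contain no header line
    names=names,
    dtype=str,     # keep leading zeros such as "0009"
)
print(df)
```

The remaining columns (status, types, price, currency, and so on) can be added the same way once their character positions have been checked against the real file.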
