Find a keyword and return the text that follows it, up to the first occurrence of a comma

Posted 2024-10-01 07:41:22


I have a pandas DataFrame with 3 columns:

df =

| id  | product_details | taxo |
| 100 | [Sales Package=6 Pair slipper, Strap Material=Rubber, qty=1, categoryPath=Footwear>Men>Slippers & Flip Flops, codAvailable=true, detailedSpecs=Multicolor Color, None Character Type Slippers For Men Sole Material Rubber] | 1 |
| 200 | [Brand Fit=Regular, Fabric=Cotton Polyester Blend, Fabric Care=Hand wash, Fit=Regular, Ideal For=Mens, Neck Type=Round Neck, Pack of=1, Pattern=Graphic Print, Reversible=No, Sales Package=1 T-Shirt, Size=M, Sleeve=Half Sleeve, Sleeve Type=Wide, Suitable For=Western Wear, categoryPath=Apparels>Men>Polos & T-Shirts, codAvailable=true, detailedSpecs=Fabric Cotton Polyester Blen  Regular Fit Round Neck T-shirt  Pattern Graphic Print  Sleeve Type Wide Half Sleeve, discountPercentage=0] | 2 |

I want to find the keyword categoryPath in the product_details column of the DataFrame, return the text that follows it up to the first comma, and write the result to a new DataFrame (df_new).

My DataFrame has more than 8 million rows.
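With more than 8 million rows, one option worth benchmarking is to skip regular expressions and call pandas' Series.str.partition twice: once to keep the text after categoryPath= and once to cut it at the first comma. This is a sketch that assumes product_details holds plain strings; extract_category_path is a hypothetical helper name, and whether it is actually faster than a regex on your data is something to measure rather than assume.

```python
import pandas as pd


def extract_category_path(details: pd.Series) -> pd.Series:
    """Return the text between 'categoryPath=' and the next comma.

    Hypothetical helper. Uses Series.str.partition instead of a regex;
    rows that do not contain the keyword yield an empty string.
    """
    after_key = details.str.partition("categoryPath=")[2]  # text after the keyword
    return after_key.str.partition(",")[0].str.strip()     # cut at the first comma


# Abbreviated sample rows modeled on the question.
details = pd.Series([
    "qty=1, categoryPath=Footwear>Men>Slippers & Flip Flops, codAvailable=true",
    "Size=M, categoryPath=Apparels>Men>Polos & T-Shirts, discountPercentage=0",
    "no keyword here",
])
print(extract_category_path(details).tolist())
```

Because no pattern is compiled or matched, the logic stays simple; the trade-off is that a missing keyword shows up as an empty string rather than NaN.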

Expected output: df_new

| id  | category_path                      |
| 100 | Footwear>Men>Slippers & Flip Flops |
| 200 | Apparels>Men>Polos & T-Shirts      |
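A minimal pandas sketch that builds the expected df_new with Series.str.extract, using a capture group for everything between categoryPath= and the first comma. The sample frame abbreviates the question's rows and assumes product_details holds one string per row; if each cell is really a list, it would need to be joined first (for example with df["product_details"].str.join(", ")).

```python
import pandas as pd

# Abbreviated version of the question's data (assumed: one string per cell).
df = pd.DataFrame({
    "id": [100, 200],
    "product_details": [
        "Sales Package=6 Pair slipper, Strap Material=Rubber, qty=1, "
        "categoryPath=Footwear>Men>Slippers & Flip Flops, codAvailable=true, "
        "detailedSpecs=Multicolor Color",
        "Brand Fit=Regular, Fabric=Cotton Polyester Blend, Size=M, "
        "categoryPath=Apparels>Men>Polos & T-Shirts, codAvailable=true, "
        "discountPercentage=0",
    ],
    "taxo": [1, 2],
})

# Capture everything after "categoryPath=" up to (not including) the first comma.
# expand=False returns a Series; rows without the keyword become NaN.
category_path = df["product_details"].str.extract(r"categoryPath=([^,]+)", expand=False)

df_new = pd.DataFrame({"id": df["id"], "category_path": category_path.str.strip()})
print(df_new)
```

Rows without categoryPath come back as NaN, so they are easy to find afterwards with df_new["category_path"].isna().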

1 answer
Anonymous user
#1 · Posted 2024-10-01 07:41:22

Use this regular expression: categoryPath=[\w>\s&]+

You get Footwear>Men>Slippers & Flip Flops (the match itself starts with the categoryPath= prefix) out of:

Sales Package=6 Pair slipper, Strap Material=Rubber, qty=1, categoryPath=Footwear>Men>Slippers & Flip Flops, codAvailable=true, detailedSpecs=Multicolor Color; None Character; Type: Slippers; For Men; Sole Material: Rubber

I think this is what you want.
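A quick check of the answer's pattern with Python's re module, on two abbreviated rows from the question (assumed to be plain strings). The character class [\w>\s&] does not include "-", so on the second row the match stops at the "T" of "T-Shirts"; a capture group over [^,]+ (anything except a comma) returns only the path, up to the first comma as the question asks. The helper name category_path is hypothetical.

```python
import re

# Two abbreviated rows from the question.
ROW_100 = ("Sales Package=6 Pair slipper, Strap Material=Rubber, qty=1, "
           "categoryPath=Footwear>Men>Slippers & Flip Flops, codAvailable=true")
ROW_200 = ("Sleeve=Half Sleeve, categoryPath=Apparels>Men>Polos & T-Shirts, "
           "codAvailable=true, discountPercentage=0")

# Pattern from the answer: the match includes the "categoryPath=" prefix,
# and "-" is not in the character class, so "T-Shirts" is cut off at "T".
ANSWER_PATTERN = re.compile(r"categoryPath=[\w>\s&]+")

# A capture group over "anything except a comma" returns only the path.
CAPTURE_PATTERN = re.compile(r"categoryPath=([^,]+)")


def category_path(text):
    """Hypothetical helper: text between 'categoryPath=' and the next comma, or None."""
    match = CAPTURE_PATTERN.search(text)
    return match.group(1).strip() if match else None


for row in (ROW_100, ROW_200):
    print(ANSWER_PATTERN.search(row).group(0), "|", category_path(row))
```

On ROW_200 the first column prints categoryPath=Apparels>Men>Polos & T, while the capture-group version prints Apparels>Men>Polos & T-Shirts.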
