I have a pandas DataFrame with 3 columns:
dataframe =
| id | product_details | taxo |
100 [Sales Package=6 Pair slipper,
Strap Material=Rubber, qty=1,
categoryPath=Footwear>Men>Slippers & Flip Flops, 1
codAvailable=true, detailedSpecs=Multicolor Color,
None Character Type Slippers For Men
Sole Material Rubber]
200 [Brand Fit=Regular, Fabric=Cotton Polyester Blend,
Fabric Care=Hand wash, Fit=Regular, Ideal For=Mens,
Neck Type=Round Neck, Pack of=1, Pattern=Graphic Print, 2
Reversible=No, Sales Package=1 T-Shirt, Size=M,
Sleeve=Half Sleeve, Sleeve Type=Wide,
Suitable For=Western Wear,
categoryPath=Apparels>Men>Polos & T-Shirts,
codAvailable=true, detailedSpecs=Fabric Cotton
Polyester Blen Regular Fit Round Neck T-shirt
Pattern Graphic Print Sleeve Type Wide Half Sleeve,
discountPercentage=0]
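I want to find the keyword categoryPath in the product_details column of the dataframe, return the text that follows it up to the first comma, and write that text to a new dataframe (df_new).
My dataframe has more than 8 million rows.
Expected output: df_new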
| id | category_path |
100 Footwear>Men>Slippers & Flip Flops
200 Apparels>Men>Polos & T-Shirts
Use this regular expression:
categoryPath=[\w>\s&]+
After dropping the leading categoryPath= from the match, you get
Footwear>Men>Slippers & Flip Flops
from
Sales Package=6 Pair slipper, Strap Material=Rubber, qty=1, categoryPath=Footwear>Men>Slippers & Flip Flops, codAvailable=true, detailedSpecs=Multicolor Color; None Character; Type: Slippers; For Men; Sole Material: Rubber
I think this is what you want.
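
To apply this across the whole column, something like the sketch below might work. It assumes product_details holds plain strings (if the cells are actually lists, they would need to be joined into strings first), and the two sample rows are shortened stand-ins for the data in the question. One caveat: the character class in the regex above has no hyphen, so it would cut `Polos & T-Shirts` short at `T`. The sketch therefore swaps in `[^,\]]+`, which simply reads up to the first comma (or closing bracket), as the question asks.

```python
import pandas as pd

# Shortened, hypothetical stand-in for the question's data.
df = pd.DataFrame({
    "id": [100, 200],
    "product_details": [
        "[Sales Package=6 Pair slipper, Strap Material=Rubber, qty=1, "
        "categoryPath=Footwear>Men>Slippers & Flip Flops, codAvailable=true]",
        "[Brand Fit=Regular, Fabric=Cotton Polyester Blend, "
        "categoryPath=Apparels>Men>Polos & T-Shirts, codAvailable=true, "
        "discountPercentage=0]",
    ],
    "taxo": [1, 2],
})

# The capture group keeps only the text after "categoryPath=".
# [^,\]]+ matches everything up to the first comma or closing bracket.
pattern = r"categoryPath=([^,\]]+)"

# expand=False returns a Series for a single capture group;
# rows without a categoryPath entry come back as NaN.
category = df["product_details"].str.extract(pattern, expand=False).str.strip()

df_new = pd.DataFrame({"id": df["id"], "category_path": category})
print(df_new)
```

`Series.str.extract` avoids writing an explicit Python loop over the 8 million rows; whether it is fast enough in practice would need to be measured on the real data.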