Splitting a complex pattern in one DataFrame column while keeping the original columns, in Python

    type                                    free  use  total
    Hello-HEL-HE-A6123-123A-12T_TYPE-v.A      10   10     20
    Hello-HEL-HE-A6123-123A-12T_TYPE-v.E       5    1      6
    Hello-HEL-HE-A6123-123A-50T_TYPE-v.C       1    4      5
    Hello-HEL-HE-A6123-123A-50T_TYPE-v.A       2    1      1
    Happy-HAP-HA-R650-570A-90T_version-v.A    10    0     10
    Kind-KIN-KI-T490-NET_14T-A.0               7    4      3
    Kind-KIN-KI-T490-NET_14T-A.0               6    3      2
    AY14.5-fyy-FY-R770-256G-6.4T-R1-v.A        3    0      3
    AY14.5-fyy-FY-R770-256G-6.4T-R1-v.A        0   20     20

    type         free  use  total
    Hello 12T      10   10     20
    Hello 12T       5    1      6
    Hello 50T       1    4      5
    Hello 50T       2    1      1
    Happy 90T      10    0     10
    Kind 14T        7    4      3
    Kind 14T        6    3      2
    AY14.5 6.4T     3    0      3
    AY14.5 6.4T     0   20     20

    import re

    import pandas as pd


    def extract_value(s):
        # Capture the text before the first hyphen and the first "<number>T" token.
        match = re.search(r'(^.+?)-.+?(\d+(?:\.\d+)?T)', s)
        if match:
            first_word = match.group(1)
            code = match.group(2)
            return f'{first_word} {code}'
        return s  # leave values that do not match unchanged


    df['type'] = df['type'].apply(extract_value)

2 Answers

User
Answer 1 · edited 2024-10-04 11:31:37

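    # expand=False returns a Series for a single capture group,
    # so the two results can be concatenated and assigned back.
    df['type'] = (
        df['type'].str.extract(r'(^\w+\.\d|^\w+)', expand=False)
        + ' '
        + df['type'].str.extract(r'(\d\.\d+T|\d+T)', expand=False)
    )

              type  free  use  total
    0    Hello 12T    10   10     20
    1    Hello 12T     5    1      6
    2    Hello 50T     1    4      5
    3    Hello 50T     2    1      1
    4    Happy 90T    10    0     10
    5     Kind 14T     7    4      3
    6     Kind 14T     6    3      2
    7  AY14.5 6.4T     3    0      3
    8  AY14.5 6.4T     0   20     20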
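To try this approach without the asker's full data, here is a minimal self-contained sketch. It rebuilds a few of the sample rows and writes the shortened label to a new column, so the original `type` values stay available. The column name `type_short` is an assumption made for illustration and does not come from the question. The dots in the patterns are escaped on the assumption that they are meant as literal periods.

```python
import pandas as pd

# A few rows copied from the question's sample data.
df = pd.DataFrame(
    {
        "type": [
            "Hello-HEL-HE-A6123-123A-12T_TYPE-v.A",
            "Happy-HAP-HA-R650-570A-90T_version-v.A",
            "Kind-KIN-KI-T490-NET_14T-A.0",
            "AY14.5-fyy-FY-R770-256G-6.4T-R1-v.A",
        ],
        "free": [10, 10, 7, 3],
        "use": [10, 0, 4, 0],
        "total": [20, 10, 3, 3],
    }
)

# expand=False makes str.extract return a Series for a single capture group.
name = df["type"].str.extract(r"(^\w+\.\d|^\w+)", expand=False)
size = df["type"].str.extract(r"(\d\.\d+T|\d+T)", expand=False)

# "type_short" is a hypothetical column name; the original "type" is untouched.
df["type_short"] = name + " " + size
print(df[["type_short", "free", "use", "total"]])
```

Writing to a separate column is only one option. Assigning the result back to `df["type"]` overwrites the long labels instead.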

User
Answer 2 · edited 2024-10-04 11:31:37

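Sure! You can use this regular expression to capture everything you need in one pass. I've included comments inside the regex itself. To tell re about them, I pass the flag re.X, which marks the pattern as a "verbose" pattern: its comments and whitespace are ignored when the actual matching is performed.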
    import re

    # Raw string, so that \d reaches the regex engine unchanged.
    pattern = r"""
    ^([^-]+)-   # From the beginning of the string, capture all non-hyphen characters and stop at the first actual hyphen.
    .+?         # Consume all characters up to the next capture group in this pattern.
    ([\d.]+T)   # Capture all digits (including a literal period) that end with a "T".
    """.strip()

    extracted_df = df["type"].str.extract(pattern, flags=re.X)
    print(extracted_df)

            0     1
    0   Hello   12T
    1   Hello   12T
    2   Hello   50T
    3   Hello   50T
    4   Happy   90T
    5    Kind   14T
    6    Kind   14T
    7  AY14.5  6.4T
    8  AY14.5  6.4T
Now that we have extracted the relevant pieces of information, we can join them together to overwrite the old "type" column:
    df["type"] = extracted_df[0] + " " + extracted_df[1]
    print(df)

              type  free  use  total
    0    Hello 12T    10   10     20
    1    Hello 12T     5    1      6
    2    Hello 50T     1    4      5
    3    Hello 50T     2    1      1
    4    Happy 90T    10    0     10
    5     Kind 14T     7    4      3
    6     Kind 14T     6    3      2
    7  AY14.5 6.4T     3    0      3
    8  AY14.5 6.4T     0   20     20
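As with any regular expression, this may not capture every corner case, but I hope it shows how to use a regex with capture groups to collect the relevant information from a column.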
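The corner-case caveat can be made concrete with a small sketch that uses only the standard `re` module. It assumes that values which do not fit the pattern should be left unchanged, as in the asker's original function. The helper name `shorten_type`, the group names, and the sample string `no-size-here` are hypothetical.

```python
import re

# Same verbose pattern as in the answer, with named groups (names are illustrative).
PATTERN = re.compile(
    r"""
    ^(?P<name>[^-]+)-     # everything before the first hyphen
    .+?                   # skip ahead lazily to the next group
    (?P<size>[\d.]+T)     # digits and periods followed by "T"
    """,
    flags=re.X,
)


def shorten_type(value: str) -> str:
    """Return '<name> <size>', or the input unchanged if the pattern fails."""
    match = PATTERN.search(value)
    if match is None:
        return value  # assumed fallback: keep the original text
    return f"{match['name']} {match['size']}"


print(shorten_type("AY14.5-fyy-FY-R770-256G-6.4T-R1-v.A"))
print(shorten_type("no-size-here"))
```

With `str.extract`, rows that do not match come back as NaN, and concatenating them yields NaN as well. A helper like this could be passed to `df["type"].apply(...)` if the original text should be kept for such rows.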
