
Posted 2024-10-04 11:31:37


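I have a DataFrame df, and from the type column I want to keep only the first word (everything before the first "-") together with the value that ends in "T". This is tricky because the T value is not always delimited the same way: in some rows it is preceded by "-" and in others by "_". For example, one value contains -12T and another contains _14T. The data looks like this: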


type                                                 free      use    total

Hello-HEL-HE-A6123-123A-12T_TYPE-v.A                 10        10       20
Hello-HEL-HE-A6123-123A-12T_TYPE-v.E                 5         1        6
Hello-HEL-HE-A6123-123A-50T_TYPE-v.C                 1         4        5
Hello-HEL-HE-A6123-123A-50T_TYPE-v.A                 2         1        1
Happy-HAP-HA-R650-570A-90T_version-v.A               10        0        10
Kind-KIN-KI-T490-NET_14T-A.0                         7         4        3
Kind-KIN-KI-T490-NET_14T-A.0                         6         3        2
AY14.5-fyy-FY-R770-256G-6.4T-R1-v.A                  3         0        3
AY14.5-fyy-FY-R770-256G-6.4T-R1-v.A                  0         20       20
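For anyone who wants to try the snippets on this page, the sample table can be rebuilt roughly as follows. This is only a sketch: the values are copied from the table above, and the integer dtypes are an assumption, since the question does not show how df was created.

```python
import pandas as pd

# Sample data copied from the question's table; dtypes are assumed.
df = pd.DataFrame({
    "type": [
        "Hello-HEL-HE-A6123-123A-12T_TYPE-v.A",
        "Hello-HEL-HE-A6123-123A-12T_TYPE-v.E",
        "Hello-HEL-HE-A6123-123A-50T_TYPE-v.C",
        "Hello-HEL-HE-A6123-123A-50T_TYPE-v.A",
        "Happy-HAP-HA-R650-570A-90T_version-v.A",
        "Kind-KIN-KI-T490-NET_14T-A.0",
        "Kind-KIN-KI-T490-NET_14T-A.0",
        "AY14.5-fyy-FY-R770-256G-6.4T-R1-v.A",
        "AY14.5-fyy-FY-R770-256G-6.4T-R1-v.A",
    ],
    "free": [10, 5, 1, 2, 10, 7, 6, 3, 0],
    "use": [10, 1, 4, 1, 0, 4, 3, 0, 20],
    "total": [20, 6, 5, 1, 10, 3, 2, 3, 20],
})
```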


Desired output:

    type                free    use  total
    Hello   12T         10      10   20
    Hello   12T         5       1    6
    Hello   50T         1       4    5
    Hello   50T         2       1    1
    Happy   90T         10      0    10
    Kind    14T         7       4    3
    Kind    14T         6       3    2
    AY14.5  6.4T        3       0    3
    AY14.5  6.4T        0       20   20


My attempt so far:

df['type']=df['type'].str.extract('(\w+(?=[-AYY]))')+ " "+ df['type'].str.extract('(?<=0G-)(.*?)(?=\-|_)')


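import re

import pandas as pd

def extract_value(s):
    # Group 1: everything before the first hyphen.
    # Group 2: the first number (optionally with a decimal part) followed by "T".
    match = re.search(r'^(.+?)-.+?(\d+(?:\.\d+)?T)', s)
    if match:
        first_word = match.group(1)
        code = match.group(2)
        return f'{first_word} {code}'
    return s  # leave the value unchanged when the pattern does not match

df['type'] = df['type'].apply(extract_value)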



df['type'] = df['type'].str.extract(r'^(\w+\.\d+|\w+)', expand=False) + ' ' + df['type'].str.extract(r'(\d+\.\d+T|\d+T)', expand=False)

     type      free  use  total
0    Hello 12T    10   10     20
1    Hello 12T     5    1      6
2    Hello 50T     1    4      5
3    Hello 50T     2    1      1
4    Happy 90T    10    0     10
5     Kind 14T     7    4      3
6     Kind 14T     6    3      2
7  AY14.5 6.4T     3    0      3
8  AY14.5 6.4T     0   20     20


import re

pattern = r"""
^([^-]+)-    # From the start of the string, capture every non-hyphen character and stop at the first hyphen.
.+?          # Lazily consume characters up to the next capture group in this pattern.
([\d.]+T)    # Capture a run of digits (which may include a literal period) that ends with "T".
"""

extracted_df = df["type"].str.extract(pattern, flags=re.X)

        0     1
0   Hello   12T
1   Hello   12T
2   Hello   50T
3   Hello   50T
4   Happy   90T
5    Kind   14T
6    Kind   14T
7  AY14.5  6.4T
8  AY14.5  6.4T


df["type"] = extracted_df[0] + " " + extracted_df[1]

          type  free  use  total
0    Hello 12T    10   10     20
1    Hello 12T     5    1      6
2    Hello 50T     1    4      5
3    Hello 50T     2    1      1
4    Happy 90T    10    0     10
5     Kind 14T     7    4      3
6     Kind 14T     6    3      2
7  AY14.5 6.4T     3    0      3
8  AY14.5 6.4T     0   20     20
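The extract-and-join approach above can also be written with named groups, which gives the extracted columns readable names. The following is a minimal sketch rather than part of any answer: the group names word and size, the sample Series, and the extra non-matching row are all made up for illustration. Rows where the pattern does not match come back as NaN from str.extract, so the sketch falls back to the original string for those.

```python
import pandas as pd

# Three representative values from the question plus one hypothetical
# row that contains no "T" value at all.
sample = pd.Series([
    "Hello-HEL-HE-A6123-123A-12T_TYPE-v.A",
    "Kind-KIN-KI-T490-NET_14T-A.0",
    "AY14.5-fyy-FY-R770-256G-6.4T-R1-v.A",
    "no-t-value-here",
])

# word: everything before the first hyphen.
# size: the first integer or decimal number directly followed by "T".
pattern = r"^(?P<word>[^-]+)-.+?(?P<size>\d+(?:\.\d+)?T)"

parts = sample.str.extract(pattern)  # DataFrame with columns "word" and "size"

# NaN + " " + NaN stays NaN, so unmatched rows are filled with the original text.
combined = (parts["word"] + " " + parts["size"]).fillna(sample)

print(combined.tolist())
# ['Hello 12T', 'Kind 14T', 'AY14.5 6.4T', 'no-t-value-here']
```

Whether unmatched rows should keep their original text or become missing values depends on the data; dropping the fillna call leaves them as NaN.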

