Extracting with a regular expression that contains special characters

Posted 2024-09-30 10:37:04


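I want to create a new column in the DataFrame df based on another column, ID. For IDs that contain the string SAT, I want to extract the numbers joined by the special character "-" and put the extracted value in a new column named new_col. If the ID does not contain the string SAT, new_col should stay NaN.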

df looks like this:

    Date        ID                   Time
0   2007-01-10  SAT 1 HHSP           900
1   2007-01-10  DOUBLE 7 HHSP        900
2   2007-01-10  SAT GF-06-5CSBG.431  1000
3   2007-01-10  MA HYDRO HHSP        900
4   2007-01-10  2.233 HHSP           900
5   2007-01-10  SAT L2-15-3CSB1.252  1000
6   2007-01-10  SECTION 6 HHSP       900
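
For anyone who wants to reproduce this, the frame above could be rebuilt roughly as follows. This is only a sketch: keeping Date as a plain string and Time as an integer is an assumption, since the question does not show the dtypes.

```python
import pandas as pd

# Sample data copied from the table above.
# Assumption: Date is stored as a string and Time as an integer.
df = pd.DataFrame({
    "Date": ["2007-01-10"] * 7,
    "ID": [
        "SAT 1 HHSP",
        "DOUBLE 7 HHSP",
        "SAT GF-06-5CSBG.431",
        "MA HYDRO HHSP",
        "2.233 HHSP",
        "SAT L2-15-3CSB1.252",
        "SECTION 6 HHSP",
    ],
    "Time": [900, 900, 1000, 900, 900, 1000, 900],
})
print(df)
```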

Expected output:

    Date        ID                   Time     new_col
0   2007-01-10  SAT 1 HHSP           900      NaN
1   2007-01-10  DOUBLE 7 HHSP        900      NaN
2   2007-01-10  SAT GF-06-5CSBG.431  1000     06-5
3   2007-01-10  MA HYDRO HHSP        900      NaN
4   2007-01-10  2.233 HHSP           900      NaN
5   2007-01-10  SAT L2-15-3CSB1.252  1000     15-3 *
6   2007-01-10  SECTION 6 HHSP       900      NaN

* In this case 15-3 is extracted instead of 2-15, because L2 is not entirely numeric.

1 answer
User
#1 · Posted 2024-09-30 10:37:04

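Use Series.str.extract to capture digits joined by "-" that are preceded by a "-" or whitespace, and apply it only to the SAT rows filtered with Series.str.contains: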

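# Boolean mask: rows whose ID contains SAT
m = df['ID'].str.contains('SAT')
# Raw string; [-\s] is one hyphen or whitespace ("+" inside a class would be a literal plus)
df['new_col'] = df.loc[m, 'ID'].str.extract(r'[-\s](\d+-\d+)', expand=False)
print(df)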
         Date                   ID  Time new_col
0  2007-01-10           SAT 1 HHSP   900     NaN
1  2007-01-10        DOUBLE 7 HHSP   900     NaN
2  2007-01-10  SAT GF-06-5CSBG.431  1000    06-5
3  2007-01-10        MA HYDRO HHSP   900     NaN
4  2007-01-10           2.233 HHSP   900     NaN
5  2007-01-10  SAT L2-15-3CSB1.252  1000    15-3
6  2007-01-10       SECTION 6 HHSP   900     NaN
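
To see why the row with L2-15-3 yields 15-3 rather than 2-15, it may help to try the same idea on plain strings with the standard re module. The sketch below spells the pattern out in verbose form; extract_pair is a hypothetical helper written only for this illustration and is not part of pandas.

```python
import re

# Verbose spelling of the same idea: a separator, then digits-hyphen-digits.
PATTERN = re.compile(r"""
    [-\s]        # one separator: a hyphen or a whitespace character
    (\d+-\d+)    # captured group: digits, a hyphen, digits
""", re.VERBOSE)


def extract_pair(text):
    """Hypothetical helper: first digits-hyphen-digits run after a separator, or None."""
    match = PATTERN.search(text)
    return match.group(1) if match else None


print(extract_pair("SAT GF-06-5CSBG.431"))  # 06-5
print(extract_pair("SAT L2-15-3CSB1.252"))  # 15-3
print(extract_pair("SAT 1 HHSP"))           # None
```

In L2-15-3 the 2 is preceded by the letter L, not by a hyphen or whitespace, so 2-15 never qualifies; the first position where a separator is followed by digits-hyphen-digits is the hyphen before 15.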

If the SAT values always appear at the start of the ID, a single anchored pattern works without the mask:

df['new_col'] = df['ID'].str.extract(r'^SAT.*[-\s](\d+-\d+)', expand=False)
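
The two approaches agree on the sample data but can differ on other inputs. The sketch below compares them on three IDs; the second and third are hypothetical values made up for this comparison and do not come from the question.

```python
import pandas as pd

ids = pd.Series([
    "SAT GF-06-5CSBG.431",  # from the question
    "SAT A-1-2 B-3-4",      # hypothetical: two candidate pairs
    "NOSAT-1-2",            # hypothetical: SAT present, but not at the start
])

# Approach 1: mask with str.contains, then take the first match in each ID.
mask = ids.str.contains("SAT")
masked = ids[mask].str.extract(r"[-\s](\d+-\d+)", expand=False).reindex(ids.index)

# Approach 2: anchored pattern; the greedy .* pushes the match to the last pair.
anchored = ids.str.extract(r"^SAT.*[-\s](\d+-\d+)", expand=False)

result = pd.DataFrame({"ID": ids, "masked": masked, "anchored": anchored})
print(result)
```

Here the masked version returns 1-2 for both hypothetical IDs, while the anchored version returns 3-4 for the second and NaN for the third. Which behaviour is preferable depends on the real data; changing .* to .*? in the anchored pattern would make it return the first pair instead of the last.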
