Join two strings and get rid of the list (regex)

Posted 2024-07-02 12:13:12


I have a DataFrame in which column B holds two strings that I extracted from column A with a regular expression:

df['B'] = df['A'].str.findall(r'([S][\d]|[V][\d]{3})')

                       A                                     B
1   R13_IR_T20I1E7_PP3_S1_N002_V087_1785984_12593       ['S1', 'V087']
2   R13_IR_T20I1E7_PP3_S1_N003_V023_5896589_15105       ['S1', 'V023']
3   R13_IR_T20I1E7_PP3_S1_N004_V155_2541236_11033       ['S1', 'V155']
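To reproduce the setup, here is a minimal sketch that builds the three sample rows from the question (with pandas' default 0-based index instead of the 1-based one shown) and runs the same findall call. It shows why B holds lists: str.findall returns every match in each row as a Python list.

```python
import pandas as pd

# Sample data copied from the question; the index here starts at 0.
df = pd.DataFrame({'A': [
    'R13_IR_T20I1E7_PP3_S1_N002_V087_1785984_12593',
    'R13_IR_T20I1E7_PP3_S1_N003_V023_5896589_15105',
    'R13_IR_T20I1E7_PP3_S1_N004_V155_2541236_11033',
]})

# With a single capture group, findall returns the captured text of every
# match, so each cell of B becomes a list such as ['S1', 'V087'].
df['B'] = df['A'].str.findall(r'([S][\d]|[V][\d]{3})')
print(df['B'].tolist())  # [['S1', 'V087'], ['S1', 'V023'], ['S1', 'V155']]
```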

I want to get rid of the lists in column B and join the two strings with '_'.

The result should look like this:

                       A                            B
1   R13_IR_T20I1E7_PP3_S1_N002_V087_1785984_12593   S1_V087
2   R13_IR_T20I1E7_PP3_S1_N003_V023_5896589_15105   S1_V023
3   R13_IR_T20I1E7_PP3_S1_N004_V155_2541236_11033   S1_V155

The other thing I want to extract from column A with a regex is this part of the string, as shown below:

I have no idea how the regex would look!

                A                                      C
1   R13_IR_T20I1E7_PP3_S1_N002_V087_1785984_12593   S1_1785984
2   R13_IR_T20I1E7_PP3_S1_N003_V023_5896589_15105   S1_5896589
3   R13_IR_T20I1E7_PP3_S1_N004_V155_2541236_11033   S1_2541236 

Sorry for asking two questions at once, and thanks for your help.


2 Answers

First, you only need to apply '_'.join to column B:

df['B'] = df['B'].apply('_'.join)
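A minimal, self-contained sketch of this first step, assuming column B already contains the lists produced by findall (the values below are copied from the question). Because '_'.join is an ordinary callable, Series.apply calls it once per list.

```python
import pandas as pd

# Column B as produced by str.findall in the question: one list per row.
df = pd.DataFrame({'B': [['S1', 'V087'], ['S1', 'V023'], ['S1', 'V155']]})

# '_'.join is a bound string method, so apply passes each list to it.
df['B'] = df['B'].apply('_'.join)
print(df['B'].tolist())  # ['S1_V087', 'S1_V023', 'S1_V155']
```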

Second, you don't need a regex here: just split on '_', take the values you need, and join them again:

df['C'] = df['A'].apply(lambda x: '_'.join([x.split('_')[4], x.split('_')[-2]]))
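The lambda above splits each string twice. A sketch of the same idea that splits only once per row is shown below; the helper name s_and_number is hypothetical, and it assumes every value in A has the same underscore-separated layout as the sample rows (the S-token in field 4, the number in the second-to-last field).

```python
import pandas as pd

df = pd.DataFrame({'A': [
    'R13_IR_T20I1E7_PP3_S1_N002_V087_1785984_12593',
    'R13_IR_T20I1E7_PP3_S1_N003_V023_5896589_15105',
    'R13_IR_T20I1E7_PP3_S1_N004_V155_2541236_11033',
]})

def s_and_number(value: str) -> str:
    # Hypothetical helper: split once, then take field 4 ('S1') and the
    # second-to-last field (e.g. '1785984'). Assumes a fixed layout.
    parts = value.split('_')
    return f'{parts[4]}_{parts[-2]}'

df['C'] = df['A'].apply(s_and_number)
print(df['C'].tolist())  # ['S1_1785984', 'S1_5896589', 'S1_2541236']
```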

Use str.join("_").

Example:

df['B'] = df['B'].str.join("_") 
print(df['B'])

Output:

0    S1_V087
1    S1_V023
2    S1_V155
Name: B, dtype: object
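One practical difference between the two answers, sketched below under the assumption that some rows of A could be missing: str.findall returns NaN for a missing value, and Series.str.join passes that NaN through, whereas apply('_'.join) would raise a TypeError on it. The NaN row here is hypothetical; it is not in the question's data.

```python
import numpy as np
import pandas as pd

# A list from findall plus a hypothetical missing row (NaN).
b = pd.Series([['S1', 'V087'], np.nan])

# Series.str.join joins the list elements and leaves NaN as NaN.
joined = b.str.join('_')
print(joined.tolist())
```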



Use a regex to extract the second part:

df['C'] = "S1_" + df['A'].str.extract(r"(\d+)_\d+$", expand=False)
print(df['C'])

输出:

0    S1_1785984
1    S1_5896589
2    S1_2541236
Name: C, dtype: object
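The regex above captures only the number and hard-codes the "S1_" prefix. If the S-token can differ between rows, a sketch that captures both pieces with two named groups might look like this; it assumes each value contains exactly one "_S<digit>_" token and that the wanted number sits just before the last underscore-separated field.

```python
import pandas as pd

df = pd.DataFrame({'A': [
    'R13_IR_T20I1E7_PP3_S1_N002_V087_1785984_12593',
    'R13_IR_T20I1E7_PP3_S1_N003_V023_5896589_15105',
    'R13_IR_T20I1E7_PP3_S1_N004_V155_2541236_11033',
]})

# Named groups become the column names of the DataFrame returned by extract:
# 's' is the S-token, 'num' is the number before the final field.
parts = df['A'].str.extract(r'_(?P<s>S\d)_.*_(?P<num>\d+)_\d+$')

# Vectorized string concatenation of the two captured columns.
df['C'] = parts['s'] + '_' + parts['num']
print(df['C'].tolist())  # ['S1_1785984', 'S1_5896589', 'S1_2541236']
```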
