rsplit() cannot split a column using a regular expression

Posted 2024-09-24 02:18:49


Original df

import pandas as pd

# '12345' is included so the frame matches the 10-row output shown below
df = pd.DataFrame({
    'Ref': ['CU12', 'SE00', 'RLA1234', '12345', 'RLA456', 'LU00',
            'RLA1234MA12', 'RLA1234MA13', 'CU00', 'LU00']
})

    Ref
0   CU12
1   SE00
2   RLA1234
3   12345
4   RLA456
5   LU00
6   RLA1234MA12
7   RLA1234MA13
8   CU00
9   LU00

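Requirement: I need to split the letters from the digits using a regex and rsplit(). There are three kinds of values here:

letters + digits
digits only
letters + digits + letters + digits. Here I need to rsplit() and take only the digits on the right, then keep the rest of the string as it is.

So CU12 should give CU and 12, RLA1234MA12 should give RLA1234MA and 12, and 12345 should stay 12345.

split() works and splits the column, but with rsplit() my regex does not produce the columns I want. I did read the documentation for both split() and rsplit(). This is what I tried on the df above: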

result = df['Ref'].str.split(r'([A-Za-z]*)(\d*)', expand=True)

This gives me

    0   1   2   3   4   5   6   7   8   9
0       CU  12                  None    None    None
1       SE  00                  None    None    None
2       RLA 1234                    None    None    None
3           12345                   None    None    None
4       RLA 456                 None    None    None
5       LU  00                  None    None    None
6       RLA 1234        MA  12              
7       RLA 1234        MA  13              
8       CU  00                  None    None    None
9       LU  00                  None    None    None

I only need two columns in my result, so I can do this:

result = result.loc[:,[1,2]]
result.rename(columns={1:'x', 2:'y'}, inplace=True)
print(result)


x   y
0   CU  12
1   SE  00
2   RLA 1234
3       12345
4   RLA 456
5   LU  00
6   RLA1234MA   12
7   RLA1234MA   13
8   CU  00
9   LU  00

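But when I use rsplit(), my column is not split the way it is with split().

The only option I have right now is to use apply on the column with a custom function that walks the string from the end and slices it as soon as it finds a letter. Is there a way to do this with rsplit()? What am I doing wrong?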
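For reference, the apply-based fallback described above could look roughly like the sketch below. The helper name split_trailing_digits is hypothetical, and the sketch assumes that only ASCII digits 0-9 count as digits.

```python
def split_trailing_digits(ref: str) -> tuple[str, str]:
    """Split ref into (prefix, trailing digits) by scanning from the right."""
    i = len(ref)
    # Move left while the character just before position i is an ASCII digit.
    while i > 0 and "0" <= ref[i - 1] <= "9":
        i -= 1
    # Everything before i is the prefix, everything from i on is the digit run.
    return ref[:i], ref[i:]


refs = ['CU12', 'RLA1234MA12', '12345']
pairs = [split_trailing_digits(r) for r in refs]
print(pairs)
```

On a DataFrame, the same helper could be passed to df['Ref'].apply(...), which is the slow path the question is trying to avoid.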


1 answer

User

Answer 1 · posted 2024-09-24 02:18:49

Use Series.str.extract with the following regex pattern, which has named capture groups:

result = df['Ref'].str.extract(r'(?P<x>\w*?)(?P<y>\d*)$')
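To see why this pattern gives the right-hand digits, it may help to run it with the standard re module on a few of the values. This is only an illustrative sketch: it assumes str.extract applies the pattern the way re.search does, and the names pattern, samples and groups are made up for the example. The lazy \w*? grows one character at a time until the greedy \d* can reach the end of the string.

```python
import re

# Same pattern as in the answer, compiled with the standard library.
pattern = re.compile(r'(?P<x>\w*?)(?P<y>\d*)$')

samples = ['CU12', 'RLA1234MA12', '12345']

# Map each sample to its named groups: x is the prefix, y the trailing digits.
groups = {s: pattern.search(s).groupdict() for s in samples}

for s, g in groups.items():
    print(s, g)
```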

Alternatively, you can use Series.str.split with expand=True:

result = df['Ref'].str.split(r'(?<!\d)(?=\d+$)', expand=True)
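The split pattern consists only of lookarounds, so it matches an empty position: one that is not preceded by a digit and is followed by digits up to the end of the string. A small sketch with the standard re module (Python 3.7 or later, which allows splitting on empty matches) shows where that position falls. The names pattern, samples and parts are assumptions for the example.

```python
import re

# Zero-width pattern: the boundary just before the final run of digits.
pattern = re.compile(r'(?<!\d)(?=\d+$)')

samples = ['CU12', 'RLA1234MA12', '12345']

# Splitting on the empty match cuts each string into [prefix, digits].
parts = {s: pattern.split(s) for s in samples}

for s, p in parts.items():
    print(s, p)
```

Because the pattern has no capture groups, the pandas version returns columns named 0 and 1 rather than x and y, so a rename is still needed to get the column names shown in the result.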

Result:

# print(result)

           x      y
0         CU     12
1         SE     00
2        RLA   1234
3             12345
4        RLA    456
5         LU     00
6  RLA1234MA     12
7  RLA1234MA     13
8         CU     00
9         LU     00
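As for why rsplit() did not behave like split(): as far as I know, Series.str.rsplit treats the pattern as a literal string rather than a regular expression, so a regex passed to it is searched for character by character and never found. The sketch below is a rough check of that assumption on a three-row Series. The variable names are made up, and dtype=object is used so that plain Python strings are stored.

```python
import pandas as pd

# Small stand-in for the 'Ref' column, stored as plain Python strings.
refs = pd.Series(['CU12', 'RLA1234MA12', '12345'], name='Ref', dtype=object)

pat = r'(?<!\d)(?=\d+$)'

# rsplit looks for the literal text of pat, which never occurs in the data,
# so nothing is split and only one column comes back.
literal = refs.str.rsplit(pat, expand=True)

# split treats a multi-character pattern as a regular expression.
as_regex = refs.str.split(pat, expand=True)

print(literal.shape, as_regex.shape)
```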
