Python/Pandas: How to process a data column that meets a specific condition

2 Answers

User

#1 · Edited 2024-10-04 09:22:54

You can use Series.str.extract:

import pandas as pd
df = pd.DataFrame({'userlabel':['SZ5GZTD_[56][13631808]','YZ5GZTC-3_[51][13680735]','XZ5GZTA_12-[51][13574893]','testYZ5GZWC_11-[51][13632101]'], 'country':['russia','uk','usa','cuba']})
df['ci'] = df['userlabel'].str.extract(r"(?i)^(?:yz|testyz)[^_-]*[_-](\d+)[-_]", expand=False)
>>> df['ci']
0    NaN
1      3
2    NaN
3     11
Name: ci, dtype: object
# To rearrange columns, add the following line:
df = df[['userlabel', 'ci', 'country']]
>>> df
                       userlabel   ci country
0         SZ5GZTD_[56][13631808]  NaN  russia
1       YZ5GZTC-3_[51][13680735]    3      uk
2      XZ5GZTA_12-[51][13574893]  NaN     usa
3  testYZ5GZWC_11-[51][13632101]   11    cuba

See the regex demo.

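Regex details:

(?i) - makes the pattern case-insensitive (no need to call str.lower())
^ - start of the string
(?:yz|testyz) - a non-capturing group that matches yz or testyz
[^_-]* - zero or more characters other than _ and -
[_-] - the first _ or -
(\d+) - group 1: one or more digits (a capturing group is required because Series.str.extract returns only the captured substring)
[-_] - a - or _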
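
To check the breakdown above without pandas, here is a minimal sketch that runs the same pattern through the standard re module on the four sample labels. The helper name extract_ci is made up for illustration and is not part of the original answer.

```python
import re

# Same pattern as in the answer above, compiled with the standard re module.
PATTERN = re.compile(r"(?i)^(?:yz|testyz)[^_-]*[_-](\d+)[-_]")

def extract_ci(label):
    """Return the captured digits as a string, or None when the label does not match.

    extract_ci is a hypothetical helper name used only in this sketch.
    """
    m = PATTERN.search(label)
    return m.group(1) if m else None

labels = [
    "SZ5GZTD_[56][13631808]",
    "YZ5GZTC-3_[51][13680735]",
    "XZ5GZTA_12-[51][13574893]",
    "testYZ5GZWC_11-[51][13632101]",
]
results = [extract_ci(s) for s in labels]
print(results)  # [None, '3', None, '11']
```

The captured values are strings, which matches the object dtype shown in the pandas output.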

User

#2 · Edited 2024-10-04 09:22:54

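import re

def get_val(s):
    # Return the digits between the separators for labels that start
    # with YZ or testYZ (case-sensitive); return None otherwise.
    m = re.match(r'^(?:YZ|testYZ).*[_-](\d+)[_-]', s)
    return m.group(1) if m else None

# df is the DataFrame defined in the first answer.
df['ci'] = df['userlabel'].apply(get_val)
df = df[['userlabel', 'ci', 'country']]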

                       userlabel    ci country
0         SZ5GZTD_[56][13631808]  None  russia
1       YZ5GZTC-3_[51][13680735]     3      uk
2      XZ5GZTA_12-[51][13574893]  None     usa
3  testYZ5GZWC_11-[51][13632101]    11    cuba
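
Both answers give the same result on the sample data, but the two patterns are not interchangeable. A small sketch of where they differ, assuming two made-up labels that do not appear in the original data: the second pattern (written here with a non-capturing prefix group) is case-sensitive, and its greedy .* picks the last digit run between separators rather than the first.

```python
import re

# Pattern from the first answer: case-insensitive, stops at the first _ or -.
FIRST = re.compile(r"(?i)^(?:yz|testyz)[^_-]*[_-](\d+)[-_]")
# Pattern from the second answer: case-sensitive, greedy .* before the separator.
SECOND = re.compile(r"^(?:YZ|testYZ).*[_-](\d+)[_-]")

def capture(pattern, label):
    """Return group 1 as a string, or None when the pattern does not match."""
    m = pattern.search(label)
    return m.group(1) if m else None

# Hypothetical labels, chosen only to expose the differences.
lowercase = "yz5GZTC-3_[51]"
two_candidates = "YZ5G_1-2_x"

print(capture(FIRST, lowercase), capture(SECOND, lowercase))            # 3 None
print(capture(FIRST, two_candidates), capture(SECOND, two_candidates))  # 1 2
```

If labels can be lowercase or can contain more than one number between separators, the choice of pattern changes the result.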
