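I have a DataFrame (original_df) with a column description, and I want to create another column, Label, by searching the description for keywords with regular expressions.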
description                 Label
fund trf 0049614823         transfers
alat transfer               transfers
data purchase via           airtime
alat pos buy                pos
alat web buy                others
atm wd rch debit money      withdrawals
alat pos buy                pos
salar alert charges         salary
mtn purchase via            airtime
top- up purchase via        airtime
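The code I came up with searches the description column for a pattern and assigns a label based on the keyword it finds. My attempt is below, but I can't get the logic right and I get a KeyError. I have tried everything I can think of and still can't work out the correct logic.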
df = original_df['description'].sample(100)
position = 0
while position < len(df):
    if any(re.search(r"(tnf|trsf|trtr|trf|transfer)",df[position])):
        original_df['Label'] == 'transfers'
    elif any(re.search(r'(airtime|data|vtu|recharge|mtn|glo|top-up)',df[position])):
        original_df['Label'] == 'aitime
    elif any(re.search(r'(pos|web pos|)',df[position])):
        original_df['Label'] == 'pos
    elif any(re.search(r'(salary|sal|salar|allow|allowance)',df[position])):
        original_df['Label'] == 'salary'
    elif any(re.search(r'(loan|repayment|lend|borrow)',df[position])):
        original_df['Label'] == 'loan'
    elif any(re.search(r'(withdrawal|cshw|wdr|wd|wdl|withdraw|cwdr|cwd|cdwl|csw)',df[position])):
        return 'withdrawals'
    position += 1
    return others
print(df_sample)
You can put the regex logic into a function and then apply it to the DataFrame. This avoids the manual loop. In outline: create a label() function from your regex code, then apply label() to the rows of df.
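The outline above could be sketched roughly as follows. This is a minimal illustration, not necessarily the answerer's exact code: the PATTERNS list, the small stand-in for original_df, and the change from "top-up" to "top-\s*up" (so that "top- up" in the sample data matches) are assumptions made here. The empty alternative in the question's "(pos|web pos|)" is also dropped, because an empty alternative matches every string.

```python
import re

import pandas as pd

# Keyword patterns taken from the question. Order matters: the first match wins.
# Assumption: "top-\s*up" replaces "top-up" so that "top- up" is also caught.
PATTERNS = [
    ("transfers", re.compile(r"tnf|trsf|trtr|trf|transfer")),
    ("airtime", re.compile(r"airtime|data|vtu|recharge|mtn|glo|top-\s*up")),
    ("pos", re.compile(r"pos")),
    ("salary", re.compile(r"salary|sal|salar|allow|allowance")),
    ("loan", re.compile(r"loan|repayment|lend|borrow")),
    ("withdrawals", re.compile(r"withdrawal|cshw|wdr|wd|wdl|withdraw|cwdr|cwd|cdwl|csw")),
]


def label(description):
    """Return the label of the first pattern found in description, else 'others'."""
    text = str(description).lower()
    for name, pattern in PATTERNS:
        if pattern.search(text):
            return name
    return "others"


# Hypothetical stand-in for original_df, built from sample rows in the question.
original_df = pd.DataFrame({
    "description": [
        "fund trf 0049614823",
        "alat transfer",
        "data purchase via",
        "alat pos buy",
        "alat web buy",
        "atm wd rch debit money",
        "salar alert charges",
        "top- up purchase via",
    ]
})

# Apply label() to every value in the description column.
original_df["Label"] = original_df["description"].apply(label)
print(original_df)
```

Using apply also sidesteps the most likely cause of the KeyError: sample() keeps the original index labels, so df[position] looks up a label that may not be present in the sample. Note as well that original_df['Label'] == 'transfers' is a comparison, not an assignment, and that return is only valid inside a function.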