I am trying to select the first two words that follow the string "POS PURCHASE" in my dataset.
This is my dataset:
df:
ID  transaction_description
1   POS PURCHASE MR PRICE WHK FAC
2   WITHDRAWAL FEE
3   POS PURCHASE KFC WERNHIL STATE
4   REJECTED ATM TRANSACTION
5   ATM CASH WITHDRAWAL
6   POS PURCHASE EDGARS GROVE
I would like my output to look like this:
dfnew:
ID  transaction_description         TRANX
1   POS PURCHASE MR PRICE WHK FAC   MR PRICE
2   WITHDRAWAL FEE                  WITHDRAWAL FEE
3   POS PURCHASE KFC WERNHIL STATE  KFC WERNHIL
4   REJECTED ATM TRANSACTION        REJECTED ATM TRANSACTION
5   ATM CASH WITHDRAWAL             ATM CASH WITHDRAWAL
6   POS PURCHASE EDGARS GROVE MALL  EDGARS GROVE
I tried the following code, but I could not create a new column containing the desired output.
code:
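import re

for value in df['transaction_description'].values:
    # Split on the prefix; rows without it come back as a single piece
    non_data = re.split('POS PURCHASE |POS PURCHASE ', value)
    terms_list = [term for term in non_data if len(term) > 0]
    # Note: [0:1] keeps only the first word of each piece
    substrs = [term.split()[0:1] for term in terms_list]
    result = [' '.join(term) for term in substrs]
    # The result is only printed, never assigned back to df
    print(result)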
Here is one approach using regex.
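The regex idea can be sketched roughly as follows. This is a minimal illustration, not the original answer's code: the pattern, the name `PREFIX_RE`, and the helper `extract_tranx` are assumptions introduced here. It captures at most two whitespace-separated words after "POS PURCHASE" and leaves every other description unchanged.

```python
import re

# Assumed pattern: the literal prefix, then one word and an optional second word.
PREFIX_RE = re.compile(r"^POS PURCHASE\s+(\S+(?:\s+\S+)?)")


def extract_tranx(description):
    """Return up to two words after 'POS PURCHASE', else the text unchanged."""
    match = PREFIX_RE.match(description)
    return match.group(1) if match else description


# Sample descriptions copied from the question.
descriptions = [
    "POS PURCHASE MR PRICE WHK FAC",
    "WITHDRAWAL FEE",
    "POS PURCHASE KFC WERNHIL STATE",
    "REJECTED ATM TRANSACTION",
    "ATM CASH WITHDRAWAL",
    "POS PURCHASE EDGARS GROVE",
]

tranx = [extract_tranx(d) for d in descriptions]
print(tranx)
```

With a DataFrame, the same helper could be applied per row, for example `df['TRANX'] = df['transaction_description'].apply(extract_tranx)`.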
Using str.extract
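A sketch of how `str.extract` might be used here, assuming pandas is available and that the frame is rebuilt from the sample rows in the question. The pattern is an assumption, not the original answer's code. Rows that do not contain the prefix produce NaN from `str.extract`, so they are filled from the original column.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6],
    "transaction_description": [
        "POS PURCHASE MR PRICE WHK FAC",
        "WITHDRAWAL FEE",
        "POS PURCHASE KFC WERNHIL STATE",
        "REJECTED ATM TRANSACTION",
        "ATM CASH WITHDRAWAL",
        "POS PURCHASE EDGARS GROVE",
    ],
})

# Capture up to two words after the prefix; expand=False returns a Series.
extracted = df["transaction_description"].str.extract(
    r"POS PURCHASE\s+(\S+(?:\s+\S+)?)", expand=False
)

# Rows without the prefix are NaN, so fall back to the full description.
df["TRANX"] = extracted.fillna(df["transaction_description"])
print(df)
```

This keeps the work vectorised over the column, which is usually preferable to looping over `.values` as in the question.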
Edit: if "POS PURCHASE" always appears at the start, as it does in the sample data, you can simply strip it off.
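One possible way to do that stripping is sketched below. It assumes the prefix only ever occurs at the start of the string; the variable names are illustrative. The prefix is removed with `str.replace`, the first two remaining words are kept, and rows that never had the prefix are left as they were.

```python
import pandas as pd

# Sample descriptions copied from the question.
desc = pd.Series(
    [
        "POS PURCHASE MR PRICE WHK FAC",
        "WITHDRAWAL FEE",
        "POS PURCHASE KFC WERNHIL STATE",
        "REJECTED ATM TRANSACTION",
        "ATM CASH WITHDRAWAL",
        "POS PURCHASE EDGARS GROVE",
    ],
    name="transaction_description",
)

# Flag rows that start with the prefix (assumed to appear only at the start).
has_prefix = desc.str.startswith("POS PURCHASE ")

# Strip the prefix, then keep the first two words of what remains.
stripped = desc.str.replace(r"^POS PURCHASE\s+", "", regex=True)
first_two = stripped.str.split().str[:2].str.join(" ")

# Use the shortened text only where the prefix was present.
tranx = first_two.where(has_prefix, desc)
print(tranx.tolist())
```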