How do I use multiple regular expressions to clean a column's contents in pandas?

Posted 2024-09-27 00:17:22


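I want to define multiple regular expressions, each of which is replaced with a given value when it matches. For example, I wrote the regular expression re.search('QuickPay with Zelle payment to *', re.IGNORECASE), and if it matches in a DataFrame column, I want to replace it with "Payment to *". I would like to have several such regex key-value pairs.

As a concrete example, if the column contains "QuickPay with Zelle payment to Zack", it should be replaced with "Payment to Zack". If the column contains "QuickPay with Zelle payment from Zack", it should be replaced with "Payment from Zack". If there is a match for *DD BR*, it should be replaced with "Dunkin Donuts", and so on for several similar cases. I want this to be automated, so that I can simply append to the key-value pairs to improve my cleaning function.

I tried using df.apply() and df.replace(), but I don't know where to go from there.

Here is the relevant code: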

import pandas as pd
import re

filterMap = {
    re.search('QuickPay with Zelle payment to ', re.IGNORECASE): 'Payment to',
    re.search('QuickPay with Zelle payment from ', re.IGNORECASE): 'Payment from'
}

df = pd.read_csv('./data/data.csv', header=None, skiprows=[0], usecols=[1, 2, 3])

date = df[1]
amount = df[3]
title = df[2]

cleanTitle = title.replace(to_replace=filterMap, value=filterMap)

print(cleanTitle)

2 answers

Just use replace with regex=True:

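# Use [Qq], not [Q|q]: inside a character class, | is a literal pipe.
# The patterns have no trailing space, so any text after "to"/"from" stays separated.
replace_map = {
    '[Qq]uick[Pp]ay with [Zz]elle payment to': 'Payment to',
    '[Qq]uick[Pp]ay with [Zz]elle payment from': 'Payment from'
}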

Code

df.replace({'title': replace_map}, regex=True, inplace=True)

Output

>>> df
                               title
0    QuickPay with Zelle payment to 
1    quickPay with zelle payment to 
2    quickpay with zelle payment to 
3  QuickPay with Zelle payment from 
4  Quickpay with zelle payment from 

>>> replace_map = {
...     '[Qq]uick[Pp]ay with [Zz]elle payment to': 'Payment to',
...     '[Qq]uick[Pp]ay with [Zz]elle payment from': 'Payment from'
... }
>>> df.replace({'title': replace_map}, regex=True, inplace=True)
>>> df
           title
0    Payment to 
1    Payment to 
2    Payment to 
3  Payment from 
4  Payment from 
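The same dictionary-driven approach can be sketched with case-insensitive patterns, which avoids spelling out each upper/lower-case variant. This is only a minimal sketch: the sample titles are made up to mirror the examples in the question, and treating a "DD BR" match as a whole-value replacement is an assumption about what the asker meant by *DD BR*.

```python
import pandas as pd

# Hypothetical sample data modelled on the examples in the question.
df = pd.DataFrame({
    "title": [
        "QuickPay with Zelle payment to Zack",
        "quickpay with zelle payment from Zack",
        "DD BR #1234",
    ]
})

# The inline (?i) flag makes each pattern case-insensitive.
# Append new pattern/replacement pairs here to extend the cleaning rules.
replace_map = {
    r"(?i)quickpay with zelle payment to": "Payment to",
    r"(?i)quickpay with zelle payment from": "Payment from",
    # Assumption: any title containing "DD BR" is replaced entirely.
    r"(?i)^.*DD BR.*$": "Dunkin Donuts",
}

# Series.replace with regex=True treats the dict keys as regular expressions.
clean = df["title"].replace(replace_map, regex=True)
print(clean.tolist())
```

Returning a new Series instead of using inplace=True keeps the original column available for comparison.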

I wrote a generic function to which you can add more re.sub() calls as needed. Hope this helps.

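import re

def replace_clean(text):
    # Apply each substitution in turn; add more re.sub() calls for new rules.
    text1 = re.sub('QuickPay with Zelle payment to', 'Payment to', text)
    text1 = re.sub('QuickPay with Zelle payment from', 'Payment from', text1)
    text1 = re.sub('DD BR', 'Dunkin Donuts', text1)
    return text1

# Series.map accepts the function directly; no lambda wrapper is needed.
df['cleanTitle'] = df['title'].map(replace_clean)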
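Since the question asks for rules that can be extended just by appending pairs, the function above could also be driven by a rule table rather than hard-coded calls. The following is a rough sketch under that assumption; the names RULES and clean_title and the sample titles are hypothetical and do not come from the original answer.

```python
import re

# Hypothetical rule table: (compiled pattern, replacement).
# Append a new pair to add a cleaning rule; re.IGNORECASE covers case variants.
RULES = [
    (re.compile(r"QuickPay with Zelle payment to", re.IGNORECASE), "Payment to"),
    (re.compile(r"QuickPay with Zelle payment from", re.IGNORECASE), "Payment from"),
    (re.compile(r"DD BR", re.IGNORECASE), "Dunkin Donuts"),
]

def clean_title(text, rules=RULES):
    """Apply every rule in order and return the cleaned string."""
    for pattern, replacement in rules:
        text = pattern.sub(replacement, text)
    return text

# Made-up titles mirroring the examples in the question.
titles = [
    "QuickPay with Zelle payment to Zack",
    "QUICKPAY WITH ZELLE PAYMENT FROM Zack",
    "DD BR 123",
]
cleaned = [clean_title(t) for t in titles]
print(cleaned)
# ['Payment to Zack', 'Payment from Zack', 'Dunkin Donuts 123']
```

With a DataFrame, the same function can be passed to map, for example df['cleanTitle'] = df['title'].map(clean_title).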
