Optimizing find-and-replace on a DataFrame with Pandas

import pandas as pd

df1 = pd.DataFrame({'Data': ["Hull Damage happened and its insured by maritime hull insurence company",
                             "Non Cash Entry and claims are blocked"]})
df2 = pd.DataFrame({'Find': ["Insurence", "Non cash entry"],
                    'Replace': ["Insurance", "Blocked"]})

import re

# KeywordSynonym: DataFrame with 'Synonym' (text to find) and 'Keyword' (replacement) columns
backup = str(backup)
TrainingClaimNotes_KwdSyn = []
for index, row in KeywordSynonym.iterrows():
    word = KeywordSynonym.Synonym[index].lower()
    value = KeywordSynonym.Keyword[index].lower()
    # whole-word match of the escaped synonym
    my_regex = r"\b(?=\w)" + re.escape(word) + r"\b(?!\w)"
    if re.search(my_regex, backup):
        backup = re.sub(my_regex, value, backup)
TrainingClaimNotes_KwdSyn.append(backup)
TrainingClaimNotes_KwdSyn_Cmp = backup.split('\'", "\'')

1 answer

Anonymous user

#1 · Posted 2024-09-30 08:17:18

Use:

import pandas as pd

df1 = pd.DataFrame({'Data' : ["Hull Damage happened and its insured by maritime hull insurence company","Non Cash Entry and claims are blocked"]})

df2 = pd.DataFrame({'Find': ["Insurence", "Non cash entry"],
                    'Replace': ["Insurance", "Blocked"]})

find_repl = dict(zip(df2['Find'].str.lower(), df2['Replace'].str.lower()))
d2 = {r'(\b){}(\b)'.format(k):r'\1{}\2'.format(v) for k,v in find_repl.items()}

df1['Data_1'] = df1['Data'].str.lower().replace(d2, regex=True)

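Output (values of the new Data_1 column):

['hull damage happened and its insured by maritime hull insurance company', 'blocked and claims are blocked']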

Explanation

dict(zip(df2['Find'].str.lower(), df2['Replace'].str.lower())) creates a mapping from each string to find to the string that replaces it:

{'insurence': 'insurance', 'non cash entry': 'blocked'}

Turn each search term into a regex wrapped in word boundaries (\b), so that only whole words are matched:

d2 = {r'(\b){}(\b)'.format(k):r'\1{}\2'.format(v) for k,v in find_repl.items()}

{'(\\b)insurence(\\b)': '\\1insurance\\2', '(\\b)non cash entry(\\b)': '\\1blocked\\2'}
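The keys in d2 are inserted into the pattern unescaped, so a search term containing a regex metacharacter such as `.`, `+` or `(` would not be matched literally. Below is a minimal sketch of the same dictionary built with `re.escape()`, assuming the same df1/df2 as above. It also swaps `(\b)...(\b)` for the lookarounds `(?<!\w)` / `(?!\w)` (the question's own code already uses `(?!\w)`), which need no capture groups and still work when a term begins or ends with punctuation:

```python
import re

import pandas as pd

df1 = pd.DataFrame({'Data': ["Hull Damage happened and its insured by maritime hull insurence company",
                             "Non Cash Entry and claims are blocked"]})
df2 = pd.DataFrame({'Find': ["Insurence", "Non cash entry"],
                    'Replace': ["Insurance", "Blocked"]})

find_repl = dict(zip(df2['Find'].str.lower(), df2['Replace'].str.lower()))

# re.escape() makes characters such as '.', '+' or '(' in a search term match literally.
# (?<!\w) / (?!\w) require a non-word character (or the string edge) on each side.
# Note: the values are still regex replacement strings, so a backslash in them is special.
d2 = {r'(?<!\w){}(?!\w)'.format(re.escape(k)): v for k, v in find_repl.items()}

df1['Data_1'] = df1['Data'].str.lower().replace(d2, regex=True)
print(df1['Data_1'].tolist())
```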

Finally, perform the actual replacement:

df1['Data_1'] = df1['Data'].str.lower().replace(d2, regex=True)
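Passing a dictionary to `replace(..., regex=True)` applies one regex per search term. With many terms, a different technique (not the answer's, shown here as a sketch assuming the same df1/df2) is to join all escaped terms into a single alternation and let a callable replacement look each match up in `find_repl`, so every string is scanned by one pattern:

```python
import re

import pandas as pd

df1 = pd.DataFrame({'Data': ["Hull Damage happened and its insured by maritime hull insurence company",
                             "Non Cash Entry and claims are blocked"]})
df2 = pd.DataFrame({'Find': ["Insurence", "Non cash entry"],
                    'Replace': ["Insurance", "Blocked"]})

find_repl = dict(zip(df2['Find'].str.lower(), df2['Replace'].str.lower()))

# Longest terms first: Python's alternation takes the first alternative that matches,
# so a longer phrase must be tried before any shorter term it contains.
alternatives = sorted(find_repl, key=len, reverse=True)
pattern = r'(?<!\w)(?:{})(?!\w)'.format('|'.join(map(re.escape, alternatives)))

# A callable repl receives each re.Match; the matched (lowercased) text is a key of find_repl.
df1['Data_1'] = df1['Data'].str.lower().str.replace(
    pattern, lambda m: find_repl[m.group(0)], regex=True)
print(df1['Data_1'].tolist())
```

`regex=True` must be passed explicitly here, because `Series.str.replace` only accepts a callable replacement in regex mode.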

Note: I applied .lower() everywhere so that matches are found regardless of case. You can of course adapt this to your needs.
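If the original capitalization of the untouched text should be kept, one option (a sketch, assuming the same df1/df2) is to skip `.lower()` and let `Series.str.replace` match case-insensitively with `case=False`. The replacement text is then taken verbatim from df2['Replace']:

```python
import re

import pandas as pd

df1 = pd.DataFrame({'Data': ["Hull Damage happened and its insured by maritime hull insurence company",
                             "Non Cash Entry and claims are blocked"]})
df2 = pd.DataFrame({'Find': ["Insurence", "Non cash entry"],
                    'Replace': ["Insurance", "Blocked"]})

df1['Data_1'] = df1['Data']
for find, repl in zip(df2['Find'], df2['Replace']):
    # case=False adds re.IGNORECASE, so 'Non cash entry' also matches 'Non Cash Entry';
    # text outside the matches keeps its original capitalization.
    df1['Data_1'] = df1['Data_1'].str.replace(
        r'(?<!\w){}(?!\w)'.format(re.escape(find)), repl, regex=True, case=False)
print(df1['Data_1'].tolist())
```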
