I have a list of values in a column. For example:
powderA1992
powderA199
creamB2001
creamB200
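Suppose the correct values are powderA1992 and creamB2000.
How can I match powderA199 and replace it, in a new column, with powderA1992?
This is only a small sample of the problem; I have 60k entries.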
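To make the goal concrete, here is a minimal sketch of the mapping being asked for. It assumes a list of known-correct names is available (`correct_names` is hypothetical, taken from the example above) and uses the standard library's `difflib.get_close_matches` as a stand-in for a fuzzy matcher; the helper name `closest_correct_name` and the `cutoff` value of 0.8 are illustrative choices, not something from the question.

```python
from difflib import get_close_matches

# Hypothetical list of known-correct names (from the example in the question).
correct_names = ["powderA1992", "creamB2000"]

def closest_correct_name(name, candidates, cutoff=0.8):
    """Return the most similar candidate, or the name unchanged if none is close enough."""
    matches = get_close_matches(name, candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else name

raw = ["powderA1992", "powderA199", "creamB2001", "creamB200"]
cleaned = [closest_correct_name(n, correct_names) for n in raw]
print(cleaned)
```

With pandas, the same helper could probably be applied to build the new column with something like `dataset['seller_name'].map(lambda n: closest_correct_name(n, correct_names))`, assuming the column has no missing values.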
Here is my code:
from fuzzywuzzy import fuzz

# `dataset` is the DataFrame that holds the seller names
sellers = dataset['seller_name'].dropna()
print(sellers.head(10))

# drop_duplicates returns a new Series, so the result has to be assigned
sellers2 = sellers.drop_duplicates(keep='first')
print(sellers2.head(10))
print(len(sellers))
print(len(sellers2))

def match_term(term, list_names, min_score=0):
    """Return the best-scoring name in list_names and its score."""
    max_score = -1
    max_name = ""
    for term2 in list_names:
        score = fuzz.ratio(term, term2)
        if score > min_score and score > max_score:
            max_name = term2
            max_score = score
    return (max_name, max_score)

# Note: each name also appears in sellers2, so it matches itself with a score of 100
for i in sellers:
    print(i, match_term(i, sellers2, 20))
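The loop above scores every row against the whole list, which may be slow for 60k entries, and each name also finds itself as its own best match. One possible adjustment, sketched here under assumptions, is to score each distinct name only once, skip the name itself, and reuse the result through a dictionary. To keep the sketch dependency-free, `ratio` below is a rough stand-in for `fuzz.ratio` built on the standard library's `difflib.SequenceMatcher`; `best_match`, `lookup`, and the `min_score` of 80 are hypothetical.

```python
from difflib import SequenceMatcher

def ratio(a, b):
    """Similarity on a 0-100 scale (rough stand-in for fuzz.ratio)."""
    return round(100 * SequenceMatcher(None, a, b).ratio())

def best_match(term, candidates, min_score=0):
    """Return (name, score) for the best candidate above min_score, skipping the term itself."""
    best_name, best_score = "", -1
    for cand in candidates:
        if cand == term:
            continue  # a name always scores 100 against itself
        score = ratio(term, cand)
        if score > min_score and score > best_score:
            best_name, best_score = cand, score
    return best_name, best_score

# Sample data standing in for the seller_name column.
sellers = ["powderA1992", "powderA199", "powderA199",
           "creamB2001", "creamB200", "creamB200"]
unique_sellers = list(dict.fromkeys(sellers))  # de-duplicate, keep first-seen order

# Score each distinct name once, then look the result up for every row.
lookup = {name: best_match(name, unique_sellers, min_score=80)
          for name in unique_sellers}
for name in sellers:
    print(name, lookup[name])
```

This only finds the nearest other name; deciding which of two similar names is the correct one still needs a rule or a reference list.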