Autocorrecting column values against a list in Python

Posted 2024-09-27 00:18:55


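I have been trying to match the state_name column against the values in a list. It works well, except when the input differs from the list entries in letter case.

Input data: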

    state_name
0   Assan
1   Andhra Prade5h
2   M1zoram
3   Uttar Pr8desh
4   MIZORAM

The script I have been using:

from difflib import SequenceMatcher

import pandas as pd

lst = ['Assam', 'Andhra Pradesh', 'Mizoram', 'Uttar Pradesh']  # correct name list

def closest(s):
    # Return the entry of lst with the highest similarity ratio to s
    highest = 0
    result = ''
    for i in lst:
        temp = SequenceMatcher(None, s, i).ratio()  # similarity ratio (case-sensitive)
        if temp > highest:
            highest = temp
            result = i
    return result


df = pd.DataFrame(['Assan', 'Andhra Prade5h', 'M1zoram', 'Uttar Pr8desh', 'MIZORAM'],
                  columns=["state_name"])

df['state_name'] = df['state_name'].apply(closest)

Output afterwards:

    state_name
0   Assam
1   Andhra Pradesh
2   Mizoram
3   Uttar Pradesh
4   Assam

When an input value differs in case from the list entry, for example MIZORAM, I get the wrong value (Assam instead of Mizoram).


2 answers

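The problem lies in how SequenceMatcher is being used.

When SequenceMatcher compares two strings, it takes the case of each character into account, so an uppercase letter and its lowercase counterpart count as different characters.

For example: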

from difflib import SequenceMatcher
temp1 = SequenceMatcher(None, "Hello", "hello").ratio()
temp2 = SequenceMatcher(None, "Hello", "HELLO").ratio()
temp3 = SequenceMatcher(None, "Hello", "Jelme").ratio()

print(temp1)  # 0.8
print(temp2)  # 0.2
print(temp3)  # 0.4

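The code above shows that, by this measure, "Jelme" is closer to "Hello" than "HELLO" is. One way to handle this is to convert the strings to lowercase with str.lower() before comparing them.

You can convert both values to lowercase: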

SequenceMatcher(None, s.lower(), i.lower()).ratio()
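
Applied to the closest function from the question, that change could look like the minimal sketch below. It uses only the standard library so it can be run without pandas, and it replaces the explicit loop with max and a key function; that is a stylistic assumption, not part of the original script.

```python
from difflib import SequenceMatcher

lst = ['Assam', 'Andhra Pradesh', 'Mizoram', 'Uttar Pradesh']  # correct name list

def closest(s):
    # Compare lowercased copies so case differences do not reduce the ratio,
    # but return the correctly cased entry from lst.
    return max(lst, key=lambda name: SequenceMatcher(None, s.lower(), name.lower()).ratio())

names = ['Assan', 'Andhra Prade5h', 'M1zoram', 'Uttar Pr8desh', 'MIZORAM']
corrected = [closest(n) for n in names]
print(corrected)
# ['Assam', 'Andhra Pradesh', 'Mizoram', 'Uttar Pradesh', 'Mizoram']
```

With a DataFrame, the same function can be passed directly to df['state_name'].apply(closest).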

A similar solution:

from difflib import SequenceMatcher

import numpy as np
import pandas as pd

lst = ['Assam', 'Andhra Pradesh', 'Mizoram', 'Uttar Pradesh']  # correct name list
arr = np.array(lst)

# lowercase copy of the list, computed once
lower = [x.lower() for x in lst]

def closest(s):
    # get the index of the maximal ratio, comparing lowercase to lowercase
    idx = np.argmax([SequenceMatcher(None, s.lower(), i).ratio() for i in lower])
    # return the correctly cased value from lst
    return arr[idx]


df = pd.DataFrame(['Assan', 'Andhra Prade5h', 'MIZORAM', 'Uttar Pr8desh'],
                  columns=["state_name"])

df['state_name'] = df['state_name'].apply(closest)

print (df)
       state_name
0           Assam
1  Andhra Pradesh
2         Mizoram
3   Uttar Pradesh
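
Both versions of closest always return some entry of the list, even for an input that resembles none of them. If that is not wanted, one possible variation is difflib.get_close_matches, which accepts a similarity cutoff. The sketch below is an assumption about how this could be arranged, not something taken from the answers above: the helper name closest_or_original, the lower_to_name dictionary, and the 0.6 cutoff (the function's documented default) are all illustrative choices.

```python
from difflib import get_close_matches

lst = ['Assam', 'Andhra Pradesh', 'Mizoram', 'Uttar Pradesh']  # correct name list

# Map each lowercased name back to its correctly cased form (illustrative helper).
lower_to_name = {name.lower(): name for name in lst}

def closest_or_original(s, cutoff=0.6):
    # Hypothetical helper: return the best case-insensitive match,
    # or the input unchanged when nothing reaches the cutoff.
    matches = get_close_matches(s.lower(), list(lower_to_name), n=1, cutoff=cutoff)
    return lower_to_name[matches[0]] if matches else s

print(closest_or_original('MIZORAM'))  # Mizoram
print(closest_or_original('Kerala'))   # Kerala (no list entry is similar enough)
```

Leaving unmatched values untouched makes them easy to spot afterwards; whether 0.6 is a suitable threshold depends on the data.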
