What is an efficient way to check whether a word is close to a word in another string?

Posted 2024-10-03 06:19:22


Consider the following examples:

  1. Example 1:

    str1 = "wow...it  looks amazing"
    str2 = "looks amazi"
    

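    As you can see, amazi is close to amazing; str2 contains a typo. I want to write a program that tells me amazi is close to amazing, and then replaces amazi with amazing in str2.

  2. Example 2: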

    str1 = "is looking good"
    str2 = "looks goo"
    

    In this case the updated str2 would be "looking good".

  3. Example 3:

    str1 = "you are really looking good"
    str2 = "lok goo"
    

    In this case str2 would become "good", because lok is not close to looking (although a solution that converts lok to looking here would also be acceptable for my problem).

  4. Example 4:

    str1 = "Stu is actually SEVERLY sunburnt....it hurts!!!"
    str2 = "hurts!!"
    

    The updated str2 would be "hurts!!!".

  5. Example 5:

    str1 = "you guys were absolutely amazing tonight, a..."
    str2 = "ly amazin"
    

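    The updated str2 would be "amazing"; "ly" would be dropped or replaced with absolutely.

What are the algorithm and code for this?

Maybe we could do it by looking at the characters in lexicographic order and setting a threshold of something like 0.8 (80%): if a word in str2 contains 80% of the sequential characters of a word in str1, we replace the word in str2 with the word from str1? Is there any other efficient Python solution?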
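One way to try out the threshold idea with only the standard library is `difflib.get_close_matches`, which scores candidates with `SequenceMatcher.ratio()` (2 * matched characters / total characters in both words). This is only a sketch of the idea, not a definitive answer: the function name `closest_words` and the default `cutoff` are hypothetical choices made for illustration.

```python
import difflib

def closest_words(str1, str2, cutoff=0.8):
    """Replace each word of str2 with its closest word in str1.

    Words of str2 with no candidate at or above `cutoff` are dropped.
    (Hypothetical helper; the name and default cutoff are assumptions.)
    """
    candidates = str1.split()
    updated = []
    for word in str2.split():
        # Returns at most n candidates whose ratio is >= cutoff, best first.
        matches = difflib.get_close_matches(word, candidates, n=1, cutoff=cutoff)
        if matches:
            updated.append(matches[0])
    return " ".join(updated)

print(closest_words("wow...it  looks amazing", "looks amazi"))
print(closest_words("is looking good", "looks goo", cutoff=0.6))
```

Note that with a cutoff of 0.8 the word "looks" is not matched to "looking" in example 2 (their ratio is 8/12, about 0.67), so the threshold would have to be lowered to around 0.6 to get "looking good".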


3 answers

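There are many ways to do this. This one solves all of your examples. I added a minimum-similarity filter so that only high-quality matches are returned. That is what allows "ly" to be dropped in the last example, since it is not really close to any word.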

Documentation

You can install Levenshtein with pip install python-Levenshtein

import Levenshtein

def find_match(str1,str2):
    min_similarity = .75
    output = []
    results = [[Levenshtein.jaro_winkler(x,y) for x in str1.split()] for y in str2.split()]
    for x in results:
        if max(x) >= min_similarity:
            output.append(str1.split()[x.index(max(x))])
    return output

Each of the samples you proposed:

find_match("is looking good", "looks goo")

['looking','good']

find_match("you are really looking good", "lok goo")

['looking','good']

find_match("Stu is actually SEVERLY sunburnt....it hurts!!!", "hurts!!")

['hurts!!!']

find_match("you guys were absolutely amazing tonight, a...", "ly amazin")

['amazing']
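If installing the package is not an option, a similar filter can be sketched in plain Python. Note that this swaps in a different measure: normalized edit distance (1 - distance / length of the longer word) instead of Jaro-Winkler, so the scores and a sensible threshold will differ. The names `edit_distance`, `similarity`, and `find_match_plain`, and the 0.7 default, are assumptions made for this illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance between a and b, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]

def similarity(a, b):
    """Normalized similarity in [0, 1]; 1.0 means identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1 - edit_distance(a, b) / longest

def find_match_plain(str1, str2, min_similarity=0.7):
    """For each word of str2, keep the closest word of str1 if it is close enough."""
    words = str1.split()
    output = []
    if not words:
        return output
    for y in str2.split():
        best = max(words, key=lambda x: similarity(x, y))
        if similarity(best, y) >= min_similarity:
            output.append(best)
    return output

print(find_match_plain("you guys were absolutely amazing tonight, a...", "ly amazin"))
```

With this measure "looks" and "looking" score only 1 - 3/7 (about 0.57), so whether example 2 yields "looking good" depends on how low the threshold is set.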

I did it with regular expressions:

import re

def check_regex(str1,str2):
    #New list to store the updated value
    str_new = []
    for i in str2:
        # regular expression for comparing the strings
        x = ['['+i+']','^'+i,i+'$','('+i+')']
        for k in x:
            h=0
            for j in str1:
                #Conditions to make sure the word is close enough to the particular word
                if "".join(re.findall(k,j)) == i or ("".join(re.findall(k,j)) in i and abs(len("".join(re.findall(k,j)))-len(i)) == 1 and len(i)!=2):
                    str_new.append(j)
                    h=1
                    break
            if h==1:
                break
    return str_new

str1 = input().split()
str2 = input().split()
print(" ".join(check_regex(str1,str2)))

It runs fine for me.

Like this:

str1 = "wow...it looks amazing"
str2 =  "looks amazi"
str3 = []

# Checking for similar strings in both strings:
for n in str1.split():
    for m in str2.split():
        if m in n:
            str3.append(n)

# If found 2 similar strings:
if len(str3) == 2:
    # If their indexes align:
    if str1.split().index(str3[1]) - str1.split().index(str3[0]) == 1:
        print(' '.join(str3))

elif len(str3) == 1:
    print(str3[0])

Output:

looks amazing

Update, using the condition given by the OP:

str1 = "good..."
str2 =  "god.."
str3 = []

# Checking for similar strings in both strings:
for n in str1.split():
    for m in str2.split():

        # Calculating matching character in the 2 words:
        c = ''
        for i in m:
            if i in n:
                c+=i
        # If the amount of matching characters is greater or equal to 50% the length of the larger word
        # or the smaller word is in the larger word:
        if len(list(c)) >= len(n)*0.50 or m in n:
            str3.append(n)


# If found 2 similar strings:
if len(str3) == 2:
    # If their indexes align:
    if str1.split().index(str3[1]) - str1.split().index(str3[0]) == 1:
        print(' '.join(str3))

elif len(str3) == 1:
    print(str3[0])
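The same character-overlap check can be wrapped in a function so it is easier to try on other inputs. This is just a sketch of the heuristic above; the name `overlap_matches` and the `threshold` parameter are hypothetical, and the `break` is an added assumption to avoid listing one word of str1 twice.

```python
def overlap_matches(str1, str2, threshold=0.5):
    """Return the words of str1 that share enough characters with a word of str2."""
    matches = []
    for n in str1.split():
        for m in str2.split():
            # Count the characters of m that also occur somewhere in n.
            shared = sum(1 for ch in m if ch in n)
            # Keep n if the overlap reaches the threshold or m is a substring of n.
            if shared >= len(n) * threshold or m in n:
                matches.append(n)
                break  # n is already matched; move on to the next word
    return matches

print(overlap_matches("good...", "god.."))
print(overlap_matches("wow...it looks amazing", "looks amazi"))
```

Because the check ignores character order, it also accepts pairs such as "dog" and "god", so it is best treated as a rough filter rather than a real similarity measure.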
