Extract words in a paragraph that are similar to words in a list

import difflib

sent = "The boy went to twn and bought sausage and chicken. He then picked a tddy for his sister"
list1 = ["town", "teddy", "chicken", "boy went"]

[difflib.get_close_matches(x.lower().strip(), sent.split()) for x in list1]

2 answers

User

Answer 1 · edited 2024-07-02 13:57:38

Note the following in the documentation for difflib.get_close_matches():

difflib.get_close_matches(word, possibilities, n=3, cutoff=0.6)
Return a list of the best "good enough" matches. word is a sequence for which close matches are desired (typically a string), and possibilities is a list of sequences against which to match word (typically a list of strings).
Optional argument n (default 3) is the maximum number of close matches to return; n must be greater than 0.
Optional argument cutoff (default 0.6) is a float in the range [0, 1]. Possibilities that don’t score at least that similar to word are ignored.
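To make the two parameters concrete, here is a small sketch that reuses the sentence from the question. It assumes the score compared against cutoff is the value returned by difflib.SequenceMatcher.ratio(), which is how CPython's difflib implements get_close_matches(). The variable names are only illustrative.

```python
import difflib

sent = "The boy went to twn and bought sausage and chicken. He then picked a tddy for his sister"
words = sent.split()
word = "town"

# Similarity of every candidate to "town", highest first.
scores = sorted(
    ((difflib.SequenceMatcher(None, w, word).ratio(), w) for w in set(words)),
    reverse=True,
)
for score, candidate in scores[:3]:
    print(f"{candidate!r}: {score:.3f}")

default_hits = difflib.get_close_matches(word, words)              # n=3, cutoff=0.6
strict_hits = difflib.get_close_matches(word, words, cutoff=0.8)   # higher threshold
single_hit = difflib.get_close_matches(word, words, n=1)           # best match only

print(default_hits)  # ['twn', 'to']
print(strict_hits)   # ['twn']
print(single_hit)    # ['twn']
```

With the defaults, "to" also passes the 0.6 threshold, which is why the original call returns more than the intended word.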

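At the moment you are using the default values of n and cutoff.

You can specify either one (or both) to narrow down the matches that are returned.

For example, you could use a cutoff score of 0.75: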

result = [difflib.get_close_matches(x.lower().strip(), sent.split(), cutoff=0.75) for x in list1]

Alternatively, you could specify that at most 1 match is returned:

result = [difflib.get_close_matches(x.lower().strip(), sent.split(), n=1) for x in list1]

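In either case, you can use a list comprehension to flatten the list of lists (difflib.get_close_matches() always returns a list). A word with no close match yields an empty list, and indexing it with r[0] raises IndexError, so empty results are skipped here. With cutoff=0.75, for example, "boy went" has no single-word match in the sentence:

matches = [r[0] for r in result if r]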
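As a sketch of how empty results can be handled when flattening, the snippet below runs the cutoff=0.75 variant on the data from the question. The name aligned is hypothetical; it simply keeps each search term next to its best match, or None when nothing scored high enough.

```python
import difflib

sent = "The boy went to twn and bought sausage and chicken. He then picked a tddy for his sister"
list1 = ["town", "teddy", "chicken", "boy went"]
words = sent.split()

result = [difflib.get_close_matches(x.lower().strip(), words, cutoff=0.75) for x in list1]

# Option 1: drop search terms that produced no match.
matches = [r[0] for r in result if r]

# Option 2: keep every search term and record None where nothing matched.
aligned = {x: (r[0] if r else None) for x, r in zip(list1, result)}

print(matches)
print(aligned)
```

The second form is convenient when you need to know which search terms failed to match rather than only which words were found.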

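Since you also want close matches for bigrams, you can extract pairs of adjacent "words" and pass them to difflib.get_close_matches() as part of the possibilities argument.

Here is a complete working example: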

import difflib
import re

sent = "The boy went to twn and bought sausage and chicken. He then picked a tddy for his sister"

list1 = ["town", "teddy", "chicken", "boy went"]

# this extracts overlapping pairings of "words"
# i.e. ['The boy', 'boy went', 'went to', 'to twn', ...
pairs = re.findall(r'(?=(\b[^ ]+ [^ ]+\b))', sent)

# we pass the sent.split() list as before
# and concatenate the new pairs list to the end of it also
result = [difflib.get_close_matches(x.lower().strip(), sent.split() + pairs, n=1) for x in list1]

matches = [r[0] for r in result]

print(matches)
# ['twn', 'tddy', 'chicken.', 'boy went']
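The result above still carries the trailing period in 'chicken.' because sent.split() splits on whitespace only. A possible variation, not part of the original answer, is to tokenize with re.findall(r"\w+", ...) and build the bigrams with zip(). Note the assumption this makes: punctuation is discarded, so a bigram such as "chicken He" is formed across the sentence boundary.

```python
import difflib
import re

sent = "The boy went to twn and bought sausage and chicken. He then picked a tddy for his sister"
list1 = ["town", "teddy", "chicken", "boy went"]

# Keep only runs of word characters, so "chicken." becomes "chicken".
tokens = re.findall(r"\w+", sent)

# Adjacent pairs: "The boy", "boy went", "went to", ...
bigrams = [" ".join(pair) for pair in zip(tokens, tokens[1:])]

candidates = tokens + bigrams
result = [difflib.get_close_matches(x.lower().strip(), candidates, n=1) for x in list1]

# Use None for any search term without a close match.
best = [r[0] if r else None for r in result]

print(best)
# ['twn', 'tddy', 'chicken', 'boy went']
```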

User

Answer 2 · edited 2024-07-02 13:57:38

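According to the Python documentation for difflib.get_close_matches() (https://docs.python.org/3/library/difflib.html), the function returns a list of the best matches, up to n of them. The method signature is: difflib.get_close_matches(word, possibilities, n=3, cutoff=0.6)

Here n is the maximum number of close matches to return, so I think you can pass 1 for it: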

>>> [difflib.get_close_matches(x.lower().strip(), sent.split(),1)[0] for x in list1]
['twn', 'tddy', 'chicken.', 'went']
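One caveat with both answers: x.lower() lowercases the search term, but the words from the sentence keep their original case, and the comparison is case-sensitive. The sketch below uses a made-up sentence with capitalized words to show the effect. The helper closest_word and the mapping lower_to_original are hypothetical names, not part of difflib.

```python
import difflib

# Hypothetical sentence, capitalized to show the case-sensitivity issue.
sent = "The Boy went to Twn"
words = sent.split()

# Map each lowercased word back to the first original spelling seen.
lower_to_original = {}
for w in words:
    lower_to_original.setdefault(w.lower(), w)


def closest_word(query, n=1, cutoff=0.6):
    """Match case-insensitively, then return the original spellings."""
    hits = difflib.get_close_matches(
        query.lower().strip(), list(lower_to_original), n=n, cutoff=cutoff
    )
    return [lower_to_original[h] for h in hits]


case_sensitive = difflib.get_close_matches("town", words, n=1)
case_insensitive = closest_word("town")

print(case_sensitive)    # ['to']
print(case_insensitive)  # ['Twn']
```

Here the capital "T" pushes "Twn" below the default cutoff in the case-sensitive call, so "to" wins instead.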
