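<p>I solved my use case by sorting the product list by length in reverse order and removing each matched product from the paragraph once its first position has been recorded. The code below shows how I did it. It may or may not be the right approach, but it solved my problem. It works even when the product list contains n products and the paragraph contains many matching strings from that list. Thanks for your research and help.</p>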
<pre><code>products = ["productA v4.1", "productA v4.1.5", "productA v4.1.5 ver"]
# Sort by length, longest first, so that longer product names are tried before their shorter prefixes
products = sorted(products, key=len, reverse=True)
paragraph = "Troubleshooting steps for productA v4.1.5 ver documents also has steps for productA v4.1 document "
def checkIfProdExist(x):
    # True if the product name x occurs anywhere in the paragraph
    return x in paragraph
# Keep only the products that occur somewhere in the paragraph
prodResults = list(filter(checkIfProdExist, products))
print(prodResults)
# At this point prodResults is ['productA v4.1.5 ver', 'productA v4.1.5', 'productA v4.1']
finalResult = []
# Loop through the matched strings, longest first
for prd in prodResults:
    # Re-check, because removing a longer match may have removed this one too
    if paragraph.find(prd) != -1:
        # Record the index of the first occurrence in the current (possibly shortened) paragraph
        finalResult.append({"index": str(paragraph.find(prd)), "value": prd})
        # Then remove every occurrence of the match so that shorter names cannot match inside it,
        # e.g. removing "productA v4.1.5 ver" stops "productA v4.1.5" and "productA v4.1" from matching there
        paragraph = paragraph.replace(prd, "")
print(finalResult)
# finalResult is [{'index': '26', 'value': 'productA v4.1.5 ver'}, {'index': '56', 'value': 'productA v4.1'}]
# Note: 56 is the offset after the longer match was removed; in the original paragraph that occurrence starts at 75.
# If paragraph is "Troubleshooting steps for productA v4.1.5 documents", the result is [{'index': '26', 'value': 'productA v4.1.5'}]
</code></pre>
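<p>Because the approach above deletes each match from the paragraph, the indexes recorded for later matches refer to the shortened text. If offsets into the original paragraph are needed, one possible variation is to overwrite each match with filler of the same length instead of deleting it. This is only a sketch: the function name <code>find_longest_products</code> is hypothetical, and it assumes no product name contains the NUL character used as filler.</p>

```python
def find_longest_products(paragraph, products):
    """Return (index, product) pairs, preferring longer product names.

    Indexes refer to the original, unmodified paragraph.
    """
    masked = paragraph
    results = []
    # Try longer names first so their shorter prefixes cannot match inside them
    for prd in sorted(products, key=len, reverse=True):
        index = masked.find(prd)
        if index != -1:
            results.append((index, prd))
            # Overwrite every occurrence with same-length filler (assumption:
            # "\0" never appears in a product name) so later offsets stay valid
            masked = masked.replace(prd, "\0" * len(prd))
    # Report matches in the order they appear in the paragraph
    return sorted(results)


products = ["productA v4.1", "productA v4.1.5", "productA v4.1.5 ver"]
paragraph = "Troubleshooting steps for productA v4.1.5 ver documents also has steps for productA v4.1 document "
print(find_longest_products(paragraph, products))
# [(26, 'productA v4.1.5 ver'), (75, 'productA v4.1')]
```

<p>As in the code above, only the first occurrence of each product is reported. The indexes are returned as integers rather than strings, which is a choice made for this sketch.</p>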