How can I find all known ingredient strings in a block of text?

Posted 2024-10-01 19:25:03


Consider the following ingredient list:

text = """Ingredients: organic cane sugar, whole-wheat flour,
       mono & diglycerides. Manufactured in a facility that uses nuts."""

How can I extract the ingredients using my postgres database, or find them with an elasticsearch index, without matching on tokens such as Ingredients: or {}?

Expected output:

['organic cane sugar', 'whole-wheat flour', 'mono & diglycerides']

1 answer
Anonymous user
#1 · posted 2024-10-01 19:25:03

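This Python code produces the following output: ['organic cane sugar', 'whole-wheat flour', 'mono & diglycerides']. It requires the ingredients to come after "Ingredients:" and all of them to be listed before the first ".", as they are in your example.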

import re

text = """Ingredients: organic cane sugar, whole-wheat flour,
   mono & diglycerides. Manufactured in a facility that uses nuts."""

# Search everything that comes after 'Ingredients: ' and before the first '.'
# DOTALL makes '.' match newlines too, so the list may span several lines
m = re.search(r'(?<=Ingredients: ).+?(?=\.)', text, re.DOTALL)
if m is None:
    raise ValueError("no 'Ingredients: ... .' section found in text")
# Turn newlines into spaces, then split the list on ','
items = m.group(0).replace('\n', ' ').split(',')
# Remove leading and trailing whitespace from each item
items = [i.strip() for i in items]
print(items)
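The regex above still depends on the literal "Ingredients:" label. If the known ingredient names are already available as a Python list (for example, loaded from the postgres table; the list contents and the function name find_known_ingredients are hypothetical), a minimal sketch is to build one case-insensitive regex alternation from those names and scan the text for them:

```python
import re


def find_known_ingredients(text, known):
    """Return the names from `known` that occur in `text`, in order of first appearance.

    `known` is assumed to be a list of ingredient names, e.g. rows loaded
    from a database table (hypothetical; not part of the question).
    """
    if not known:
        return []
    # Map lowercase form -> stored spelling so results use the stored names.
    canonical = {name.lower(): name for name in known}
    # Longest names first, so "whole-wheat flour" is tried before "flour".
    names = sorted(canonical.values(), key=len, reverse=True)
    # re.escape handles characters such as '&', '-' and '.' inside names.
    # (?<!\w) and (?!\w) stop a name from matching inside a longer word.
    pattern = re.compile(
        r"(?<!\w)(?:" + "|".join(re.escape(n) for n in names) + r")(?!\w)",
        re.IGNORECASE,
    )
    # Collapse whitespace (including newlines) so names split across lines match.
    normalized = re.sub(r"\s+", " ", text)
    found = []
    for match in pattern.finditer(normalized):
        name = canonical.get(match.group(0).lower(), match.group(0))
        if name not in found:
            found.append(name)
    return found


text = """Ingredients: organic cane sugar, whole-wheat flour,
   mono & diglycerides. Manufactured in a facility that uses nuts."""
known = ["organic cane sugar", "whole-wheat flour", "flour", "mono & diglycerides", "salt"]
print(find_known_ingredients(text, known))
# ['organic cane sugar', 'whole-wheat flour', 'mono & diglycerides']
```

Sorting the names longest first means "whole-wheat flour" is matched as one item instead of leaving a stray "flour". Note that this finds a known name anywhere in the text, so if "nuts" were in the list it would also be picked up from the allergen sentence; limiting the search to the ingredient clause still needs some other cue.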
