Python: how do I search a string for exact words from a list?

Posted 2024-07-07 08:56:38


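I need to find exact words from a list inside a string.

I tried the code below. It gives me exact matches for the single-word entries in the list, but how do I match the two-word entries as well?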

categories_to_retain = [
    'SOLID',
    'GEOMETRIC',
    'FLORAL',
    'BOTANICAL',
    'STRIPES',
    'ABSTRACT',
    'ANIMAL',
    'GRAPHIC PRINT',
    'ORIENTAL',
    'DAMASK',
    'TEXT',
    'CHEVRON',
    'PLAID',
    'PAISLEY',
    'SPORTS',
]

x = " Beautiful Art By  Design Studio **graphic print** Creates A **TEXT** Design For This Art Driven Duvet. Printed In Remarkable Detail On A Woven Duvet, This Is An Instant Focal Point Of Any Bedroom. The Fabric Is Woven Of Easy Care Polyester And Backed With A Soft Poly/Cotton Blend Fabric. The Texture In The Fabric Gives Dimension And A Unique Look And Feel To The Duvet."

x = x.upper()

print(x)

#x = "GRAPHIC"
#x = "GRAPHIC PRINTS"


matches = [cat for cat in categories_to_retain if cat in x.split()]

matches

Output:
['TEXT']

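Above you can see my list of words. I want to find these words in my string.

I also need to find a word even when it appears in plural or past-tense form, for example stripe, stripes, graphic prints, etc.

Thanks, Niranjan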


3 answers

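Use a regular expression with word boundaries to get exact matches. Even for single words, your split-based logic breaks as soon as a word is attached to punctuation: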

import re

patts = re.compile("|".join(r"\b{}\b".format(s) for s in categories_to_retain), re.I)

x = " Beautiful Art By  Design Studio **graphic print** Creates A **TEXT** Design For This Art Driven Duvet. Printed In Remarkable Detail On A Woven Duvet, This Is An Instant Focal Point Of Any Bedroom. The Fabric Is Woven Of Easy Care Polyester And Backed With A Soft Poly/Cotton Blend Fabric. The Texture In The Fabric Gives Dimension And A Unique Look And Feel To The Duvet."

print(patts.findall(x))

This gives you:

['graphic print', 'TEXT']
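A slightly more defensive variant of the same idea, as a sketch only: it escapes each entry with re.escape in case the list ever contains regex metacharacters, and tries longer entries first so that a multi-word entry wins over a shorter entry sharing its prefix. The helper name build_pattern and the shortened sample data are made up for illustration and are not part of the original answer.

```python
import re

def build_pattern(categories):
    # Hypothetical helper, not from the original answer.
    # Longest entries first, so 'GRAPHIC PRINT' is tried before 'GRAPHIC'.
    ordered = sorted(categories, key=len, reverse=True)
    alternation = "|".join(re.escape(c) for c in ordered)
    # A non-capturing group keeps findall() returning whole matches.
    return re.compile(r"\b(?:{})\b".format(alternation), re.IGNORECASE)

# Shortened, assumed sample data.
categories = ["TEXT", "GRAPHIC", "GRAPHIC PRINT", "STRIPES"]
text = "Studio **graphic print** creates a **TEXT** design with rich texture."

# Normalise the case of each hit so it can be compared with the list entries.
found = [m.upper() for m in build_pattern(categories).findall(text)]
print(found)
```

Because of the word boundaries, "texture" does not count as a hit for TEXT.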

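You can use a regular expression. It also avoids matching a run of characters inside a longer word, and it reports exactly the words you searched for.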

import re
matches = []
categories_to_retain = ['SOLID',
     'GEOMETRIC',
     'FLORAL',
     'BOTANICAL',
     'STRIPES',
     'ABSTRACT',
     'ANIMAL',
     'GRAPHIC PRINT',
     'ORIENTAL',
     'DAMASK',
     'TEXT',
     'CHEVRON',
     'PLAID',
     'PAISLEY',
     'SPORTS']

x = " Beautiful Art By  Design Studio **graphic print** Creates A **TEXT** Design For This Art Driven Duvet. Printed In Remarkable Detail On A Woven Duvet, This Is An Instant Focal Point Of Any Bedroom. The Fabric Is Woven Of Easy Care Polyester And Backed With A Soft Poly/Cotton Blend Fabric. The Texture In The Fabric Gives Dimension And A Unique Look And Feel To The Duvet."

x = x.upper()

print(x)

def searchWholeWord(w):
    return re.compile(r'\b({0})\b'.format(w), flags=re.IGNORECASE).search

for cat in categories_to_retain:
    return_value = searchWholeWord(cat)(x)
    if return_value:
        matches.append(cat)

print(matches)

Output of print(matches):

['GRAPHIC PRINT', 'TEXT']
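The loop above does not cover the plural and past-tense forms that the question also asks about. One rough way to approximate that, sketched here under stated assumptions: drop a trailing S from each category and allow a small set of optional suffixes. The suffix set (S, ES, D, ED) and the function name find_with_inflections are assumptions made for this illustration. A proper stemmer or lemmatizer would likely be more robust for real text.

```python
import re

# Assumed, deliberately crude suffix set; not from the original answers.
SUFFIXES = r"(?:S|ES|D|ED)?"

def find_with_inflections(categories, text):
    """Return the categories whose base or simply inflected form occurs in text."""
    hits = []
    for cat in categories:
        # Drop a trailing S so 'STRIPES' also matches 'STRIPE' and 'STRIPED'.
        stem = cat[:-1] if cat.endswith("S") else cat
        pattern = r"\b{}{}\b".format(re.escape(stem), SUFFIXES)
        if re.search(pattern, text, flags=re.IGNORECASE):
            hits.append(cat)
    return hits

# Assumed sample data.
categories = ["STRIPES", "GRAPHIC PRINT", "TEXT", "FLORAL"]
sample = "A striped duvet with graphic prints and a soft texture."
result = find_with_inflections(categories, sample)
print(result)
```

This heuristic will also accept forms that are not real words (for example TEXTED for TEXT), so treat it as a starting point rather than a finished solution.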

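Here the string is split with the default split(), which splits on whitespace: x.split() contains the strings "GRAPHIC" and "PRINT", but not "GRAPHIC PRINT". You probably want "if cat in x" instead, which I believe returns what you need in this case.

This should work: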

matches = [cat for cat in categories_to_retain if cat in x]
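One caveat with the plain substring test is that it also matches inside longer words, so TEXT would be reported for a string that only contains TEXTURE. The short comparison below is a sketch on an assumed sample string, not the original data. It contrasts the split-based test, the substring test, and a word-boundary regex.

```python
import re

# Assumed sample data, chosen so the three approaches disagree.
categories = ["GRAPHIC PRINT", "TEXT"]
x = "THE TEXTURE OF THIS GRAPHIC PRINT DUVET"

# Token membership: a two-word entry can never equal a single token.
by_split = [c for c in categories if c in x.split()]

# Substring test: finds the two-word entry, but also TEXT inside TEXTURE.
by_substring = [c for c in categories if c in x]

# Word-boundary regex: finds the two-word entry and rejects TEXTURE.
by_boundary = [c for c in categories
               if re.search(r"\b{}\b".format(re.escape(c)), x)]

print(by_split)      # []
print(by_substring)  # ['GRAPHIC PRINT', 'TEXT']
print(by_boundary)   # ['GRAPHIC PRINT']
```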
