Matching array elements against a string in any order

Posted 2024-06-26 14:40:34


I'm new to Python and am trying to find out whether a tweet contains any of the lookup elements.

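For example, if a tweet contains the word "cat", it should match "cats", and the words of "cute kittens" should match in any order. With my current understanding I can't find a solution. Any guidance is appreciated.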

import re
lookup_table = ['cats', 'cute kittens', 'dog litter park']
tweets = ['that is a cute cat',
          'kittens are cute',
          'that is a cute kitten',
          'that is a dog litter park',
          'no wonder that dog park is bad']
for tweet in tweets:
    lookup_found = None
    print(re.findall(r"(?=(" + '|'.join(lookup_table) + r"))", tweet.lower()))

Output:

['cat']
[]
[]
['dog litter park']
[]

Expected output:

that is a cute cat > cats
kittens are cute > cute kittens
that is a cute kitten > cute kittens
that is a dog litter park > dog litter park
no wonder that dog park is bad > dog litter park

3 answers

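Problem 1:

Singular/plural: to get things going, I would use a Python package, inflect, to take care of singular versus plural forms and the like.

Problem 2:

Splitting and joining: I wrote a small script to demonstrate how to use it. It has not been rigorously tested, but it should get you moving.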

import inflect 
p = inflect.engine()
lookup_table = ['cats', 'cute kittens', 'dog litter park']
tweets = ['that is a cute cat',
          'kittens are cute',
          'that is a cute kitten',
          'that is a dog litter park',
          'no wonder that dog park is bad']

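for tweet in tweets:
    matched = []
    for lt in lookup_table:
        # p.compare() returns a truthy string when two words are equal or
        # differ only in number (singular/plural), and False otherwise
        if any(p.compare(word, mt) for mt in lt.split() for word in tweet.split()):
            # record each lookup entry once, however many of its words matched
            matched.append(lt)
    print(tweet, '>>', matched)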
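
If installing inflect is not an option, the same idea can be sketched with the standard library only. This is a minimal sketch, not the answer's method: `singular` is a hypothetical helper that just drops one trailing "s" (a crude stand-in for real singularisation), and `best_lookup` is an assumed name that picks the lookup entry with the largest fraction of its words present in the tweet.

```python
def singular(word):
    # Crude stand-in for inflect (assumption): drop one trailing "s".
    return word[:-1] if len(word) > 1 and word.endswith("s") else word


def best_lookup(tweet, lookup_table):
    """Return the lookup entry whose words best overlap the tweet, or None."""
    tweet_words = {singular(w) for w in tweet.lower().split()}
    best, best_score = None, 0.0
    for entry in lookup_table:
        entry_words = {singular(w) for w in entry.lower().split()}
        if not entry_words:
            continue  # skip empty lookup entries
        # fraction of the entry's words found in the tweet, in any order
        score = len(entry_words & tweet_words) / len(entry_words)
        if score > best_score:
            best, best_score = entry, score
    return best


lookup_table = ['cats', 'cute kittens', 'dog litter park']
tweets = ['that is a cute cat',
          'kittens are cute',
          'that is a cute kitten',
          'that is a dog litter park',
          'no wonder that dog park is bad']

for tweet in tweets:
    print(tweet, '>', best_lookup(tweet, lookup_table))
```

Scoring by fraction rather than requiring every word lets a partial match such as "dog park" still resolve to "dog litter park", while "cute cat" prefers "cats" (fully matched) over "cute kittens" (half matched).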

For a lookup term that consists of a single word, you can use

for word in tweet

For lookup terms like "cute kitten", where the words may appear in any order, just split the term into words and look for each of them in the tweet string.

This is what I tried. It is not efficient, but it works. Try running it.

lookup_table = ['cat', 'cute kitten', 'dog litter park']
tweets = ['that is a cute cat',
          'kittens are cute',
          'that is a cute kitten',
          'that is a dog litter park',
          'no wonder that dog park is bad']

for word in lookup_table:
    for tweet in tweets:
        if " " in word:
            temp = word.split(sep=" ")
        else:
            temp = [word]
        for x in temp:
            if x in tweet:
                print(tweet)
                break
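
Note that `x in tweet` is a substring test, so "cat" would also be found inside "category", and the loop above prints a tweet as soon as any one word is found. A possible variant, sketched here with an assumed helper name `words_present`, compares sets of whole words and requires every word of the term to be present, in any order. It does not handle plurals.

```python
def words_present(term, tweet):
    """True if every word of `term` occurs as a whole word in `tweet`, in any order."""
    return set(term.split()) <= set(tweet.split())


lookup_table = ['cat', 'cute kitten', 'dog litter park']
tweets = ['that is a cute cat',
          'kittens are cute',
          'that is a cute kitten',
          'that is a dog litter park',
          'no wonder that dog park is bad']

for tweet in tweets:
    hits = [term for term in lookup_table if words_present(term, tweet)]
    print(tweet, '>', hits)
```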

This is how I would do it. I don't think the lookup table needs to be that strict; we can avoid plurals.

import re
lookup_table = ['cat', 'cute kitten', 'dog litter park']
tweets = ['that is a cute cat',
      'kittens are cute',
      'that is a cute kitten',
      'that is a dog litter park',
      'no wonder that dog park is bad']
for data in lookup_table:
    words=data.split(" ")
    for word in words:
        result = re.findall(r'[\w\s]*' + word + r'[\w\s]*', ','.join(tweets))
        if len(result)>0:
            print(result)
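
A regex can also express "all words, in any order" directly, using one lookahead per word. The following is a sketch under stated assumptions rather than part of the answer: `any_order_pattern` is a hypothetical helper, and the optional trailing `s?` is only a rough way to tolerate simple English plurals.

```python
import re


def any_order_pattern(term):
    """Compile a regex that requires each word of `term` (optionally followed
    by "s") to appear as a whole word somewhere in the text, in any order."""
    lookaheads = "".join(
        r"(?=.*\b" + re.escape(word) + r"s?\b)" for word in term.split()
    )
    return re.compile(lookaheads, re.IGNORECASE)


lookup_table = ['cat', 'cute kitten', 'dog litter park']
tweets = ['that is a cute cat',
          'kittens are cute',
          'that is a cute kitten',
          'that is a dog litter park',
          'no wonder that dog park is bad']

patterns = {term: any_order_pattern(term) for term in lookup_table}

for tweet in tweets:
    # match() anchors at the start; each lookahead then scans the whole tweet
    hits = [term for term, pattern in patterns.items() if pattern.match(tweet)]
    print(tweet, '>', hits)
```

Because every word is required, the last tweet ("dog park" without "litter") matches nothing here; a partial match would need a scoring approach instead.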
