Matching array elements against a string in any order

import re

lookup_table = ['cats', 'cute kittens', 'dog litter park']
tweets = ['that is a cute cat',
          'kittens are cute',
          'that is a cute kitten',
          'that is a dog litter park',
          'no wonder that dog park is bad']

for tweet in tweets:
    lookup_found = None
    print(re.findall(r"(?=(" + '|'.join(lookup_table) + r"))", tweet.lower()))

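3 Answers

User

Answer 1 · edited 2024-06-26 14:40:34

Problem 1:

Singular/plural: To handle this smoothly, I would use the Python package inflect to normalize singular and plural forms.

Problem 2:

Splitting and joining: I wrote a small script to demonstrate how to use it. It has not been rigorously tested, but it should get you started.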

import inflect

p = inflect.engine()
lookup_table = ['cats', 'cute kittens', 'dog litter park']
tweets = ['that is a cute cat',
          'kittens are cute',
          'that is a cute kitten',
          'that is a dog litter park',
          'no wonder that dog park is bad']

for tweet in tweets:
    matched = []
    for lt in lookup_table:
        # Collect every lookup word that equals a tweet word, ignoring
        # singular/plural differences. p.compare returns False or a
        # truthy string such as 'eq' or 's:p'.
        match_result = [mt for mt in lt.split() for word in tweet.split() if p.compare(word, mt)]
        if match_result:
            matched.append(" ".join(match_result))
    print(tweet, '>>', matched)

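User

Answer 2 · edited 2024-06-26 14:40:34

For lookup terms that consist of a single word, you can use

for word in tweet.split():

For lookup terms like 'cute kitten', the words may appear in any order. Just split the term into words and look for each one in the tweet string.

Here is what I tried. It is not efficient, but it works. Try running it.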

lookup_table = ['cat', 'cute kitten', 'dog litter park']
tweets = ['that is a cute cat',
          'kittens are cute',
          'that is a cute kitten',
          'that is a dog litter park',
          'no wonder that dog park is bad']

for word in lookup_table:
    for tweet in tweets:
        if " " in word:
            temp = word.split(sep=" ")
        else:
            temp = [word]
        for x in temp:
            if x in tweet:
                print(tweet)
                break
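
The split-and-search idea above prints a tweet as soon as any one word of a lookup term appears in it as a substring, so 'cat' would also hit a tweet containing 'concatenate'. A minimal sketch of a stricter variant follows, assuming that a term should count only when all of its words occur as whole words, in any order. The helper name find_terms is hypothetical and not part of the original answer.

```python
def find_terms(tweet, lookup_table):
    """Return the lookup terms whose words all occur in the tweet, in any order."""
    tweet_words = set(tweet.lower().split())
    # A term matches only if every one of its words is a whole word of the tweet.
    return [term for term in lookup_table
            if set(term.lower().split()) <= tweet_words]


lookup_table = ['cat', 'cute kitten', 'dog litter park']
tweets = ['that is a cute cat',
          'kittens are cute',
          'that is a cute kitten',
          'that is a dog litter park',
          'no wonder that dog park is bad']

for tweet in tweets:
    print(tweet, '>>', find_terms(tweet, lookup_table))
```

Because this compares exact tokens, 'kittens are cute' does not match 'cute kitten'. Handling plurals would need an extra normalization step, for example with inflect as suggested in the first answer.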

User

Answer 3 · edited 2024-06-26 14:40:34

This is how I would do it. I think the lookup table does not need to be too strict, and we can avoid plurals.

import re

lookup_table = ['cat', 'cute kitten', 'dog litter park']
tweets = ['that is a cute cat',
          'kittens are cute',
          'that is a cute kitten',
          'that is a dog litter park',
          'no wonder that dog park is bad']

for data in lookup_table:
    words = data.split(" ")
    for word in words:
        # Join the tweets with commas so each match covers one whole tweet;
        # raw strings avoid invalid-escape warnings for \w and \s.
        result = re.findall(r'[\w\s]*' + re.escape(word) + r'[\w\s]*', ','.join(tweets))
        if len(result) > 0:
            print(result)
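
The regex approach above reports every tweet that contains any single word of a lookup term. As a rough sketch of an alternative, one lookahead per word can require all words of a term in any order within a single pattern. The helper name build_pattern is hypothetical, and the optional trailing 's' is only an assumed, simplistic way to tolerate plurals; it will not cover irregular forms.

```python
import re


def build_pattern(term):
    """Compile a regex that matches when every word of `term` occurs, in any order."""
    # One lookahead per word; 's?' tolerates a simple trailing-s plural (an assumption).
    lookaheads = ''.join(r'(?=.*\b' + re.escape(word) + r's?\b)'
                         for word in term.split())
    return re.compile(lookaheads, re.IGNORECASE)


lookup_table = ['cat', 'cute kitten', 'dog litter park']
tweets = ['that is a cute cat',
          'kittens are cute',
          'that is a cute kitten',
          'that is a dog litter park',
          'no wonder that dog park is bad']

patterns = {term: build_pattern(term) for term in lookup_table}

for tweet in tweets:
    # match() anchors at the start; the lookaheads then scan the whole tweet.
    found = [term for term, pattern in patterns.items() if pattern.match(tweet)]
    print(tweet, '>>', found)
```

With this variant, 'kittens are cute' matches 'cute kitten', while 'no wonder that dog park is bad' matches nothing because 'litter' is missing.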
