Python: finding common sentence fragments in a list of sentences

Published 2024-09-30 20:18:57


Suppose I have a list of sentences, such as:

quick brown blah red work word
quick brown too red blah someone
quick gray one two three
quick gray two three four
quick gray johnson week summer
quick gray johnson day week water fall
quick gray wicked stopper fall
quick gray hotel flamer walk
doggie bone
doggie python
doggie python tree flower python
doggie python flower whatever
tree bone stick

I am looking for code that returns the list of common "parent" sentences:

quick brown
quick gray
quick gray johnson
doggie bone
doggie python
tree bone stick

Thanks.


2 answers

You can do this easily with a regex:

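>>> import re
>>> result = []
>>> for i in data:  # data: the list of input strings
...     # Grab the first run of lowercase words in the line.
...     r = re.search(r'([a-z]+\s*)+', i)
...     if r:
...         res = r.group(0).strip()
...         if res not in result:
...             result.append(res)
...
>>> print(result)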
['quick brown', 'quick gray', 'quick gray johnson', 'doggie', 'doggie python', 'tree']
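
As a self-contained variant of the session above, here is a minimal sketch that anchors the pattern at the start of each line with `re.match`, so a line that begins with a number yields nothing instead of a later run of letters. The names `LEADING_WORDS`, `leading_words` and `sample` are hypothetical, and the sample lines are borrowed from the data in the second answer.

```python
import re

# A run of lowercase words separated by whitespace. Used with .match(),
# so it only matches at the very start of the line.
LEADING_WORDS = re.compile(r'[a-z]+(?:\s+[a-z]+)*')


def leading_words(lines):
    """Return the distinct leading word-prefixes, in first-seen order."""
    result = []
    for line in lines:
        m = LEADING_WORDS.match(line)
        if m and m.group(0) not in result:
            result.append(m.group(0))
    return result


sample = [
    'quick brown 580 650 040 050',
    'quick brown 650 160 150 500',
    'quick gray 075 060 400',
    'quick gray johnson 149 135',
    'doggie 256',
    'doggie python',
    'tree 196 106',
]
print(leading_words(sample))
# ['quick brown', 'quick gray', 'quick gray johnson', 'doggie', 'doggie python', 'tree']
```

Because the pattern ends on a word rather than on optional whitespace, no `strip()` call is needed.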

Here you go:

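def removeNumbers(data):
    """Keep only the words that precede the first integer token in each sentence."""
    result = []
    for sent in data:
        temp = []
        for word in sent.split():
            try:
                int(word)
            except ValueError:
                # Not a number: keep the word as part of the prefix.
                temp.append(word)
            else:
                # First numeric token reached: the prefix is complete.
                break
        result.append(" ".join(temp))
    return result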
data = [
    'quick brown 580 650 040 050',
    'quick brown 650 160 150 500',
    'quick gray 075 060 400',
    'quick gray 087 565 600',
    'quick gray johnson 149 135',
    'quick gray johnson 600 650 070 600',
    'quick gray 565 070 250',
    'quick gray 630 550 400',
    'doggie 256',
    'doggie python',
    'doggie python 350 675 106',
    'doggie python 417 560',
    'tree 196 106'
]
data = removeNumbers(data)
# dict.fromkeys drops duplicates while keeping first-seen order (set() does not).
print(list(dict.fromkeys(data)))
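
Both answers rely on numeric tokens marking where the shared prefix ends. For the all-word sentences in the question, one possible reading (an assumption, not something the question states) is that a "parent" is a prefix of at least two words shared by at least two sentences, and that a sentence with no such prefix is its own parent. A minimal sketch under that assumption follows; `common_parents`, `min_words` and `min_count` are hypothetical names.

```python
from collections import Counter


def common_parents(sentences, min_words=2, min_count=2):
    """Assumed rule: a parent is a word-prefix of >= min_words words that
    starts >= min_count sentences; otherwise the whole sentence is kept."""
    # Count how many sentences start with each word-prefix.
    prefix_counts = Counter()
    for sentence in sentences:
        words = sentence.split()
        for n in range(1, len(words) + 1):
            prefix_counts[tuple(words[:n])] += 1

    parents = []
    seen = set()
    for sentence in sentences:
        words = sentence.split()
        if not words:
            continue  # skip empty lines
        shared = [
            tuple(words[:n])
            for n in range(min_words, len(words) + 1)
            if prefix_counts[tuple(words[:n])] >= min_count
        ]
        # No shared prefix: fall back to the sentence itself.
        for prefix in shared or [tuple(words)]:
            if prefix not in seen:
                seen.add(prefix)
                parents.append(" ".join(prefix))
    return parents


sentences = [
    "quick brown blah red work word",
    "quick brown too red blah someone",
    "quick gray one two three",
    "quick gray two three four",
    "quick gray johnson week summer",
    "quick gray johnson day week water fall",
    "quick gray wicked stopper fall",
    "quick gray hotel flamer walk",
    "doggie bone",
    "doggie python",
    "doggie python tree flower python",
    "doggie python flower whatever",
    "tree bone stick",
]
for parent in common_parents(sentences):
    print(parent)
```

On the question's sample this prints the six lines listed there, in the same order. The two thresholds are guesses chosen to reproduce that list; other data may call for a different rule.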
