Get all right-branching word pairs from a sentence

Posted 2024-09-27 23:28:08


Suppose I have a string such as:

 'velvet evening purse bags'

How can I get all the word pairs from this string? In other words, every two-word combination:

'velvet evening'
'velvet purse'
'velvet bags'
'evening purse'
'evening bags'
'purse bags'

I know Python's nltk package can produce bigrams, but I am looking for something beyond that. Or do I have to write my own custom function in Python?


3 answers

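This should be fun.

If the input is velvet evening purse bags and the desired output is what @MrGeek produced with itertools.combinations, then that is effectively the definition of skipgrams; see https://tedboy.github.io/nlps/generated/generated/nltk.skipgrams.html

So you can get the same result like this: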

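from nltk import skipgrams, word_tokenize

s = 'velvet evening purse bags'
tokens = word_tokenize(s)  # requires NLTK's tokenizer data to be installed
# n=2 yields pairs; k is the maximum number of tokens that may be skipped
list(skipgrams(tokens, n=2, k=len(tokens)-1))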

[Output]:

[('velvet', 'evening'),
 ('velvet', 'purse'),
 ('velvet', 'bags'),
 ('evening', 'purse'),
 ('evening', 'bags'),
 ('purse', 'bags')]

In this case, each word is only paired with a word to its right, which loosely matches how English works.
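
To make the "only to the right" rule concrete, here is a minimal plain-Python sketch of the same pairing, with a limit on how many words may be skipped. The helper name `right_pairs` and its `max_skip` parameter are made up for this illustration, and `str.split` stands in for `word_tokenize`; this is not how NLTK implements `skipgrams`.

```python
def right_pairs(tokens, max_skip):
    """Pair each token with tokens to its right, skipping at most max_skip tokens."""
    pairs = []
    for i, left in enumerate(tokens):
        # j - i - 1 is the number of tokens skipped between the two words
        for j in range(i + 1, min(len(tokens), i + max_skip + 2)):
            pairs.append((left, tokens[j]))
    return pairs


tokens = 'velvet evening purse bags'.split()

# Allowing every possible skip gives all right-branching pairs
all_pairs = right_pairs(tokens, max_skip=len(tokens) - 2)
# Allowing no skips leaves only neighbouring words, i.e. plain bigrams
adjacent_pairs = right_pairs(tokens, max_skip=0)

print(all_pairs)
print(adjacent_pairs)
```

With `max_skip=0` only neighbouring words are paired, so the skip limit is what separates plain bigrams from the full set of pairs asked for in the question.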

If instead you want every "permutation" of the words in pairs, including each word paired with itself:

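from itertools import product
from nltk import word_tokenize

s = 'velvet evening purse bags'
# dict.fromkeys drops duplicate words but keeps their order; a set would not
tokens = list(dict.fromkeys(word_tokenize(s)))
list(product(tokens, tokens))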

[Output]:

[('velvet', 'velvet'),
 ('velvet', 'evening'),
 ('velvet', 'purse'),
 ('velvet', 'bags'),
 ('evening', 'velvet'),
 ('evening', 'evening'),
 ('evening', 'purse'),
 ('evening', 'bags'),
 ('purse', 'velvet'),
 ('purse', 'evening'),
 ('purse', 'purse'),
 ('purse', 'bags'),
 ('bags', 'velvet'),
 ('bags', 'evening'),
 ('bags', 'purse'),
 ('bags', 'bags')]
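
If self-pairs such as ('velvet', 'velvet') are not wanted but both orders are, `itertools.permutations` is one possible alternative. A small sketch, using `str.split` instead of `word_tokenize` to keep it dependency-free (an assumption made for this example):

```python
from itertools import permutations

tokens = 'velvet evening purse bags'.split()

# permutations(tokens, 2) yields every ordered pair taken from two different positions
ordered_pairs = list(permutations(tokens, 2))

print(len(ordered_pairs))
print(ordered_pairs[:3])
```

Note that `permutations` works on positions rather than values, so a sentence containing the same word twice would still produce a pair of that word with itself.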

You can use itertools.combinations to do this:

from itertools import combinations
from nltk import word_tokenize

s = 'velvet evening purse bags'
words = word_tokenize(s)

pairs = [' '.join(comb) for comb in combinations(words, 2)]
print(pairs)

Output:

['velvet evening', 'velvet purse', 'velvet bags', 'evening purse', 'evening bags', 'purse bags']
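
For comparison with the bigrams mentioned in the question, the sketch below builds both lists and shows which pairs bigrams alone would miss. It uses `str.split` rather than `word_tokenize` so it runs without NLTK; that substitution is an assumption made for this example.

```python
from itertools import combinations

words = 'velvet evening purse bags'.split()

# Bigrams: each word joined with its immediate right-hand neighbour
bigrams = [' '.join(pair) for pair in zip(words, words[1:])]
# All right-branching pairs, as produced by combinations(words, 2)
pairs = [' '.join(pair) for pair in combinations(words, 2)]
# Pairs that only appear once non-adjacent words may be combined
skipped = [p for p in pairs if p not in bigrams]

print(bigrams)
print(skipped)
```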

You can also go old school...

text = 'velvet evening purse bags'

seen = []  # pairs already emitted, stored in both orders
ans = []
for i in text.split():
    for j in text.split():
        if j != i:
            if (i, j) not in seen:
                ans.append((i, j))
                seen.append((i, j))
                seen.append((j, i))

print(ans)

Output:

[('velvet', 'evening'),
 ('velvet', 'purse'),
 ('velvet', 'bags'),
 ('evening', 'purse'),
 ('evening', 'bags'),
 ('purse', 'bags')]
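
The same old-school idea can also be written without the extra bookkeeping list by looping over positions instead of words. This is a sketch rather than a drop-in replacement: the function name `index_pairs` is made up here, and because it compares positions, a word that occurs twice gets paired with its own repeat.

```python
def index_pairs(text):
    """Return every (left, right) word pair where right comes later in the text."""
    words = text.split()
    result = []
    for i in range(len(words)):
        # Start at i + 1 so each word is only paired with words to its right
        for j in range(i + 1, len(words)):
            result.append((words[i], words[j]))
    return result


ans = index_pairs('velvet evening purse bags')
print(ans)
```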
