<p>Try:</p>
<pre><code>result_cleared = [x for x in b.ngram_fd.keys() if x[0] != x[1]]
</code></pre>
<hr/>
<p><strong>Edit</strong>: If the text is stored in a DataFrame, you can do the following:</p>
<pre><code>import nltk
import pandas as pd

# the dummy data from your comment
df = pd.DataFrame({'Text': ['this is a stupid text with no no no sense', 'this song says na na na', 'this is very very very very annoying']})

def create_bigrams(text):
    # build the bigrams of the whitespace-split text, dropping pairs of identical words
    b = nltk.collocations.BigramCollocationFinder.from_words(text.split())
    return [x for x in b.ngram_fd.keys() if x[0] != x[1]]

df["bigrams"] = df["Text"].apply(create_bigrams)
df["bigrams"].apply(print)
</code></pre>
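<p>This first adds a column containing the bigrams to the DataFrame and then prints the column values. If you only want the output without modifying <code>df</code>, replace the last two lines with:</p>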
<pre><code>df["Text"].apply(create_bigrams).apply(print)
</code></pre>
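<p>To see what the filter does without installing anything, here is a minimal sketch in plain Python. It is not the NLTK approach above: <code>distinct_bigrams</code> is a hypothetical helper that pairs adjacent words with <code>zip</code> and deduplicates them with <code>dict.fromkeys</code>, on the assumption that you only need each bigram once, as with the keys of <code>ngram_fd</code>.</p>

```python
def distinct_bigrams(text):
    """Hypothetical helper: unique adjacent word pairs whose two words differ."""
    words = text.split()
    # pair every word with its successor
    pairs = zip(words, words[1:])
    # dict.fromkeys drops duplicate pairs while keeping first-seen order;
    # the condition then removes pairs such as ('na', 'na')
    return [p for p in dict.fromkeys(pairs) if p[0] != p[1]]


result = distinct_bigrams("this song says na na na")
print(result)
# [('this', 'song'), ('song', 'says'), ('says', 'na')]
```

<p>It can be applied the same way as above, for example <code>df["Text"].apply(distinct_bigrams)</code>.</p>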