Gensim: How can I save the topics generated by an LDA model in a readable format (csv, txt, etc.)?

INFO : adding document #0 to Dictionary(0 unique tokens)
INFO : built Dictionary(18 unique tokens) from 5 documents (total 20 corpus positions)
INFO : using serial LDA version on this node
INFO : running online LDA training, 2 topics, 1 passes over the supplied corpus of 5 documents, updating model once every 5 documents
WARNING : too few updates, training might not converge; consider increasing the number of passes to improve accuracy
INFO : PROGRESS: iteration 0, at document #5/5
INFO : 2/5 documents converged within 50 iterations
INFO : topic #0: 0.079*cute + 0.076*broccoli + 0.070*adopted + 0.069*yesterday + 0.069*eat + 0.069*sister + 0.068*kitten + 0.068*kittens + 0.067*bananas + 0.067*chinchillas
INFO : topic #1: 0.082*broccoli + 0.079*cute + 0.071*piece + 0.070*munching + 0.069*spinach + 0.068*hamster + 0.068*ate + 0.067*banana + 0.066*breakfast + 0.066*smoothie
INFO : topic diff=0.470477, rho=1.000000
<gensim.models.ldamodel.LdaModel object at 0x10f1f4050>

3 Answers

User

#1 · Edited 2024-06-26 00:18:05

Here is how to save a gensim LDA model:

from gensim import corpora, models, similarities

# create corpus and dictionary
corpus = ...
dictionary = ...

# train the model; this might take some time
model = models.LdaModel(corpus=corpus, id2word=dictionary, num_topics=200, passes=5, alpha='auto')
# save model to disk (no need to use pickle module)
model.save('lda.model')

There are several ways to print the topics:

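# later on, load trained model from file
model = models.LdaModel.load('lda.model')

# print all topics
# (num_topics/num_words were called topics/topn in older gensim releases)
print(model.show_topics(num_topics=200, num_words=20))

# print topic 109
print(model.print_topic(109, topn=20))

# another way: loop over every topic, including the last one
for i in range(model.num_topics):
    print(model.print_topic(i))

# and another way, only prints top words
# (show_topic returns (word, probability) pairs in current gensim releases)
for t in range(model.num_topics):
    print('topic {}: '.format(t) + ', '.join(word for word, _ in model.show_topic(t, 20)))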
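
Since the question asks for a csv file, the loop above can be turned into a small export step. The following is only a sketch: the helper name `write_topics_csv`, the column names, and the output file name `lda_topics.csv` are illustrative choices, not part of gensim. It assumes each topic is available as a list of (word, weight) pairs, which is what `model.show_topic(t, topn=20)` returns in current gensim releases. The sample values are copied from the training log in the question so the block runs without a trained model.

```python
import csv

def write_topics_csv(topics, path):
    """Write one row per (topic, word) pair: topic_id, rank, word, weight.

    `topics` maps a topic id to a list of (word, weight) pairs.
    """
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["topic_id", "rank", "word", "weight"])
        for topic_id, pairs in sorted(topics.items()):
            for rank, (word, weight) in enumerate(pairs, start=1):
                writer.writerow([topic_id, rank, word, weight])

# Sample data: top 3 words of each topic, taken from the log in the question.
# With a trained model (assumption: current gensim API) this would be e.g.:
#   topics = {t: model.show_topic(t, topn=20) for t in range(model.num_topics)}
sample_topics = {
    0: [("cute", 0.079), ("broccoli", 0.076), ("adopted", 0.070)],
    1: [("broccoli", 0.082), ("cute", 0.079), ("piece", 0.071)],
}
write_topics_csv(sample_topics, "lda_topics.csv")
```

A long format with one row per word keeps the file easy to filter or pivot in a spreadsheet; a wide format with one row per topic would work just as well.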

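User

#2 · Edited 2024-06-26 00:18:05

You only need to call lda.show_topics(num_topics=-1) (the argument was named topics in older gensim releases), or pass any number of topics you like (10, 15, 1000, ...). I usually just do: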

# num_topics/num_words were called topics/topn in older gensim releases
with open('.../yourfile.txt', 'a') as logfile:
    print(lda.show_topics(num_topics=-1, num_words=10), file=logfile)

All of these parameters, and others, are described in the gensim documentation.
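
Printing the whole list writes it to the file as a single Python literal. If one topic per line is easier to read, something like the sketch below may help. It assumes `show_topics` returns (topic_id, formatted_string) tuples, as it does with `formatted=True` in current gensim releases. The helper name `write_topics_txt` and the file name `lda_topics.txt` are hypothetical, and the sample strings are shortened from the log in the question. Unlike the snippet above, it opens the file in write mode rather than append mode.

```python
def write_topics_txt(formatted_topics, path):
    """Write one line per topic: 'topic #<id>: <formatted word weights>'."""
    with open(path, "w", encoding="utf-8") as f:
        for topic_id, description in formatted_topics:
            f.write("topic #{}: {}\n".format(topic_id, description))

# Sample data shortened from the log in the question. With a trained model
# (assumption: current gensim API) this would be e.g.:
#   formatted_topics = lda.show_topics(num_topics=-1, num_words=10)
sample = [
    (0, "0.079*cute + 0.076*broccoli + 0.070*adopted"),
    (1, "0.082*broccoli + 0.079*cute + 0.071*piece"),
]
write_topics_txt(sample, "lda_topics.txt")
```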

User

#3 · Edited 2024-06-26 00:18:05

You can use the pickle module.

import pickle
# your code
with open(filename, 'wb') as f:  # pickle requires binary mode
    pickle.dump(lda, f)
# you may load it back again
with open(filename, 'rb') as f:
    lda_copy = pickle.load(f)
