Seaborn creates duplicate boxplots for bigrams and trigrams in Python

Posted 2024-10-02 12:25:07


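I am using Spyder as part of Anaconda and am trying to classify tweets (text) by event type. To do this, I am using cross_val_score. I have vectorized my tweets with TfidfVectorizer and then used fit_transform to transform them into unigrams, bigrams and trigrams, as follows: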

from sklearn.feature_extraction.text import TfidfVectorizer

# vectorize for unigrams
tfidf_words = TfidfVectorizer(sublinear_tf=True, min_df=0, norm='l2', encoding='latin-1',
                              ngram_range=(1,1), stop_words='english')

# vectorize for bigrams
tfidf_bigrams = TfidfVectorizer(sublinear_tf=True, min_df=0, norm='l2', encoding='latin-1',
                              ngram_range=(2,2), stop_words='english')

# vectorize for trigrams
tfidf_trigrams = TfidfVectorizer(sublinear_tf=True, min_df=0, norm='l2', encoding='latin-1',
                              ngram_range=(3,3), stop_words='english')

# Transform and fit each of the outputs from TF-IDF (unigrams, bigrams and trigrams)
x_train_words = tfidf_words.fit_transform(x_train_sm.preprocessed).toarray()

# bigrams
x_train_bigrams = tfidf_bigrams.fit_transform(x_train_sm.preprocessed).toarray()

# trigrams
x_train_trigrams = tfidf_trigrams.fit_transform(x_train_sm.preprocessed).toarray()

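Now I use cross_val_score to perform cross-validation and calculate the average accuracy for unigrams, bigrams and trigrams. Once that is done, I try to generate and save a boxplot of the accuracies achieved. This is done for 4 different models: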

[Code block not preserved in the source]

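The output for unigrams is exactly what I want:

[Image: Boxplot for unigrams]

Now, when I run the code for bigrams and trigrams (by highlighting all of the code and clicking "play"), I get the following:

Bigrams:

[Image: Boxplot for bigrams]

Trigrams:

[Image: Boxplot for trigrams]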

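The code for each is identical, except that they use "cv_bigrams" and "cv_trigrams" as the data input for the boxplot. The code for each is below.

Bigram code:

[Code block not preserved in the source]

Trigram code: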

# clear the previous list called 'entries' that was populated with values
entries = []

# calculate the accuracy at each fold and collect the results in the 'entries' list
for model_name, model in zip(names, models):
    # model => the model that will be used to fit the data
    # x_train_trigrams => data that is to be fitted by the selected model (trigrams)
    # y_train_sm => y training data after oversampling (event_id)
    # scoring => the type of score you want the function 'cross_val_score' to return
    # cv => number of folds you want to perform with cross-validation
    accuracies = cross_val_score(model, x_train_trigrams, y_train_sm, scoring='accuracy', cv=CV)
    for fold_idx, accuracy in enumerate(accuracies):
        entries.append((model_name, fold_idx, accuracy))

# build the dataframe 'cv_trigrams' once, after all models have been evaluated
cv_trigrams = pd.DataFrame(entries, columns=['model_name_trigrams', 'fold_idx', 'accuracy'])

Here is what I get if I select and run only the following code:

# plot the results of each model as a box plot
box_bigrams = sns.boxplot(x='model_name_bigrams', y='accuracy', data=cv_bigrams)
fig_bigrams = box_bigrams.get_figure()
fig_bigrams.savefig('boxplot_bigrams.png')

[Image: Boxplot of bigrams when run on its own]

The same applies to trigrams:

# plot the results of each model as a box plot
box_trigrams = sns.boxplot(x='model_name_trigrams', y='accuracy', data=cv_trigrams)
fig_trigrams = box_trigrams.get_figure()
fig_trigrams.savefig('boxplot_trigrams.png')

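Output:

[Image: Output for trigrams when run on its own]

Any idea why I get duplicate boxplots overlapping each other when I run all of the code at once (which I need to do when I put this code into production), but not when I highlight the code sections and run them individually?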


1 answer
User
Answer #1 · Posted 2024-10-02 12:25:07

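Echoing @ImportanceOfBeingErnest's comment: your code is too complicated and your question is not clear enough. Do you want to create 3 different figures, one for each case (unigrams, bigrams and trigrams)? Are you trying to use one figure with 3 axes (called subplots in matplotlib)? Or do you want the three sets of boxes side by side on a single plot?

To me, the simplest approach is to create one figure with 3 subplots, like so: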

import matplotlib.pyplot as plt
import seaborn as sns

# one figure with three stacked axes; (8, 12) is only an example size, adjust it to fit your needs
fig, (ax1, ax2, ax3) = plt.subplots(3, 1, figsize=(8, 12))
sns.boxplot(x='model_name_unigrams', y='accuracy', data=cv_unigrams, ax=ax1)
sns.boxplot(x='model_name_bigrams', y='accuracy', data=cv_bigrams, ax=ax2)
sns.boxplot(x='model_name_trigrams', y='accuracy', data=cv_trigrams, ax=ax3)
fig.savefig('your_figure_name_here.png')

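See the matplotlib subplots demo and the documentation for the functions used above. The documentation for the seaborn boxplot function describes it as an "axes-level" function, which means you can ask it to draw on any Axes object of your choice.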
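If what you actually want is three separate image files, the same axes-level behaviour suggests another option: give every boxplot its own fresh figure and axes, so that nothing drawn for one case carries over to the next when the whole script runs in one go. The sketch below is only an illustration under stated assumptions: the accuracy values are synthetic stand-ins for the real cross-validation results, and `fake_cv_frame`, the `model_name` column and the output file names are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script also runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(0)

def fake_cv_frame(n_models=4, n_folds=5):
    """Build synthetic per-fold accuracies (assumption: stands in for real CV output)."""
    rows = [(f"model_{m}", fold, rng.uniform(0.6, 0.9))
            for m in range(n_models) for fold in range(n_folds)]
    return pd.DataFrame(rows, columns=["model_name", "fold_idx", "accuracy"])

frames = {"unigrams": fake_cv_frame(),
          "bigrams": fake_cv_frame(),
          "trigrams": fake_cv_frame()}

saved_files = []
for label, frame in frames.items():
    fig, ax = plt.subplots(figsize=(6, 4))  # fresh figure and axes for every plot
    sns.boxplot(x="model_name", y="accuracy", data=frame, ax=ax)
    ax.set_title(f"Accuracy per model ({label})")
    out_path = f"boxplot_{label}.png"
    fig.savefig(out_path)
    plt.close(fig)  # release the figure so it cannot be reused by a later plot
    saved_files.append(out_path)
```

Passing `ax=ax` explicitly removes any dependence on which axes happen to be current, which is the part that tends to differ between running a selection and running the whole file.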
