How can I draw two sets of box plots on one figure with Pandas?

Posted 2024-10-01 00:18:41


Hello, I am drawing two different box plots with pandas:

# medianprops is defined elsewhere (not shown in the question)
plt.figure()
df['mean_train_score_error'] = 1 - df['mean_train_score']
df.boxplot(column=['mean_train_score_error'], by='modelo',
           medianprops=medianprops,
           autorange=True, showfliers=False, patch_artist=True,
           vert=True, showmeans=True, meanline=True)
plt.ylabel('Error: 1-F1 Score')
plt.title('Error de entrenamiento')
plt.suptitle('')

df['mean_test_score_error'] = 1 - df['mean_test_score']
df.boxplot(column=['mean_test_score_error'], by='modelo',
           medianprops=medianprops,
           autorange=True, showfliers=False, patch_artist=True,
           vert=True, showmeans=True, meanline=True)
plt.ylabel('Error: 1-F1 Score')
plt.title('Error de validación')
plt.suptitle('')

This gives me the following two plots:

[Figure: box plots of the training error ('Error de entrenamiento') by modelo]

[Figure: box plots of the validation error ('Error de validación') by modelo]

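My question is: can all 6 box plots be drawn on the same figure, with one colour for the three boxes from the first plot and a different colour for the three boxes from the second?

Thanks.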


1 answer
User
Answer #1 · Posted 2024-10-01 00:18:41
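  • The simplest approach is to convert the data from wide format to long format, then plot with seaborn using the hue parameter.
  • pandas.wide_to_long
    • There must be a unique id, so an id column is added.
    • The columns being converted must share a common stubname, which is why I moved error to the front of the column names.
      • The error column names end up in one column and their values in another.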
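
As a rough illustration of the points above, here is a minimal sketch of what pandas.wide_to_long does to a tiny frame. The frame and its column names (error_train, error_test) are made up for the example and are not the asker's data; the only assumption carried over from the answer is that the columns to stack share the stub "error".

```python
import pandas as pd

# Hypothetical toy frame: the two "error_*" columns share the stub "error".
wide = pd.DataFrame({
    "id": [0, 1],                  # unique id required by wide_to_long
    "mode": ["lg", "rf"],
    "error_train": [0.10, 0.20],
    "error_test": [0.30, 0.40],
})

# stubnames="error" with sep="_" selects error_train and error_test.
# The non-numeric suffix (train / test) becomes the new "score" index level,
# and the values are collected in a single column named "error".
long_df = pd.wide_to_long(
    wide, stubnames="error", i="id", j="score", sep="_", suffix=r"\D+"
).reset_index()

print(long_df[["id", "score", "mode", "error"]])
```

Each original row now appears once per stacked column, so the two-row frame becomes four rows, and the "mode" column is carried along unchanged.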

Imports and test data

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# setup data and dataframe
np.random.seed(365)
data = {'mod_lg': np.random.normal(0.3, .1, size=(30,)),
        'mod_rf': np.random.normal(0.05, .01, size=(30,)),
        'mod_bg': np.random.normal(0.02, 0.002, size=(30,)),
        'mean_train_score': np.random.normal(0.95, 0.3, size=(30,)),
        'mean_test_score': np.random.normal(0.86, 0.5, size=(30,))}

df = pd.DataFrame(data)
df['error_mean_test_score'] = 1 - df['mean_test_score']
df['error_mean_train_score'] = 1 - df['mean_train_score']
df["id"] = df.index

# stack mod_lg / mod_rf / mod_bg into a 'mode' column (raw string avoids an invalid-escape warning)
df = pd.wide_to_long(df, stubnames='mod', i='id', j='mode', sep='_', suffix=r'\D+').reset_index()
df["id"] = df.index

# display dataframe: this is probably what your dataframe looks like to generate your current plots
   id mode  mean_train_score  error_mean_test_score  mean_test_score  error_mean_train_score       mod
0   0   lg          0.663855              -0.343961         1.343961                0.336145  0.316792
1   1   lg          0.990114               0.472847         0.527153                0.009886  0.352351
2   2   lg          1.179775               0.324748         0.675252               -0.179775  0.381738
3   3   lg          0.693155               0.519526         0.480474                0.306845  0.470385
4   4   lg          1.191048              -0.128033         1.128033               -0.191048  0.085305

Transform and plot

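  • The error_score_name column holds the suffixes taken from error_mean_test_score and error_mean_train_score (that is, mean_test_score and mean_train_score).
  • The error_score_value column holds the corresponding values.

# convert df error columns to long format
dfl = pd.wide_to_long(df, stubnames='error', i='id', j='score', sep='_', suffix=r'\D+').reset_index(level=1)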
dfl.rename(columns={'score': 'error_score_name', 'error': 'error_score_value'}, inplace=True)

# display dfl

   error_score_name  mean_train_score       mod  mean_test_score mode  error_score_value
id                                                                                      
0   mean_test_score          0.663855  0.316792         1.343961   lg          -0.343961
1   mean_test_score          0.990114  0.352351         0.527153   lg           0.472847
2   mean_test_score          1.179775  0.381738         0.675252   lg           0.324748
3   mean_test_score          0.693155  0.470385         0.480474   lg           0.519526
4   mean_test_score          1.191048  0.085305         1.128033   lg          -0.128033

# plot dfl
sns.boxplot(x='mode', y='error_score_value', data=dfl, hue='error_score_name')

[Figure: box plots of error_score_value by mode, with one colour per error_score_name]
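
If the data already has a modelo column and the two error columns named as in the question, the same long format can probably be reached with DataFrame.melt instead of wide_to_long. This is a sketch under that assumption: the frame below is randomly generated stand-in data, and the names split and error are chosen only for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's frame: three models, ten scores each.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "modelo": np.repeat(["lg", "rf", "bg"], 10),
    "mean_train_score": rng.uniform(0.8, 1.0, size=30),
    "mean_test_score": rng.uniform(0.6, 0.9, size=30),
})
df["mean_train_score_error"] = 1 - df["mean_train_score"]
df["mean_test_score_error"] = 1 - df["mean_test_score"]

# Stack the two error columns into one; "split" records the source column.
long_df = df.melt(
    id_vars="modelo",
    value_vars=["mean_train_score_error", "mean_test_score_error"],
    var_name="split",
    value_name="error",
)

# Six groups (3 models x 2 error columns), one per box in the combined plot.
print(long_df.groupby(["modelo", "split"])["error"].count())
```

Passing this frame to sns.boxplot(x='modelo', y='error', hue='split', data=long_df) should then draw all six boxes on one axes, with one colour for the training error and another for the validation error.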
