Suppose I have the dataset from the example here:
import pandas as pd
raw_data = {'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks', 'Nighthawks', 'Dragoons', 'Dragoons', 'Dragoons', 'Dragoons', 'Scouts', 'Scouts', 'Scouts', 'Scouts'],
'company': ['1st', '1st', '2nd', '2nd', '1st', '1st', '2nd', '2nd','1st', '1st', '2nd', '2nd'],
'name': ['Miller', 'Jacobson', 'Ali', 'Milner', 'Cooze', 'Jacon', 'Ryaner', 'Sone', 'Sloan', 'Piger', 'Riani', 'Ali'],
'preTestScore': [4, 24, 31, 2, 3, 4, 24, 31, 2, 3, 2, 3],
'postTestScore': [25, 94, 57, 62, 70, 25, 94, 57, 62, 70, 62, 70]}
df = pd.DataFrame(raw_data, columns = ['regiment', 'company', 'name', 'preTestScore', 'postTestScore'])
df
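I want to draw a boxplot of regiment vs preTestScore. To do that, I need the relative distribution of these two variables, so I group regiment by preTestScore:
df1 = df['regiment'].groupby(df['preTestScore']).count()
df1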
If I now try to draw the boxplot, it raises an error:
import seaborn as sns
sns.boxplot(data=df1)
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-131-8296ca940a25> in <module>()
1 df1 = df['regiment'].groupby(df['preTestScore']).count()
2 df1
----> 3 sns.boxplot(data=df1)
~\AppData\Local\Continuum\anaconda3\lib\site-packages\seaborn\categorical.py in boxplot(x, y, hue, data, order, hue_order, orient, color, palette, saturation, width, dodge, fliersize, linewidth, whis, notch, ax, **kwargs)
2209 plotter = _BoxPlotter(x, y, hue, data, order, hue_order,
2210 orient, color, palette, saturation,
-> 2211 width, dodge, fliersize, linewidth)
2212
2213 if ax is None:
~\AppData\Local\Continuum\anaconda3\lib\site-packages\seaborn\categorical.py in __init__(self, x, y, hue, data, order, hue_order, orient, color, palette, saturation, width, dodge, fliersize, linewidth)
439 width, dodge, fliersize, linewidth):
440
--> 441 self.establish_variables(x, y, hue, data, orient, order, hue_order)
442 self.establish_colors(color, palette, saturation)
443
~\AppData\Local\Continuum\anaconda3\lib\site-packages\seaborn\categorical.py in establish_variables(self, x, y, hue, data, orient, order, hue_order, units)
94 if hasattr(data, "shape"):
95 if len(data.shape) == 1:
---> 96 if np.isscalar(data[0]):
97 plot_data = [data]
98 else:
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\series.py in __getitem__(self, key)
765 key = com._apply_if_callable(key, self)
766 try:
--> 767 result = self.index.get_value(self, key)
768
769 if not is_scalar(result):
~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_value(self, series, key)
3116 try:
3117 return self._engine.get_value(s, k,
-> 3118 tz=getattr(series.dtype, 'tz', None))
3119 except KeyError as e1:
3120 if len(self) > 0 and self.inferred_type in ['integer', 'boolean']:
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_value()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_value()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()
KeyError: 0
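A note on this KeyError: the traceback ends with seaborn evaluating `data[0]` on the Series, which pandas treats as a label lookup when the index is made of integers. The sketch below rebuilds the same grouped counts on a reduced copy of the data (the names `scores` and `counts` are illustrative, not from the original post) and shows that 0 is simply not one of the index labels.

```python
import pandas as pd

# Reduced copy of the question's data: only the two columns involved.
scores = pd.DataFrame({
    'regiment': ['Nighthawks'] * 4 + ['Dragoons'] * 4 + ['Scouts'] * 4,
    'preTestScore': [4, 24, 31, 2, 3, 4, 24, 31, 2, 3, 2, 3],
})

# Same construction as df1 in the question.
counts = scores['regiment'].groupby(scores['preTestScore']).count()

# The index holds the distinct preTestScore values, not positions 0..n-1.
index_labels = counts.index.tolist()

# With an integer index, counts[0] looks up the label 0, which does not exist.
try:
    counts[0]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Positional access is still available through .iloc.
first_count = counts.iloc[0]
```

So the Series itself is fine; it is the integer-labelled index that makes the `[0]` lookup fail.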
So I converted the groupby result to a DataFrame and tried the boxplot again:
df1 = pd.DataFrame(df1)
df1
sns.boxplot(data=df1)
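This produces a boxplot, but not of the regiment vs preTestScore distribution (in fact, this boxplot makes no sense to me; I don't know what its y-axis values represent). To get that, we need to specify x and y in boxplot: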
sns.boxplot(x='regiment', y='preTestScore', data=df1)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-132-fc8036eb7d0b> in <module>()
----> 1 sns.boxplot(x='regiment', y='preTestScore', data=df1)
~\AppData\Local\Continuum\anaconda3\lib\site-packages\seaborn\categorical.py in boxplot(x, y, hue, data, order, hue_order, orient, color, palette, saturation, width, dodge, fliersize, linewidth, whis, notch, ax, **kwargs)
2209 plotter = _BoxPlotter(x, y, hue, data, order, hue_order,
2210 orient, color, palette, saturation,
-> 2211 width, dodge, fliersize, linewidth)
2212
2213 if ax is None:
~\AppData\Local\Continuum\anaconda3\lib\site-packages\seaborn\categorical.py in __init__(self, x, y, hue, data, order, hue_order, orient, color, palette, saturation, width, dodge, fliersize, linewidth)
439 width, dodge, fliersize, linewidth):
440
--> 441 self.establish_variables(x, y, hue, data, orient, order, hue_order)
442 self.establish_colors(color, palette, saturation)
443
~\AppData\Local\Continuum\anaconda3\lib\site-packages\seaborn\categorical.py in establish_variables(self, x, y, hue, data, orient, order, hue_order, units)
149 if isinstance(input, string_types):
150 err = "Could not interpret input '{}'".format(input)
--> 151 raise ValueError(err)
152
153 # Figure out the plotting orientation
ValueError: Could not interpret input 'regiment'
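One way to investigate a "Could not interpret input" error is to check which names the data actually exposes as columns, since strings passed as `x` and `y` are looked up by name in `data`. The sketch below is an inspection aid on a reduced copy of the data (`scores`, `counts`, `frame` and `flat` are illustrative names). It shows that after wrapping the grouped counts in a DataFrame, preTestScore is the index rather than a column.

```python
import pandas as pd

# Reduced copy of the question's data: only the two columns involved.
scores = pd.DataFrame({
    'regiment': ['Nighthawks'] * 4 + ['Dragoons'] * 4 + ['Scouts'] * 4,
    'preTestScore': [4, 24, 31, 2, 3, 4, 24, 31, 2, 3, 2, 3],
})
counts = scores['regiment'].groupby(scores['preTestScore']).count()

# Wrapping the Series in a DataFrame keeps the grouping key in the index.
frame = pd.DataFrame(counts)
column_names = frame.columns.tolist()  # only the counted column
index_name = frame.index.name          # the grouping key

# reset_index() moves the grouping key into an ordinary column.
flat = frame.reset_index()
flat_columns = flat.columns.tolist()
```

Only after `reset_index()` can both names be referred to as columns.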
We can check the data type of df1 with:
df1.dtype
>>> dtype('int64')
When I put the values from df1 into a new DataFrame df2 and try the boxplot again, it works:
df2 = pd.DataFrame({'preTestScore': [2,3,4,24,31], 'regiment': [3,3,2,2,2]})
df2
sns.boxplot(x='regiment', y='preTestScore', data=df2)
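So, instead of copying the contents of the groupby result and pasting them into a new DataFrame, how can I directly get a DataFrame that stores the relative distribution of two variables in a DataFrame?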
Use to_frame to convert the Series to a DataFrame, then reset the index before plotting.
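The answer's own code did not survive in this copy, so the following is a minimal sketch of the approach it describes, not the original snippet. It assumes the same grouping as in the question. The names `counts` and `plot_counts` are hypothetical, and seaborn is imported inside the helper so the data preparation runs without it.

```python
import pandas as pd

# Reduced copy of the question's data: only the two columns involved.
df = pd.DataFrame({
    'regiment': ['Nighthawks'] * 4 + ['Dragoons'] * 4 + ['Scouts'] * 4,
    'preTestScore': [4, 24, 31, 2, 3, 4, 24, 31, 2, 3, 2, 3],
})

# Count rows per preTestScore, turn the Series into a DataFrame,
# then move the grouping key from the index into a regular column.
counts = (
    df['regiment']
    .groupby(df['preTestScore'])
    .count()
    .to_frame()
    .reset_index()
)


def plot_counts(data):
    """Draw the boxplot from the question (hypothetical helper)."""
    import seaborn as sns  # imported lazily; requires seaborn to be installed
    return sns.boxplot(x='regiment', y='preTestScore', data=data)
```

Calling `plot_counts(counts)` should draw the same plot as the hand-built df2 in the question, because `counts` holds the same values. Note that the `regiment` column now contains counts rather than regiment names; `to_frame('count')` would give it a clearer name, in which case `x` must be changed to match.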