How do I handle missing values when plotting with seaborn? (Answers)



1 Answer

Anonymous, 1 day ago


I would definitely handle missing values before plotting the data. Whether or not to use <code>dropna()</code> depends entirely on the nature of your dataset. Is <code>alcconsumption</code> a single series or part of a dataframe? In the latter case, <code>dropna()</code> would also remove the corresponding rows in the other columns. Are the missing values few or many? Are they spread throughout the series, or do they tend to occur in groups? Is there reason to believe the dataset has a trend?

If the missing values are few and scattered, you can simply use <code>dropna()</code>. In other cases I would fill the missing values with the previously observed value (1), or even with interpolated values (2). But be careful! Replacing a lot of data with filled or interpolated observations can seriously distort the dataset and lead to very wrong conclusions.

Here are some examples that use your snippet...

<pre><code>seaborn.distplot(data['alcconsumption'], hist=True, bins=100)
plt.xlabel('AlcoholConsumption')
plt.ylabel('Frequency(normalized 0->1)')
</code></pre>

...on a synthetic dataset:

<pre><code>import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

def sample(rows, names):
    '''Create a dataframe of random integers with a daily DatetimeIndex.

    Parameters
    ==========
    rows : number of rows in the dataframe
    names: list of column names

    Example
    =======
    >>> df = sample(rows=2, names=['A', 'B'])  # random values in [-100, 100)
    '''
    rng = pd.date_range('1/1/2017', periods=rows, freq='D')
    df_temp = pd.DataFrame(np.random.randint(-100, 100, size=(rows, len(names))),
                           columns=names)
    df_temp = df_temp.set_index(rng)
    return df_temp

df = sample(rows=15, names=['A', 'B'])
# Cast to float so the column can hold NaN, and assign through .iloc rather
# than chained indexing (df['A'][8:12] = ...) so the change reaches df itself.
df['A'] = df['A'].astype(float)
df.iloc[8:12, df.columns.get_loc('A')] = np.nan
df
</code></pre>

Output:

<pre><code>               A   B
2017-01-01 -63.0  10
2017-01-02  49.0  79
2017-01-03 -55.0  59
2017-01-04  89.0  34
2017-01-05 -13.0 -80
2017-01-06  36.0  90
2017-01-07 -41.0  86
2017-01-08  10.0 -81
2017-01-09   NaN -61
2017-01-10   NaN -80
2017-01-11   NaN -39
2017-01-12   NaN  24
2017-01-13 -73.0 -25
2017-01-14 -40.0  86
2017-01-15  97.0  60
</code></pre>

(1) Forward fill with <a href="https://pandas.pydata.org/pandas-docs/stable/missing_data.html" rel="nofollow noreferrer">pandas.DataFrame.fillna(method = ffill)</a>

<code>ffill</code> "fills values forward", meaning it replaces each <code>nan</code> with the value from the row above.

<pre><code># Store the result under a new name so df stays a dataframe for example (2).
# Series.ffill() does the same as fillna(method='ffill'), which newer pandas deprecates.
filled = df['A'].ffill()
sns.distplot(filled, hist=True, bins=5)  # distplot is deprecated since seaborn 0.11
plt.xlabel('AlcoholConsumption')
plt.ylabel('Frequency(normalized 0->1)')
</code></pre>

<a href="https://i.stack.imgur.com/3YPgD.png" rel="nofollow noreferrer"><img src="https://i.stack.imgur.com/3YPgD.png" alt="enter image description here"/></a>

(2) Interpolation with <a href="https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.interpolate.html#pandas.DataFrame.interpolate" rel="nofollow noreferrer">pandas.DataFrame.interpolate()</a>

This interpolates values according to one of several methods. With <code>method='time'</code>, interpolation works on daily and higher-resolution data and takes the length of the interval between timestamps into account, so it requires a datetime index.

<pre><code>df['A'] = df['A'].interpolate(method='time')
sns.distplot(df['A'], hist=True, bins=5)
plt.xlabel('AlcoholConsumption')
plt.ylabel('Frequency(normalized 0->1)')
</code></pre>

<a href="https://i.stack.imgur.com/3Wl3T.png" rel="nofollow noreferrer"><img src="https://i.stack.imgur.com/3Wl3T.png" alt="enter image description here"/></a>

As you can see, the two methods produce very different results. I hope this is useful to you. If not, let me know and I'll take another look.
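The questions raised in the answer (are the gaps few or many, scattered or clustered?) can be checked numerically before choosing a strategy. The following is only a minimal sketch and not part of the original answer: the series values are hard-coded, hypothetical numbers modelled on the synthetic example, and <code>missing_summary</code> is a helper name made up for illustration. It reports how many values are missing and the longest consecutive gap, then compares summary statistics for <code>dropna()</code>, forward fill, and time interpolation.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 15 daily values with one block of 4 consecutive gaps,
# modelled on the synthetic example in the answer (numbers are illustrative).
idx = pd.date_range("2017-01-01", periods=15, freq="D")
values = [-63, 49, -55, 89, -13, 36, -41, 10,
          np.nan, np.nan, np.nan, np.nan, -73, -40, 97]
s = pd.Series(values, index=idx, name="A", dtype="float64")


def missing_summary(series):
    """Return the count, share, and longest consecutive run of NaNs."""
    is_na = series.isna()
    # A new run starts wherever the NaN flag differs from the previous row.
    run_id = (is_na != is_na.shift()).cumsum()
    # Summing the flag per run gives the run length for NaN runs, 0 otherwise.
    run_lengths = is_na.groupby(run_id).sum()
    return {
        "n_missing": int(is_na.sum()),
        "share_missing": float(is_na.mean()),
        "longest_gap": int(run_lengths.max()),
    }


summary = missing_summary(s)

# The three strategies discussed in the answer, applied to the same series.
candidates = {
    "dropna": s.dropna(),
    "ffill": s.ffill(),
    "interpolate": s.interpolate(method="time"),
}

# Side-by-side descriptive statistics show how much each choice shifts
# the distribution that would end up in the plot.
comparison = pd.DataFrame({name: c.describe() for name, c in candidates.items()})

print(summary)
print(comparison.round(2))
```

A long <code>longest_gap</code> relative to the series length, as in this made-up data, is a hint that forward fill will pile many identical values into one histogram bin, which is one reason the two plots in the answer can look so different.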
