ValueError: Data overlaps. in Python

Posted 2024-10-03 17:19:35


I have a DataFrame, df3, that looks like this.

The number of AAA_??? columns is unknown; the ??? part can be anything in the dataset.

           Date    ID  Calendar_Year Month   DayName...  AAA_1E AAA_BMITH  AAA_4.1  AAA_CH
0    2019-09-17  8661           2019   Sep       Sun...     NaN       NaN      NaN     NaN
1    2019-09-18  8662           2019   Sep       Sun...     1.0       3.0     34.0     1.0
2    2019-09-19  8663           2019   Sep       Sun...     NaN       NaN      NaN     NaN
3    2019-09-20  8664           2019   Sep       Mon...     NaN       NaN      NaN     NaN
4    2019-09-20  8664           2019   Sep       Mon...     2.0       4.0     32.0     3.0
5    2019-09-20  8664           2019   Sep       Sat...     NaN       NaN      NaN     NaN
6    2019-09-20  8664           2019   Sep       Sat...     NaN       NaN      NaN     NaN
7    2019-09-20  8664           2019   Sep       Sat...     0.0       4.0     30.0     0.0

Another DataFrame, dfMeans, holds the mean values of df3:

     Month Dayname           ID  ...  AAA_BMITH    AAA_4.1  AAA_CH
0      Jan     Thu  7686.500000  ...   0.000000  28.045455     0.0
1      Jan     Fri  7636.272727  ...   0.000000  28.136364     0.0
2      Jan     Sat  7637.272727  ...   0.000000  27.045455     0.0
3      Jan     Sun  7670.090909  ...   0.000000  27.090909     0.0
4      Jan     Mon  7702.909091  ...   0.000000  27.727273     0.0
5      Jan     Tue  7734.260870  ...   0.000000  27.956522     0.0

The two DataFrames are to be joined on Month and Dayname.

I want to replace the NaN values in df3 with the values from dfMeans,

using this line:

df3.update(dfMeans, overwrite=False, errors="raise")

But I get this error:

raise ValueError("Data overlaps.")

ValueError: Data overlaps.

How can I update the NaN values with the values from dfMeans and avoid this error?
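For context, here is a minimal sketch of why this call fails, using hypothetical toy frames (left, right) rather than the real df3 and dfMeans. update aligns the two frames on their row index, not on Month and Dayname, and with errors="raise" it raises ValueError as soon as both frames hold a non-NaN value at the same row and column.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frames; both use the default 0..n-1 row index.
left = pd.DataFrame({"AAA_CH": [np.nan, 1.0, np.nan]})
right = pd.DataFrame({"AAA_CH": [0.5, 0.5, 0.5]})

# Row 1 is non-NaN in both frames, so errors="raise" refuses to proceed.
try:
    left.update(right, overwrite=False, errors="raise")
    message = None
except ValueError as exc:
    message = str(exc)

# With the default errors="ignore", overwrite=False fills only the NaN cells.
left.update(right, overwrite=False)
```

So on the real data the error most likely comes from rows that share an index label and are populated in both frames. Dropping errors="raise" would silence it, but rows would still be matched by index label rather than by Month and Dayname.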

Edit:

I have put all the DataFrames into a single DataFrame, df:

     Month Dayname           ID  ...  AAA_BMITH    AAA_4.1  AAA_CH
0      Jan     Thu  7686.500000  ...   0.000000  28.045455     0.0
1      Jan     Fri  7636.272727  ...   0.000000  28.136364     0.0
2      Jan     Sat  7637.272727  ...   0.000000  27.045455     0.0
3      Jan     Sun  7670.090909  ...   0.000000  27.090909     0.0
4      Jan     Mon  7702.909091  ...   0.000000  27.727273     0.0
5      Jan     Tue  7734.260870  ...   0.000000  27.956522     0.0

How can I fill the NaN values with the means based on the day name?


2 answers

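You can group by 'Month' and 'DayName' and use apply to edit the DataFrame.
Use fillna to fill the NaN values. fillna accepts a dict as its value argument: the keys are column names and the values are scalars, and each scalar replaces the NaN values in its column. With loc you can select the appropriate values from dfMeans. You can build the dict with a dict comprehension over the intersection of the columns of df3 and dfMeans.

All of this corresponds to the following statement: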

# For each (Month, DayName) group, fill NaN with the matching row of dfMeans
df3filled = df3.groupby(['Month', 'DayName']).apply(lambda x: x.fillna(
    {col: dfMeans.loc[(dfMeans['Month'] == x.name[0]) & (dfMeans['Dayname'] == x.name[1]), col].iloc[0]
     for col in x.columns.intersection(dfMeans.columns)})).reset_index(drop=True)
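As an alternative sketch (not the method used in this answer), the same lookup can be written as a left merge on the key columns followed by a column-wise fillna. The toy frames below are hypothetical stand-ins for df3 and dfMeans, and the sketch assumes dfMeans has exactly one row per (Month, Dayname) pair.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature versions of df3 and dfMeans from the question.
df3 = pd.DataFrame({
    "Month": ["Jan", "Jan", "Jan", "Jan"],
    "DayName": ["Sun", "Sun", "Mon", "Sat"],
    "AAA_1E": [np.nan, 1.0, np.nan, 0.0],
    "AAA_CH": [np.nan, 1.0, np.nan, 0.0],
})
dfMeans = pd.DataFrame({
    "Month": ["Jan", "Jan", "Jan"],
    "Dayname": ["Sat", "Sun", "Mon"],
    "AAA_CH": [0.25, 0.5, 0.75],
})

keys = ["Month", "DayName"]
# The key column is spelled differently in the two frames, so align the names.
means = dfMeans.rename(columns={"Dayname": "DayName"})
value_cols = [c for c in df3.columns if c in means.columns and c not in keys]

# The left merge keeps df3's rows in order; mean columns get a "_mean" suffix.
# Note that merge returns a fresh 0..n-1 index.
merged = df3.merge(means[keys + value_cols], on=keys, how="left",
                   suffixes=("", "_mean"))

df3filled = merged[list(df3.columns)].copy()
for col in value_cols:
    df3filled[col] = merged[col].fillna(merged[col + "_mean"])
```

Columns that exist only in df3 (AAA_1E here) are left untouched, and rows with no matching mean simply keep their NaN.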

Use fillna

Data:

       Date    ID  Calendar_Year Month Dayname  AAA_1E  AAA_BMITH  AAA_4.1  AAA_CH
 2019-09-17  8661           2019   Jan     Sun     NaN        NaN      NaN     NaN
 2019-09-18  8662           2019   Jan     Sun     1.0        3.0     34.0     1.0
 2019-09-19  8663           2019   Jan     Sun     NaN        NaN      NaN     NaN
 2019-09-20  8664           2019   Jan     Mon     NaN        NaN      NaN     NaN
 2019-09-20  8664           2019   Jan     Mon     2.0        4.0     32.0     3.0
 2019-09-20  8664           2019   Jan     Sat     NaN        NaN      NaN     NaN
 2019-09-20  8664           2019   Jan     Sat     NaN        NaN      NaN     NaN
 2019-09-20  8664           2019   Jan     Sat     0.0        4.0     30.0     0.0

df.set_index(['Month', 'Dayname'], inplace=True)


DataFrame of means (df_mean):

Month Dayname           ID  AAA_BMITH    AAA_4.1  AAA_CH
  Jan     Thu  7686.500000        0.0  28.045455     0.0
  Jan     Fri  7636.272727        0.0  28.136364     0.0
  Jan     Sat  7637.272727        0.0  27.045455     0.0
  Jan     Sun  7670.090909        0.0  27.090909     0.0
  Jan     Mon  7702.909091        0.0  27.727273     0.0
  Jan     Tue  7734.260870        0.0  27.956522     0.0

df_mean.set_index(['Month', 'Dayname'], inplace=True)


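Update df

  • This operation is based on matching index values
  • It cannot handle multiple column names at once; you have to take the columns of interest and iterate over them
  • Note that AAA_1E is not in df_mean

for col in df.columns:
    if col in df_mean.columns:
        # assign back rather than calling fillna(inplace=True) on a column selection
        df[col] = df[col].fillna(df_mean[col])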
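To try the index-aligned fill end to end, here is a small sketch on toy data. The helper name fill_from_means and the sample values are hypothetical, and the sketch assumes df_mean has a unique (Month, Dayname) index so that its values can be aligned to the repeated keys in df.

```python
import numpy as np
import pandas as pd

def fill_from_means(df, df_mean, keys):
    """Fill NaN in df with the df_mean values that share the same key columns."""
    indexed = df.set_index(keys)
    means = df_mean.set_index(keys)
    for col in indexed.columns.intersection(means.columns):
        # fillna aligns means[col] to indexed's index before filling
        indexed[col] = indexed[col].fillna(means[col])
    # Restore the key columns and the original column order
    return indexed.reset_index()[list(df.columns)]

# Hypothetical toy data shaped like the frames above.
df = pd.DataFrame({
    "Month": ["Jan", "Jan", "Jan", "Jan"],
    "Dayname": ["Sun", "Sun", "Mon", "Sat"],
    "AAA_1E": [np.nan, 1.0, np.nan, 0.0],
    "AAA_CH": [np.nan, 1.0, np.nan, 0.0],
})
df_mean = pd.DataFrame({
    "Month": ["Jan", "Jan", "Jan"],
    "Dayname": ["Sat", "Sun", "Mon"],
    "AAA_CH": [0.25, 0.5, 0.75],
})

result = fill_from_means(df, df_mean, ["Month", "Dayname"])
```

The helper works on copies, so the original df keeps its NaN values, and AAA_1E stays unfilled because df_mean has no such column.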

