Error when plotting the values of an aggregated column from a Python groupby

Posted 2024-10-06 10:26:38


Whenever I try to plot a DataFrame created from an aggregation, I get an error.

import pandas as pd

datelisting = {
'FirstClusterCommittedDockDate_grouper':['2019-11','2021-01','2021-04','2021-01','2020-12','2021-02','2020-12','2020-12','2021-03','2020-12','2021-09','2021-09','2020-11','2021-09','2021-11','2021-08'],
'FirstClusterCommittedHandoffDate_grouper':['2020-03','2021-01','2021-06','2021-03','2021-02','2021-04','2021-02','2021-02','2021-10','2021-02','2021-10','2021-11','2020-12','2021-11','2022-01','2022-01'],
'FirstClusterCommittedLiveDate_grouper':['2020-03','2021-03','2021-06','2021-03','2021-03','2021-07','2021-03','2021-03','2021-08','2021-05','2021-12','2021-11','2020-12','2022-05','2022-01','2022-01'],
'TargetPPAPreparationStartDate_grouper':['2019-09','2020-03','2020-07','2020-06','2020-06','2020-06','2020-08','2020-08','2020-06','2020-08','2021-02','2021-02','2020-10','2020-10','2021-02','2021-01'],
'ProjectedDateLive_grouper':['2019-09','2020-03','2020-07','2020-06','2020-06','2020-06','2020-08','2020-08','2020-06','2020-08','2021-02','2021-02','2020-10','2020-10','2021-02','2021-01']
}


datesDf = pd.DataFrame(datelisting).melt().dropna().rename(columns={'variable':'DateFields','value':'DateValue'}).reset_index().drop('index',axis=1)

dfChart = datesDf.groupby(['DateFields',  'DateValue']).agg({'DateValue': ['count']}).reset_index().dropna().rename(columns = { 'count':'ItemCnt'})


dfChart.columns = ["_".join(x) for x in dfChart.columns.ravel()]

print(dfChart )

This produces the following DataFrame:

DateFields_                     |DateValue_             |DateValue_ItemCnt
 ---------------------------------------------  | ----------------------------- | ----------------
FirstClusterCommittedDockDate_grouper       |2019-11                    |1
FirstClusterCommittedDockDate_grouper       |2020-11                    |1
FirstClusterCommittedDockDate_grouper       |2020-12                    |4
FirstClusterCommittedDockDate_grouper       |2021-01                    |2
FirstClusterCommittedDockDate_grouper       |2021-02                    |1
FirstClusterCommittedDockDate_grouper       |2021-03                    |1
FirstClusterCommittedDockDate_grouper       |2021-04                    |1
FirstClusterCommittedDockDate_grouper       |2021-08                    |1
FirstClusterCommittedDockDate_grouper       |2021-09                    |3
FirstClusterCommittedDockDate_grouper       |2021-11                    |1
FirstClusterCommittedHandoffDate_grouper        |2020-03                    |1
FirstClusterCommittedHandoffDate_grouper        |2020-12                    |1
FirstClusterCommittedHandoffDate_grouper        |2021-01                    |1
FirstClusterCommittedHandoffDate_grouper        |2021-02                    |4
FirstClusterCommittedHandoffDate_grouper        |2021-03                    |1
FirstClusterCommittedHandoffDate_grouper        |2021-04                    |1
FirstClusterCommittedHandoffDate_grouper        |2021-06                    |1
FirstClusterCommittedHandoffDate_grouper        |2021-10                    |2
FirstClusterCommittedHandoffDate_grouper        |2021-11                    |2
FirstClusterCommittedHandoffDate_grouper        |2022-01                    |2
FirstClusterCommittedLiveDate_grouper       |2020-03                    |1
FirstClusterCommittedLiveDate_grouper       |2020-12                    |1
FirstClusterCommittedLiveDate_grouper       |2021-03                    |5
FirstClusterCommittedLiveDate_grouper       |2021-05                    |1
FirstClusterCommittedLiveDate_grouper       |2021-06                    |1
FirstClusterCommittedLiveDate_grouper       |2021-07                    |1
FirstClusterCommittedLiveDate_grouper       |2021-08                    |1
FirstClusterCommittedLiveDate_grouper       |2021-11                    |1
FirstClusterCommittedLiveDate_grouper       |2021-12                    |1
FirstClusterCommittedLiveDate_grouper       |2022-01                    |2
FirstClusterCommittedLiveDate_grouper       |2022-05                    |1
ProjectedDateLive_grouper               |2019-09                    |1
ProjectedDateLive_grouper               |2020-03                    |1
ProjectedDateLive_grouper               |2020-06                    |4
ProjectedDateLive_grouper               |2020-07                    |1
ProjectedDateLive_grouper               |2020-08                    |3
ProjectedDateLive_grouper               |2020-10                    |2
ProjectedDateLive_grouper               |2021-01                    |1
ProjectedDateLive_grouper               |2021-02                    |3
TargetPPAPreparationStartDate_grouper       |2019-09                    |1
TargetPPAPreparationStartDate_grouper       |2020-03                    |1
TargetPPAPreparationStartDate_grouper       |2020-06                    |4
TargetPPAPreparationStartDate_grouper       |2020-07                    |1
TargetPPAPreparationStartDate_grouper       |2020-08                    |3
TargetPPAPreparationStartDate_grouper       |2020-10                    |2
TargetPPAPreparationStartDate_grouper       |2021-01                    |1
TargetPPAPreparationStartDate_grouper       |2021-02                    |3

When I try to plot it, I get a strange error:

import altair as alt

base = alt.Chart(dfChart).properties(width=600)

line = base.mark_line().encode(
    x='DateValue_',
    y=' DateValue_ItemCnt',
    color='DateFields_'
)

rule = base.mark_rule().encode(
    y='average(DateValue_ItemCnt)',
    color='DateFields_',
    size=alt.value(2)
)

line + rule

This is the error I get:

ValueError:  DateValue_ItemCnt encoding field is specified without a type; the type cannot be inferred because it does not match any column in the data.

alt.LayerChart(...)

If I remove this line of code:

dfChart.columns = ["_".join(x) for x in dfChart.columns.ravel()]

then I get this error instead:

ValueError: Dataframe contains invalid column name: ('DateFields', ''). Column names must be strings

Is the aggregation making the ItemCnt column unavailable for plotting? Is there a way around this?


1 answer
User
#1 · Posted 2024-10-06 10:26:38

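You have an extra leading space in y=' DateValue_ItemCnt', so the field name does not match any column. Remove the space and the code looks like this: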

base = alt.Chart(dfChart).properties(width=600)

line = base.mark_line().encode(
    x='DateValue_',
    y='DateValue_ItemCnt',  # the original had a leading space here: ' DateValue_ItemCnt'
    color='DateFields_'
)

rule = base.mark_rule().encode(
    y='average(DateValue_ItemCnt)',
    color='DateFields_',
    size=alt.value(2)
)
line + rule

With that change the chart renders without the error (the output image is not included here).
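
A stray space like this is hard to spot by eye. The following is a minimal sketch, not part of the original answer, of one way to check field names against the column names before building a chart. The helper name `find_whitespace_mismatches` is hypothetical, and it only handles plain field names, not shorthand such as 'average(DateValue_ItemCnt)'.

```python
def find_whitespace_mismatches(fields, columns):
    """Return (field, column) pairs where the field matches a column only after stripping whitespace."""
    column_set = set(columns)
    mismatches = []
    for field in fields:
        if field in column_set:
            continue  # exact match, nothing to report
        stripped = field.strip()
        if stripped in column_set:
            mismatches.append((field, stripped))
    return mismatches


# Column names as they appear in dfChart after the "_".join(...) step
columns = ["DateFields_", "DateValue_", "DateValue_ItemCnt"]
# Field names as passed to encode() in the question
fields = ["DateValue_", " DateValue_ItemCnt", "DateFields_"]

for field, column in find_whitespace_mismatches(fields, columns):
    # repr() (via !r) makes the leading space visible in the printed output
    print(f"{field!r} should probably be {column!r}")
```

In practice, `columns` could be taken from `dfChart.columns.tolist()` instead of being typed by hand.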

Note: the rest of the code is included here for completeness.

import pandas as pd
datelisting = {
'FirstClusterCommittedDockDate_grouper':['2019-11','2021-01','2021-04','2021-01','2020-12','2021-02','2020-12','2020-12','2021-03','2020-12','2021-09','2021-09','2020-11','2021-09','2021-11','2021-08'],
'FirstClusterCommittedHandoffDate_grouper':['2020-03','2021-01','2021-06','2021-03','2021-02','2021-04','2021-02','2021-02','2021-10','2021-02','2021-10','2021-11','2020-12','2021-11','2022-01','2022-01'],
'FirstClusterCommittedLiveDate_grouper':['2020-03','2021-03','2021-06','2021-03','2021-03','2021-07','2021-03','2021-03','2021-08','2021-05','2021-12','2021-11','2020-12','2022-05','2022-01','2022-01'],
'TargetPPAPreparationStartDate_grouper':['2019-09','2020-03','2020-07','2020-06','2020-06','2020-06','2020-08','2020-08','2020-06','2020-08','2021-02','2021-02','2020-10','2020-10','2021-02','2021-01'],
'ProjectedDateLive_grouper':['2019-09','2020-03','2020-07','2020-06','2020-06','2020-06','2020-08','2020-08','2020-06','2020-08','2021-02','2021-02','2020-10','2020-10','2021-02','2021-01']
}


datesDf = pd.DataFrame(datelisting).melt().dropna().rename(columns={'variable':'DateFields','value':'DateValue'}).reset_index().drop('index',axis=1)

dfChart = datesDf.groupby(['DateFields',  'DateValue']).agg({'DateValue': ['count']}).reset_index().dropna().rename(columns = { 'count':'ItemCnt'})


dfChart.columns = ["_".join(x) for x in dfChart.columns.ravel()]

I ran this on altair version 4.1.0. You can check your own version by calling:

import altair as alt
alt.__version__
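
As for the second error in the question: the tuple column names such as ('DateFields', '') come from the dict-of-lists form of agg(), which produces two-level columns. A possible alternative, sketched here with a small made-up stand-in for the melted frame (the group labels and values are illustrative only), is to count rows with size() and name the result column directly, so no flattening step is needed.

```python
import pandas as pd

# Small stand-in for the melted frame in the question (values are illustrative)
datesDf = pd.DataFrame({
    "DateFields": ["A_grouper", "A_grouper", "A_grouper", "B_grouper"],
    "DateValue": ["2020-12", "2020-12", "2021-01", "2020-12"],
})

# size() counts the rows in each group; reset_index(name=...) turns the result
# into a frame with flat, plain-string column names
dfChart = (
    datesDf.groupby(["DateFields", "DateValue"])
    .size()
    .reset_index(name="ItemCnt")
)

print(dfChart.columns.tolist())
print(dfChart)
```

With this variant the columns are named DateFields, DateValue and ItemCnt, so the chart encodings would need to refer to those names rather than DateFields_, DateValue_ and DateValue_ItemCnt.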
