如何在Python中将数据帧转换为矩阵格式?

2024-09-30 08:21:45 发布

您现在位置:Python中文网/ 问答频道 /正文

我有一个pandas数据帧df,如图所示。(df.to_dict()在末尾给出):

model   scenario    module      
AIM/CGE 2.0 ADVANCE_2020_1.5C-2100  wind_total_share    high    high
SSP1-19 wind_total_share    high    high
SSP2-19 wind_total_share    high    high
AIM/CGE 2.1 CD-LINKS_NPi2020_400    wind_total_share    high    high
TERL_15D_LowCarbonTransportPolicy   wind_total_share    medium  medium
TERL_15D_NoTransportPolicy  wind_total_share    medium  medium
GCAM 4.2    SSP1-19 wind_total_share    medium  medium
IMAGE 3.0.1 IMA15-AGInt wind_total_share    low low
IMA15-Def   wind_total_share    low low
IMA15-Eff   wind_total_share    low low

modelscenariomodule是索引,而InfrastructureInvestment是数据帧的两列。我想将这两列转换为矩阵格式,其中基础设施类似于x轴,投资类似于Y轴。此外,在基础设施和投资方面,我需要将它们分为低、中、高三类。 在单元格中,我想知道满足给定变量(投资基础设施)的给定类别(低/中/高)的模型和场景的名称,如下面的屏幕截图所示。也可能存在这样的情况,即基础设施成本较高,但投资成本较低或中等,反之亦然。
enter image description here

我还想有另一个矩阵,它显示了落入给定单元的场景数量。第二个矩阵应如图所示: enter image description here

我不熟悉通过修改现有数据帧来获得这种矩阵格式。是否有任何功能或模块可用于此目的?如何使用Python将数据帧转换为屏幕截图中所示的矩阵格式

df.to_dict()如下所示:

{'Infrastructure': {('AIM/CGE 2.0',
   'ADVANCE_2020_1.5C-2100',
   'wind_total_share'): 'high',
  ('AIM/CGE 2.0', 'SSP1-19', 'wind_total_share'): 'high',
  ('AIM/CGE 2.0', 'SSP2-19', 'wind_total_share'): 'high',
  ('AIM/CGE 2.1', 'CD-LINKS_NPi2020_400', 'wind_total_share'): 'high',
  ('AIM/CGE 2.1',
   'TERL_15D_LowCarbonTransportPolicy',
   'wind_total_share'): 'medium',
  ('AIM/CGE 2.1', 'TERL_15D_NoTransportPolicy', 'wind_total_share'): 'medium',
  ('GCAM 4.2', 'SSP1-19', 'wind_total_share'): 'medium',
  ('IMAGE 3.0.1', 'IMA15-AGInt', 'wind_total_share'): 'low',
  ('IMAGE 3.0.1', 'IMA15-Def', 'wind_total_share'): 'low',
  ('IMAGE 3.0.1', 'IMA15-Eff', 'wind_total_share'): 'low'},
 'Investment': {('AIM/CGE 2.0',
   'ADVANCE_2020_1.5C-2100',
   'wind_total_share'): 'high',
  ('AIM/CGE 2.0', 'SSP1-19', 'wind_total_share'): 'high',
  ('AIM/CGE 2.0', 'SSP2-19', 'wind_total_share'): 'high',
  ('AIM/CGE 2.1', 'CD-LINKS_NPi2020_400', 'wind_total_share'): 'high',
  ('AIM/CGE 2.1',
   'TERL_15D_LowCarbonTransportPolicy',
   'wind_total_share'): 'medium',
  ('AIM/CGE 2.1', 'TERL_15D_NoTransportPolicy', 'wind_total_share'): 'medium',
  ('GCAM 4.2', 'SSP1-19', 'wind_total_share'): 'medium',
  ('IMAGE 3.0.1', 'IMA15-AGInt', 'wind_total_share'): 'low',
  ('IMAGE 3.0.1', 'IMA15-Def', 'wind_total_share'): 'low',
  ('IMAGE 3.0.1', 'IMA15-Eff', 'wind_total_share'): 'low'}}

Tags: 数据imageshare基础设施矩阵lowtotalmedium
2条回答

pivot_table在这方面可以有所帮助

第二个矩阵可通过以下方法直接获得:

df.assign(values=1).pivot_table(values='values', index='Infrastructure',
          columns='Investment', aggfunc=sum)

其中:

Investment      high  low  medium
Infrastructure                   
high             4.0  NaN     NaN
low              NaN  3.0     NaN
medium           NaN  NaN     3.0

对于第一个,可以首先重置索引以生成预期值:

temp = df.reset_index(2, drop=True).reset_index()
temp['value'] = temp['level_0'] + ' ' + temp['level_1']
df2 = temp.pivot_table(values = 'value', index='Infrastructure',
                       columns='Investment', aggfunc=list)

df2中,多个标识符被分组在一个列表中。如果希望每行一个,可以分解数据帧:

df2.explode('high').explode('medium').explode('low')

得到

Investment                                    high                      low                                         medium
Infrastructure                                                                                                            
high            AIM/CGE 2.0 ADVANCE_2020_1.5C-2100                      NaN                                            NaN
high                           AIM/CGE 2.0 SSP1-19                      NaN                                            NaN
high                           AIM/CGE 2.0 SSP2-19                      NaN                                            NaN
high              AIM/CGE 2.1 CD-LINKS_NPi2020_400                      NaN                                            NaN
low                                            NaN  IMAGE 3.0.1 IMA15-AGInt                                            NaN
low                                            NaN    IMAGE 3.0.1 IMA15-Def                                            NaN
low                                            NaN    IMAGE 3.0.1 IMA15-Eff                                            NaN
medium                                         NaN                      NaN  AIM/CGE 2.1 TERL_15D_LowCarbonTransportPolicy
medium                                         NaN                      NaN         AIM/CGE 2.1 TERL_15D_NoTransportPolicy
medium                                         NaN                      NaN                               GCAM 4.2 SSP1-19

我进行了一段时间的头脑风暴,自己想出了解决办法。它很长,但看起来像这样

首先,我列出了基础设施和投资的高、中、低情景:

infra_high = (df[df["Infrastructure"]=="high"].index.get_level_values(0) + " " + df[df["Infrastructure"]=="high"].index.get_level_values(1)).to_list()
infra_medium = (df[df["Infrastructure"]=="medium"].index.get_level_values(0) + " " + df[df["Infrastructure"]=="medium"].index.get_level_values(1)).to_list()
infra_low = (df[df["Infrastructure"]=="low"].index.get_level_values(0) + " " + df[df["Infrastructure"]=="low"].index.get_level_values(1)).to_list()

invest_high = (df[df["Investment"]=="high"].index.get_level_values(0) + " " + df[df["Investment"]=="high"].index.get_level_values(1)).to_list()
invest_medium = (df[df["Investment"]=="medium"].index.get_level_values(0) + " " + df[df["Investment"]=="medium"].index.get_level_values(1)).to_list()
invest_low = (df[df["Investment"]=="low"].index.get_level_values(0) + " " + df[df["Investment"]=="low"].index.get_level_values(1)).to_list()

接下来,我做了高,高场景的交叉;基础设施和投资的高、中、高、低如下:

i1 = list(set(infra_high).intersection(set(invest_high)))
i2 = list(set(infra_high).intersection(set(invest_medium)))
i3 = list(set(infra_high).intersection(set(invest_low)))

i4 = list(set(infra_medium).intersection(set(invest_high)))
i5 = list(set(infra_medium).intersection(set(invest_medium)))
i6 = list(set(infra_medium).intersection(set(invest_low)))

i7 = list(set(infra_low).intersection(set(invest_high)))
i8 = list(set(infra_low).intersection(set(invest_medium)))
i9 = list(set(infra_low).intersection(set(invest_low)))

接下来,使用Excel文件,我重新创建了我想要制作的矩阵: enter image description here

然后,我为数据帧的每个单元格分配了列表,如下所示:

landing_zone1.loc["High","High"] = i1
landing_zone1.loc["High","Medium"] = i2
landing_zone1.loc["High","Low"] = i3

landing_zone1.loc["Medium","High"] = i4
landing_zone1.loc["Medium","Medium"] = i5
landing_zone1.loc["Medium","Low"] = i6

landing_zone1.loc["Low","High"] = i7
landing_zone1.loc["Low","Medium"] = i8
landing_zone1.loc["Low","Low"] = i9

我使用每个列表的len()对场景数矩阵做了同样的处理

相关问题 更多 >

    热门问题