Reshaping a pivot table in pandas

Posted 2024-09-27 09:34:02


I need to reshape a pivot table loaded from a CSV file. A small excerpt looks like this:

      country          location  confirmedcases_10-02-2020  deaths_10-02-2020  confirmedcases_11-02-2020  deaths_11-02-2020
0   Australia   New South Wales                        4.0                0.0                          4                0.0
1   Australia          Victoria                        4.0                0.0                          4                0.0
2   Australia        Queensland                        5.0                0.0                          5                0.0
3   Australia   South Australia                        2.0                0.0                          2                0.0
4    Cambodia     Sihanoukville                        1.0                0.0                          1                0.0
5      Canada           Ontario                        3.0                0.0                          3                0.0
6      Canada  British Columbia                        4.0                0.0                          4                0.0
7       China             Hubei                    31728.0              974.0                      33366             1068.0
8       China          Zhejiang                     1177.0                0.0                       1131                0.0
9       China         Guangdong                     1177.0                1.0                       1219                1.0
10      China             Henan                     1105.0                7.0                       1135                8.0
11      China             Hunan                      912.0                1.0                        946                2.0
12      China             Anhui                      860.0                4.0                        889                4.0
13      China           Jiangxi                      804.0                1.0                        844                1.0
14      China         Chongqing                      486.0                2.0                        505                3.0
15      China           Sichuan                      417.0                1.0                        436                1.0
16      China          Shandong                      486.0                1.0                        497                1.0
17      China           Jiangsu                      515.0                0.0                        543                0.0
18      China          Shanghai                      302.0                1.0                        311                1.0
19      China           Beijing                      342.0                3.0                        352                3.0

Is there a ready-to-use tool that can reshape it into something like this?

      country          location        date  confirmedcases  deaths
0   Australia   New South Wales  2020-02-10             4.0     0.0
1   Australia          Victoria  2020-02-10             4.0     0.0
2   Australia        Queensland  2020-02-10             5.0     0.0
3   Australia   South Australia  2020-02-10             2.0     0.0
4    Cambodia     Sihanoukville  2020-02-10             1.0     0.0
5      Canada           Ontario  2020-02-10             3.0     0.0
6      Canada  British Columbia  2020-02-10             4.0     0.0
7       China             Hubei  2020-02-10         31728.0   974.0
8       China          Zhejiang  2020-02-10          1177.0     0.0
9       China         Guangdong  2020-02-10          1177.0     1.0
10      China             Henan  2020-02-10          1105.0     7.0
11      China             Hunan  2020-02-10           912.0     1.0
12      China             Anhui  2020-02-10           860.0     4.0
13      China           Jiangxi  2020-02-10           804.0     1.0
14      China         Chongqing  2020-02-10           486.0     2.0
15      China           Sichuan  2020-02-10           417.0     1.0
16      China          Shandong  2020-02-10           486.0     1.0
17      China           Jiangsu  2020-02-10           515.0     0.0
18      China          Shanghai  2020-02-10           302.0     1.0
19      China           Beijing  2020-02-10           342.0     3.0
20  Australia   New South Wales  2020-02-11             4.0     0.0
21  Australia          Victoria  2020-02-11             4.0     0.0
22  Australia        Queensland  2020-02-11             5.0     0.0
23  Australia   South Australia  2020-02-11             2.0     0.0
24   Cambodia     Sihanoukville  2020-02-11             1.0     0.0
25     Canada           Ontario  2020-02-11             3.0     0.0
26     Canada  British Columbia  2020-02-11             4.0     0.0
27      China             Hubei  2020-02-11         33366.0  1068.0
28      China          Zhejiang  2020-02-11          1131.0     0.0
29      China         Guangdong  2020-02-11          1219.0     1.0
30      China             Henan  2020-02-11          1135.0     8.0
31      China             Hunan  2020-02-11           946.0     2.0
32      China             Anhui  2020-02-11           889.0     4.0
33      China           Jiangxi  2020-02-11           844.0     1.0
34      China         Chongqing  2020-02-11           505.0     3.0
35      China           Sichuan  2020-02-11           436.0     1.0
36      China          Shandong  2020-02-11           497.0     1.0
37      China           Jiangsu  2020-02-11           543.0     0.0
38      China          Shanghai  2020-02-11           311.0     1.0
39      China           Beijing  2020-02-11           352.0     3.0

3 answers

Yes, you can do this by reshaping the DataFrame.

First, melt the measure/date columns so that their names become values:

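# Keep country/location as identifiers; every other column becomes a (key, value) row
df = df.melt(id_vars=["country", "location"],
             value_vars=[p for p in df.columns if p not in ["country", "location"]],
             var_name="key",
             value_name="value")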

#>       country         location                        key  value
#> 0   Australia  New South Wales  confirmedcases_10-02-2020      4
#> 1   Australia         Victoria  confirmedcases_10-02-2020      4
#> 2   Australia       Queensland  confirmedcases_10-02-2020      5
#> 3   Australia  South Australia  confirmedcases_10-02-2020      2
#> 4    Cambodia    Sihanoukville  confirmedcases_10-02-2020      1
#> ..        ...              ...                        ...    ...
#> 75      China          Sichuan          deaths_11-02-2020      1
#> 76      China         Shandong          deaths_11-02-2020      1
#> 77      China          Jiangsu          deaths_11-02-2020      0
#> 78      China         Shanghai          deaths_11-02-2020      1
#> 79      China          Beijing          deaths_11-02-2020      3
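As a self-contained sketch of this melt step, the snippet below rebuilds a two-location subset of the question's table by hand (the subset is only for illustration; in practice df would come from pd.read_csv) and melts it using keyword arguments. Each of the 2 rows times 4 measure/date columns becomes one row, so the result has 8 rows.

```python
import pandas as pd

# Hand-built subset of the question's wide table (two locations, two dates)
df = pd.DataFrame({
    "country": ["Australia", "China"],
    "location": ["New South Wales", "Hubei"],
    "confirmedcases_10-02-2020": [4.0, 31728.0],
    "deaths_10-02-2020": [0.0, 974.0],
    "confirmedcases_11-02-2020": [4, 33366],
    "deaths_11-02-2020": [0.0, 1068.0],
})

id_cols = ["country", "location"]
# Every non-identifier column holds a "<measure>_<date>" value
long_df = df.melt(id_vars=id_cols,
                  value_vars=[c for c in df.columns if c not in id_cols],
                  var_name="key",
                  value_name="value")
print(long_df)
```

Passing value_vars is optional here: melt treats every column not listed in id_vars as a value column by default.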

Next, split the values in the key column into the measure name and the date:

key_split_series = df.key.str.split("_", expand=True)
df["key"] = key_split_series[0]
df["date"] = key_split_series[1]

#>       country         location             key  value        date
#> 0   Australia  New South Wales  confirmedcases      4  10-02-2020
#> 1   Australia         Victoria  confirmedcases      4  10-02-2020
#> 2   Australia       Queensland  confirmedcases      5  10-02-2020
#> 3   Australia  South Australia  confirmedcases      2  10-02-2020
#> 4    Cambodia    Sihanoukville  confirmedcases      1  10-02-2020
#> ..        ...              ...             ...    ...         ...
#> 75      China          Sichuan          deaths      1  11-02-2020
#> 76      China         Shandong          deaths      1  11-02-2020
#> 77      China          Jiangsu          deaths      0  11-02-2020
#> 78      China         Shanghai          deaths      1  11-02-2020
#> 79      China          Beijing          deaths      3  11-02-2020
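A variation on this split step, assuming a measure name might itself contain an underscore (the "new_cases" key below is hypothetical and does not appear in the question's data). With split("_") such a key would produce three parts, so this sketch splits only on the last underscore using str.rsplit with n=1.

```python
import pandas as pd

# Hypothetical melted keys; "new_cases" is an invented measure name
long_df = pd.DataFrame({
    "key": ["confirmedcases_10-02-2020", "deaths_11-02-2020", "new_cases_12-02-2020"],
    "value": [4.0, 0.0, 1.0],
})

# Split on the LAST underscore only, so "new_cases" stays in one piece;
# expand=True returns the two parts as columns 0 and 1
parts = long_df["key"].str.rsplit("_", n=1, expand=True)
long_df["key"] = parts[0]
long_df["date"] = parts[1]
print(long_df)
```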

Finally, pivot the table so that confirmedcases and deaths become columns again:

df = df.set_index(["country", "location", "date", "key"])["value"].unstack().reset_index()

#> key    country         location        date  confirmedcases  deaths
#> 0    Australia  New South Wales  10-02-2020               4       0
#> 1    Australia  New South Wales  11-02-2020               4       0
#> 2    Australia       Queensland  10-02-2020               5       0
#> 3    Australia       Queensland  11-02-2020               5       0
#> 4    Australia  South Australia  10-02-2020               2       0
#> ..         ...              ...         ...             ...     ...
#> 35       China         Shanghai  11-02-2020             311       1
#> 36       China          Sichuan  10-02-2020             417       1
#> 37       China          Sichuan  11-02-2020             436       1
#> 38       China         Zhejiang  10-02-2020            1177       0
#> 39       China         Zhejiang  11-02-2020            1131       0
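The question's target layout also shows ISO dates (2020-02-10) with all rows for one date grouped together. A minimal sketch of those finishing touches, starting from a small hand-built long table like the one produced above: DataFrame.pivot stands in for set_index(...).unstack() (a list-valued index needs pandas 1.1 or later), then the day-month-year strings are parsed and rows are sorted by date. Within each date this sorts alphabetically, which is not the same as the original row order in the question.

```python
import pandas as pd

# Hand-built long table, as produced by the melt + split steps (subset only)
long_df = pd.DataFrame({
    "country": ["Australia", "China"] * 4,
    "location": ["Victoria", "Hubei"] * 4,
    "key": ["confirmedcases"] * 2 + ["deaths"] * 2
           + ["confirmedcases"] * 2 + ["deaths"] * 2,
    "date": ["10-02-2020"] * 4 + ["11-02-2020"] * 4,
    "value": [4.0, 31728.0, 0.0, 974.0, 4.0, 33366.0, 0.0, 1068.0],
})

# One column per measure; drop the leftover "key" name on the column axis
wide = (long_df
        .pivot(index=["country", "location", "date"], columns="key", values="value")
        .reset_index()
        .rename_axis(columns=None))

# Suffixes are day-month-year, so give the format explicitly
wide["date"] = pd.to_datetime(wide["date"], format="%d-%m-%Y")
wide = wide.sort_values(["date", "country", "location"], ignore_index=True)
print(wide)
```

Like unstack, pivot raises a ValueError if the same (country, location, date, key) combination appears twice; pivot_table would aggregate such duplicates instead.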

Use pd.wide_to_long:

print(pd.wide_to_long(df, stubnames=["confirmedcases", "deaths"],
                      i=["country", "location"], j="date", sep="_",
                      suffix=r"\d{2}-\d{2}-\d{4}").reset_index())

      country          location        date  confirmedcases  deaths
0   Australia   New South Wales  10-02-2020             4.0     0.0
1   Australia   New South Wales  11-02-2020             4.0     0.0
2   Australia          Victoria  10-02-2020             4.0     0.0
3   Australia          Victoria  11-02-2020             4.0     0.0
4   Australia        Queensland  10-02-2020             5.0     0.0
5   Australia        Queensland  11-02-2020             5.0     0.0
6   Australia   South Australia  10-02-2020             2.0     0.0
7   Australia   South Australia  11-02-2020             2.0     0.0
8    Cambodia     Sihanoukville  10-02-2020             1.0     0.0
9    Cambodia     Sihanoukville  11-02-2020             1.0     0.0
10     Canada           Ontario  10-02-2020             3.0     0.0
11     Canada           Ontario  11-02-2020             3.0     0.0
12     Canada  British Columbia  10-02-2020             4.0     0.0
13     Canada  British Columbia  11-02-2020             4.0     0.0
14      China             Hubei  10-02-2020         31728.0   974.0
15      China             Hubei  11-02-2020         33366.0  1068.0
16      China          Zhejiang  10-02-2020          1177.0     0.0
17      China          Zhejiang  11-02-2020          1131.0     0.0
18      China         Guangdong  10-02-2020          1177.0     1.0
19      China         Guangdong  11-02-2020          1219.0     1.0
20      China             Henan  10-02-2020          1105.0     7.0
21      China             Henan  11-02-2020          1135.0     8.0
22      China             Hunan  10-02-2020           912.0     1.0
23      China             Hunan  11-02-2020           946.0     2.0
24      China             Anhui  10-02-2020           860.0     4.0
25      China             Anhui  11-02-2020           889.0     4.0
26      China           Jiangxi  10-02-2020           804.0     1.0
27      China           Jiangxi  11-02-2020           844.0     1.0
28      China         Chongqing  10-02-2020           486.0     2.0
29      China         Chongqing  11-02-2020           505.0     3.0
30      China           Sichuan  10-02-2020           417.0     1.0
31      China           Sichuan  11-02-2020           436.0     1.0
32      China          Shandong  10-02-2020           486.0     1.0
33      China          Shandong  11-02-2020           497.0     1.0
34      China           Jiangsu  10-02-2020           515.0     0.0
35      China           Jiangsu  11-02-2020           543.0     0.0
36      China          Shanghai  10-02-2020           302.0     1.0
37      China          Shanghai  11-02-2020           311.0     1.0
38      China           Beijing  10-02-2020           342.0     3.0
39      China           Beijing  11-02-2020           352.0     3.0
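A self-contained version of this wide_to_long call, run on a two-location subset of the data built by hand for illustration, followed by a day-month-year date conversion. The suffix regex matters because wide_to_long only accepts numeric suffixes by default (suffix=r"\d+"), and the i columns must uniquely identify each row of the wide table.

```python
import pandas as pd

# Hand-built subset of the question's wide table
df = pd.DataFrame({
    "country": ["Australia", "China"],
    "location": ["New South Wales", "Hubei"],
    "confirmedcases_10-02-2020": [4.0, 31728.0],
    "deaths_10-02-2020": [0.0, 974.0],
    "confirmedcases_11-02-2020": [4, 33366],
    "deaths_11-02-2020": [0.0, 1068.0],
})

long_df = pd.wide_to_long(
    df,
    stubnames=["confirmedcases", "deaths"],
    i=["country", "location"],      # must be unique per wide row
    j="date",                       # new column holding the suffix
    sep="_",
    suffix=r"\d{2}-\d{2}-\d{4}",    # non-numeric suffix, so override the default
).reset_index()

# The suffix stays a string; parse it as day-month-year
long_df["date"] = pd.to_datetime(long_df["date"], format="%d-%m-%Y")
print(long_df)
```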

If you have only a single feature, use {dataframe}.to_numpy().reshape(-1, 1); if you have only a single sample, use {dataframe}.to_numpy().reshape(1, -1).
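This answer concerns array reshaping, which changes an array's dimensions but does not move column labels into rows. Current pandas DataFrame and Series objects have no reshape method of their own, so this minimal sketch goes through to_numpy(); the values are borrowed from the question's table purely for illustration. The two shapes correspond to what scikit-learn's error message for 1D input suggests.

```python
import pandas as pd

# One feature observed for three samples
feature = pd.Series([4.0, 4.0, 5.0], name="confirmedcases")
X_one_feature = feature.to_numpy().reshape(-1, 1)   # 3 rows x 1 column

# One sample described by two features
sample = pd.Series([31728.0, 974.0], index=["confirmedcases", "deaths"])
X_one_sample = sample.to_numpy().reshape(1, -1)     # 1 row x 2 columns

print(X_one_feature.shape, X_one_sample.shape)
```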
