Creating a DataFrame from dictionaries nested inside a nested dictionary

Posted 2024-09-29 21:47:34


I scraped a sample from an election website for the US state of Pennsylvania. Below is the nested dictionary I got from the site's JSON:

some_dict = {'Election': {'Statewide': [{'ADAMS': [{'CandidateName': 'BIDEN, JOSEPH '
                                                     'ROBINETTE JR',
                                    'CountyName': 'ADAMS',
                                    'ElectionDayNoVotes': '0',
                                    'ElectionDayVotes': '1',
                                    'ElectionDayYesVotes': '0',
                                    'ElectionYear': '2020'
                                    },
                                   {'CandidateName': 'TRUMP, DONALD J. ',
                                    'CountyName': 'ADAMS',
                                    'ElectionDayNoVotes': '0',
                                    'ElectionDayVotes': '1',
                                    'ElectionDayYesVotes': '0',
                                    'ElectionYear': '2020'
                                    }],
                         'ALLEGHENY': [{'CandidateName': 'BIDEN, JOSEPH '
                                                         'ROBINETTE JR',
                                        'CountyName': 'ALLEGHENY',
                                        'ElectionDayNoVotes': '0',
                                        'ElectionDayVotes': '1',
                                        'ElectionDayYesVotes': '0',
                                        'ElectionYear': '2020'
                                       },
                                       {'CandidateName': 'TRUMP, DONALD '
                                                         'J. ',
                                        'CountyName': 'ALLEGHENY',
                                        'ElectionDayNoVotes': '0',
                                        'ElectionDayVotes': '1',
                                        'ElectionDayYesVotes': '0',
                                        'ElectionYear': '2020'
                                       }]}]}}

I don't know how to convert it into a DataFrame like the one shown below:

[Image: the expected DataFrame]


2 answers

Solution

You can do this with either of the following two methods:

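  • Method 1: use pd.read_json()
  • Method 2: use pd.DataFrame(). The DataFrame() constructor accepts either
    • a single dict

      the keys are the column names and the values are the column values

    • a list of dicts

      each list item is one row of the DataFrame, written as a dict whose keys are the column names and whose values are that row's values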
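A small illustration of the two input shapes the DataFrame constructor accepts. The two-column table here is a made-up toy example, not the full election data:

```python
import pandas as pd

# Shape 1: a single dict -> each key is a column name, each value holds that column's values
by_column = pd.DataFrame({
    'CountyName': ['ADAMS', 'ALLEGHENY'],
    'ElectionDayVotes': [1, 1],
})

# Shape 2: a list of dicts -> each dict is one row, keyed by column name
by_row = pd.DataFrame([
    {'CountyName': 'ADAMS', 'ElectionDayVotes': 1},
    {'CountyName': 'ALLEGHENY', 'ElectionDayVotes': 1},
])

# Both shapes describe the same table
print(by_column.equals(by_row))  # True
```

The scraped data is already a collection of per-row dicts, which is why the answer below goes with the list-of-dicts shape.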

Code

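Here we use the list-of-dicts approach. First, the custom function prepare_records() (defined below) flattens the nested data into a list of dicts; then either of the two methods is applied: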

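import io
import json

import pandas as pd

# prepare records: flatten the nested dict into a list of row dicts
records = prepare_records(data)

# Method-1: using read_json()
# (recent pandas versions deprecate passing a literal JSON string, hence the StringIO wrapper)
df = pd.read_json(io.StringIO(json.dumps(records)), orient='records')

# Method-2: using DataFrame()
df = pd.DataFrame(data=records)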

Output

# print(df.to_markdown(index=False))

| CandidateName              | CountyName   |   ElectionDayNoVotes |   ElectionDayVotes |   ElectionDayYesVotes |   ElectionYear |
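|:---------------------------|:-------------|---------------------:|-------------------:|----------------------:|---------------:|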
| BIDEN, JOSEPH ROBINETTE JR | ADAMS        |                    0 |                  1 |                     0 |           2020 |
| TRUMP, DONALD J.           | ADAMS        |                    0 |                  1 |                     0 |           2020 |
| BIDEN, JOSEPH ROBINETTE JR | ALLEGHENY    |                    0 |                  1 |                     0 |           2020 |
| TRUMP, DONALD J.           | ALLEGHENY    |                    0 |                  1 |                     0 |           2020 |
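One detail worth checking: in the scraped JSON every vote count and the year are strings ('0', '1', '2020'), and pd.DataFrame(records) keeps them as strings, so they won't behave numerically (e.g. when summing votes) until converted. A minimal sketch, assuming the column names shown in the table above, that converts them with pd.to_numeric:

```python
import pandas as pd

# Two rows in the same shape as the flattened records (numbers are strings, as in the scraped JSON)
records = [
    {'CandidateName': 'BIDEN, JOSEPH ROBINETTE JR', 'CountyName': 'ADAMS',
     'ElectionDayNoVotes': '0', 'ElectionDayVotes': '1',
     'ElectionDayYesVotes': '0', 'ElectionYear': '2020'},
    {'CandidateName': 'TRUMP, DONALD J.', 'CountyName': 'ADAMS',
     'ElectionDayNoVotes': '0', 'ElectionDayVotes': '1',
     'ElectionDayYesVotes': '0', 'ElectionYear': '2020'},
]
df = pd.DataFrame(records)

# Convert each numeric-looking string column to a real number column
numeric_cols = ['ElectionDayNoVotes', 'ElectionDayVotes', 'ElectionDayYesVotes', 'ElectionYear']
df[numeric_cols] = df[numeric_cols].apply(pd.to_numeric)

print(df.dtypes)
print(df['ElectionDayVotes'].sum())  # 2
```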

Custom function

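# custom function: flatten the nested structure into a flat list of row dicts
def prepare_records(data):
    records = []
    # 'Statewide' is a list of {county_name: [row_dict, ...]} mappings
    for statewide_entry in data['Election']['Statewide']:
        for county_rows in statewide_entry.values():
            records.extend(county_rows)  # same as: records += county_rows
    return records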
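The same flattening can also be written without nested append loops. A sketch using itertools.chain.from_iterable, which walks every entry of the 'Statewide' list and chains each county's list of rows into one list. The helper name flatten_records and the placeholder candidate names 'A', 'B', 'C' are hypothetical, chosen only for illustration:

```python
from itertools import chain

import pandas as pd


def flatten_records(data):
    """Hypothetical helper: turn {'Election': {'Statewide': [{county: [row, ...]}, ...]}} into a list of rows."""
    statewide = data['Election']['Statewide']
    # For every Statewide entry, take each county's row list, then chain all rows together
    return list(chain.from_iterable(
        rows for entry in statewide for rows in entry.values()
    ))


# Tiny input in the same shape as the question's data (placeholder values)
sample = {'Election': {'Statewide': [{
    'ADAMS': [{'CandidateName': 'A', 'CountyName': 'ADAMS'}],
    'ALLEGHENY': [{'CandidateName': 'B', 'CountyName': 'ALLEGHENY'},
                  {'CandidateName': 'C', 'CountyName': 'ALLEGHENY'}],
}]}}

df = pd.DataFrame(flatten_records(sample))
print(df)
```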

Dummy data

data = {
    'Election': 
        {'Statewide': [
            {
                'ADAMS': [
                    {
                        'CandidateName': 'BIDEN, JOSEPH ROBINETTE JR',
                        'CountyName': 'ADAMS',
                        'ElectionDayNoVotes': '0',
                        'ElectionDayVotes': '1',
                        'ElectionDayYesVotes': '0',
                        'ElectionYear': '2020'
                    },
                    {
                        'CandidateName': 'TRUMP, DONALD J.',
                        'CountyName': 'ADAMS',
                        'ElectionDayNoVotes': '0',
                        'ElectionDayVotes': '1',
                        'ElectionDayYesVotes': '0',
                        'ElectionYear': '2020'
                    },
                ],
                'ALLEGHENY': [
                    {
                        'CandidateName': 'BIDEN, JOSEPH ROBINETTE JR',
                        'CountyName': 'ALLEGHENY',
                        'ElectionDayNoVotes': '0',
                        'ElectionDayVotes': '1',
                        'ElectionDayYesVotes': '0',
                        'ElectionYear': '2020'
                    },
                    {
                        'CandidateName': 'TRUMP, DONALD J.',
                        'CountyName': 'ALLEGHENY',
                        'ElectionDayNoVotes': '0',
                        'ElectionDayVotes': '1',
                        'ElectionDayYesVotes': '0',
                        'ElectionYear': '2020'
                    },
                ],
            },
        ],
    }
}
Another answer

import pandas as pd

some_dict = {'Election': {'Statewide': [{'ADAMS': [{'CandidateName': 'BIDEN, JOSEPH '
                                                     'ROBINETTE JR',
                                    'CountyName': 'ADAMS',
                                    'ElectionDayNoVotes': '0',
                                    'ElectionDayVotes': '1',
                                    'ElectionDayYesVotes': '0',
                                    'ElectionYear': '2020'
                                    },
                                   {'CandidateName': 'TRUMP, DONALD J. ',
                                    'CountyName': 'ADAMS',
                                    'ElectionDayNoVotes': '0',
                                    'ElectionDayVotes': '1',
                                    'ElectionDayYesVotes': '0',
                                    'ElectionYear': '2020'
                                    }],
                         'ALLEGHENY': [{'CandidateName': 'BIDEN, JOSEPH '
                                                         'ROBINETTE JR',
                                        'CountyName': 'ALLEGHENY',
                                        'ElectionDayNoVotes': '0',
                                        'ElectionDayVotes': '1',
                                        'ElectionDayYesVotes': '0',
                                        'ElectionYear': '2020'
                                       },
                                       {'CandidateName': 'TRUMP, DONALD '
                                                         'J. ',
                                        'CountyName': 'ALLEGHENY',
                                        'ElectionDayNoVotes': '0',
                                        'ElectionDayVotes': '1',
                                        'ElectionDayYesVotes': '0',
                                        'ElectionYear': '2020'
                                       }]}]}}


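# build one small DataFrame per county, then concatenate them once at the end
frames = []
for d in some_dict['Election']['Statewide']:
    for county, rows in d.items():
        t = pd.DataFrame(rows)
        t['CountyName'] = county  # the dict key is the county name
        frames.append(t)
# ignore_index=True renumbers rows 0..n-1 instead of repeating each county's 0, 1, ...
df = pd.concat(frames, ignore_index=True)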
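A variation on the loop above: pd.concat also accepts a dict of DataFrames and uses the dict keys as the outer level of the resulting index, so the county name can come from the keys without setting a column by hand. This is a sketch on trimmed placeholder data; it assumes county names are unique across all 'Statewide' entries (duplicate keys would overwrite each other in the dict). Also, if the rows already contain a CountyName column, as the question's data does, drop it first or choose a different level name, because reset_index refuses to insert a column that already exists.

```python
import pandas as pd

# Same shape as the question's some_dict, trimmed to one row per county and without CountyName in the rows
some_dict = {'Election': {'Statewide': [{
    'ADAMS': [{'CandidateName': 'BIDEN, JOSEPH ROBINETTE JR', 'ElectionDayVotes': '1'}],
    'ALLEGHENY': [{'CandidateName': 'TRUMP, DONALD J.', 'ElectionDayVotes': '1'}],
}]}}

# One frame per county, keyed by county name
frames = {
    county: pd.DataFrame(rows)
    for entry in some_dict['Election']['Statewide']
    for county, rows in entry.items()
}

# Passing a dict to pd.concat makes its keys the outer index level; `names` labels both levels
df = pd.concat(frames, names=['CountyName', 'row'])

# Move the county level back into a regular column and drop the per-county row number
df = df.reset_index(level='CountyName').reset_index(drop=True)
print(df)
```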
