pd.read_json()到pandas中的数据帧

2024-06-25 22:45:45 发布

您现在位置:Python中文网/ 问答频道 /正文

我以前从未使用过json文件。目前我正在做的就是:

df = pd.read_json("precincts-with-results.geojson.gz")
df['features']

这是上面的结果:

{'type': 'Feature',
 'properties': {'GEOID': '05047-1-A (Oz Wd 1)',
  'votes_dem': 79,
  'votes_rep': 279,
  'votes_total': 366,
  'votes_per_sqkm': 54.2,
  'pct_dem_lead': -54.6},
 'geometry': {'type': 'MultiPolygon',
  'coordinates': [[[[-93.88536364311612, 35.483758439321655],
     [-93.8840470388143, 35.483727092097084],
     [-93.88403177163875, 35.483726728784056],
     [-93.88403177478405, 35.48372661335151],
     [-93.87956152062023, 35.483586322546344],
     [-93.87520339804045, 35.48339873745174],
     [-93.87534656033012, 35.480428139370346],
     [-93.87604589142236, 35.48045051399295], ...

我想要一个如下所示的数据帧:

GEOID                   votes_dem     votes_rep     votes_total      votes_per_sqkm       pct_dem_lead
05047-1-A (Oz Wd 1)       79               279         366             54.2                 -54.6 
  

您可以在此处下载数据集(大小264MB):https://int.nyt.com/newsgraphics/elections/map-data/2020/national/precincts-with-results.geojson.gz

感谢您的帮助和代码


Tags: jsondfgeojsontypewithresultstotalgz
3条回答

这对我有用

import pandas as pd

filename = 'precincts-with-results.geojson.gz'
df = pd.read_json(filename)
features = df['features']
properties = [_['properties'] for _ in features.values]
collect_properties = {_: list() for _ in properties[0].keys()}
for record in properties:
    for col_name, value in record.items():
        collect_properties[col_name].append(value)

new_df = pd.DataFrame.from_dict(collect_properties).set_index('GEOID')
print(new_df)

结果看起来像

                        votes_dem  votes_rep  ...  votes_per_sqkm  pct_dem_lead
GEOID                                          ...                              
05047-1-A (Oz Wd 1)           79.0      279.0  ...            54.2         -54.6
05149-11 - Dutch Creek         6.0       31.0  ...             0.3         -67.6
05081-Franklin Township       53.0      383.0  ...             3.8         -73.3
05027-McNeil City             64.0       68.0  ...            41.9          -2.9
05027-Taylor Township         67.0      342.0  ...             1.7         -65.0
                            ...        ...  ...             ...           ...
56007-01-01                  173.0      300.0  ...            21.8         -26.1
56045-05-01                   70.0      390.0  ...           259.5         -66.7
56045-05-02                   67.0      376.0  ...            66.5         -68.1
56045-05-03                   63.0      419.0  ...           141.4         -71.5
56041-130                    168.0      654.0  ...             1.7         -57.1
[146596 rows x 5 columns]

我已经实现了一段代码,用于展平JSON并提取选定字段-

How to extract fields from nested json and save in a data structure

下面是执行该任务的代码(只需更新字段列表,并用json文件路径替换“test1.json”)-

import json
from pprint import pprint
import pandas as pd

def flatten_json(nested_json, fields):
    out = []
    temp = {}

    def flatten(x, name=''):
        nonlocal temp
        if type(x) is dict:
            temp = {}
            for a in x:
                flatten(x[a], a)
        elif type(x) is list:
            for i, a in enumerate(x):
                flatten(a)
                i += 1
        elif name in fields:
            temp[name] = x
            out.append(temp)
    flatten(nested_json)
    return out

with open('test.json', 'r') as f:
    json1 = json.loads(f.read())

# mention the required fields here.
fields = ['GEOID','votes_dem','votes_rep','votes_total','votes_per_sqkm','pct_dem_lead']

result = (flatten_json(json1, fields))


df = pd.DataFrame(result)
df.drop_duplicates(inplace=True)

您的json文件没有正确的格式,无法由pandas立即转换为数据帧。首先,您需要将json文件中的数据提取到字典中:

with open(os.path.join(sys.path[0], "precincts-with-results.geojson")) as json_file:
    data = json.load(json_file)

然后,您必须转换数据以满足您的需要。因为字典是嵌套的,所以需要规范化数据。现在我不完全确定您到底希望数据帧看起来是什么样子,所以我这里不包括代码,但我认为您可以使用pd.json_normalize来实现这一点。它返回一个数据帧。有关此操作的文档可在此处找到:https://pandas.pydata.org/docs/reference/api/pandas.json_normalize.html

相关问题 更多 >