pd.read_json（）到pandas中的数据帧

{'type': 'Feature', 'properties': {'GEOID': '05047-1-A (Oz Wd 1)', 'votes_dem': 79, 'votes_rep': 279, 'votes_total': 366, 'votes_per_sqkm': 54.2, 'pct_dem_lead': -54.6}, 'geometry': {'type': 'MultiPolygon', 'coordinates': [[[[-93.88536364311612, 35.483758439321655], [-93.8840470388143, 35.483727092097084], [-93.88403177163875, 35.483726728784056], [-93.88403177478405, 35.48372661335151], [-93.87956152062023, 35.483586322546344], [-93.87520339804045, 35.48339873745174], [-93.87534656033012, 35.480428139370346], [-93.87604589142236, 35.48045051399295], ...

GEOID votes_dem votes_rep votes_total votes_per_sqkm pct_dem_lead 05047-1-A (Oz Wd 1) 79 279 366 54.2 -54.6

3条回答

网友

1楼 · 编辑于 2024-06-25 22:45:45

这对我有用

import pandas as pd

filename = 'precincts-with-results.geojson.gz'
df = pd.read_json(filename)
features = df['features']
properties = [_['properties'] for _ in features.values]
collect_properties = {_: list() for _ in properties[0].keys()}
for record in properties:
    for col_name, value in record.items():
        collect_properties[col_name].append(value)

new_df = pd.DataFrame.from_dict(collect_properties).set_index('GEOID')
print(new_df)

结果看起来像

                        votes_dem  votes_rep  ...  votes_per_sqkm  pct_dem_lead
GEOID                                          ...                              
05047-1-A (Oz Wd 1)           79.0      279.0  ...            54.2         -54.6
05149-11 - Dutch Creek         6.0       31.0  ...             0.3         -67.6
05081-Franklin Township       53.0      383.0  ...             3.8         -73.3
05027-McNeil City             64.0       68.0  ...            41.9          -2.9
05027-Taylor Township         67.0      342.0  ...             1.7         -65.0
                            ...        ...  ...             ...           ...
56007-01-01                  173.0      300.0  ...            21.8         -26.1
56045-05-01                   70.0      390.0  ...           259.5         -66.7
56045-05-02                   67.0      376.0  ...            66.5         -68.1
56045-05-03                   63.0      419.0  ...           141.4         -71.5
56041-130                    168.0      654.0  ...             1.7         -57.1
[146596 rows x 5 columns]

网友

2楼 · 编辑于 2024-06-25 22:45:45

我已经实现了一段代码，用于展平JSON并提取选定字段-

How to extract fields from nested json and save in a data structure

下面是执行该任务的代码（只需更新字段列表，并用json文件路径替换“test1.json”）-

import json
from pprint import pprint
import pandas as pd

def flatten_json(nested_json, fields):
    out = []
    temp = {}

    def flatten(x, name=''):
        nonlocal temp
        if type(x) is dict:
            temp = {}
            for a in x:
                flatten(x[a], a)
        elif type(x) is list:
            for i, a in enumerate(x):
                flatten(a)
                i += 1
        elif name in fields:
            temp[name] = x
            out.append(temp)
    flatten(nested_json)
    return out

with open('test.json', 'r') as f:
    json1 = json.loads(f.read())

# mention the required fields here.
fields = ['GEOID','votes_dem','votes_rep','votes_total','votes_per_sqkm','pct_dem_lead']

result = (flatten_json(json1, fields))


df = pd.DataFrame(result)
df.drop_duplicates(inplace=True)

网友

3楼 · 编辑于 2024-06-25 22:45:45

您的json文件没有正确的格式，无法由pandas立即转换为数据帧。首先，您需要将json文件中的数据提取到字典中：

with open(os.path.join(sys.path[0], "precincts-with-results.geojson")) as json_file:
    data = json.load(json_file)

然后，您必须转换数据以满足您的需要。因为字典是嵌套的，所以需要规范化数据。现在我不完全确定您到底希望数据帧看起来是什么样子，所以我这里不包括代码，但我认为您可以使用pd.json_normalize来实现这一点。它返回一个数据帧。有关此操作的文档可在此处找到：https://pandas.pydata.org/docs/reference/api/pandas.json_normalize.html

相关问题更多 >

编程相关推荐

热门问题

热门文章