如何将嵌套的JSON解析为CSV

2条回答

网友

1楼 · 编辑于 2024-10-03 17:21:28

一般问题

这个任务（以及其他类似任务）的问题不仅仅是如何创建一个算法——我相信，从理论上讲，您将能够通过（不是那么）大量的嵌套for循环来解决这个问题。问题是如何以一种你不会感到头痛的方式组织代码——即，你可以很容易地修复bug，你可以编写单元测试，你可以很容易地从阅读中理解代码（从现在起的六个月内），并且你可以很容易地在需要时更改代码。我不知道有谁不犯错误时，他们的头围绕着一个深嵌套的结构。在一个代码中寻找错误，因为它反映了数据的嵌套结构，所以代码嵌套得很重，这可能会让人非常沮丧。你知道吗

快速（最有可能：最佳）解决方案

依赖为您的具体用例而设计的包，例如

https://github.com/cwacek/python-jsonschema-objects

如果您有API模式的正式定义，您可以使用包来实现。例如，如果您的API有一个Swagger模式定义，那么您可以使用swagger-py（https://github.com/digium/swagger-py）将JSON响应转换为Python对象。你知道吗

主要解决方案：面向对象编程和递归

即使您的具体用例可能有一些库，我还是想解释一下如何处理“那种”任务的原则：

组织此类问题代码的一个好方法是使用面向对象编程。利用递归原理，可以更清楚地解决嵌套问题。这也使得在API响应的JSON模式由于任何原因（例如API的更新）发生变化时，更容易对代码进行检查。在您的情况下，我建议您创建如下内容：

class JsonObject:
    """Parent Class for any Object that will be retrieved from the JSON
    and potentially has nested JsonObjects inside.

    This class takes care of parsing the json into python Objects and deals
    with the recursion into the nested structures."""

    primitives = []
    json_objects = {
        # For each class, this dict defines all the "embedded" classes which
        # live directly "under" that class in the nested JSON. It will have the
        # following structure:

        # attribute_name : class

        # In your case the JSON schema does not have any "single" objects
        # in the nesting strcuture, but only lists of nested objects. I
        # still , to demonstrate how you would do it in case, there would be
        # single "embedded"
    }
    json_object_lists = {
        # For each class, this dict defines all the "embedded" subclasses which
        # are provided in a list "under" that class in the nested JSON.
        # It will have the following structure:

        # attribute_name : class
    }

    @classmethod
    def from_dict(cls, d: dict) -> "JsonObject":
        instance = cls()

        for attribute in cls.primitives:
            # Here we just parse all the primitives
            instance.attribute = getattr(d, attribute, None)

        for attribute, klass in cls.json_object_lists.items():
            # Here we parse all lists of embedded JSON Objects
            nested_objects = []
            l = getattr(d, attribute, [])
            for nested_dict in l:
                nested_objects += klass.from_dict(nested_dict)

            setattr(instance, attribute, nested_objects)

        for attribute, klass in cls.json_objects.items():
            # Here we parse all "single" embedded JSON Objects
            setattr(
                instance,
                attribute,
                klass.from_dict(getattr(d, attribute, None)
            )

    def to_csv(self) -> str:
        pass

由于您没有解释如何准确地从JSON创建csv，所以我没有实现该方法，而是将此留给您。也没有必要解释整个方法。你知道吗

现在我们有了通用的父类，所有特定的父类都将从中继承，这样我们就可以将递归应用到我们的问题中。现在我们只需要根据要解析的JSON模式定义这些具体的结构。我从您的示例中得到以下内容，但您可以轻松地更改所需的内容：

class Address(JsonObject):
    primitives = [
        "AddressID",
        "AddressTypeID",
        "Address",
        "Street",
        "City",
        "State",
        "PostalCode",
        "IsPrimary",
        "ParentID",
        "ParentType",
    ]

    json_objects = {}
    json_object_lists = {}


class Lease(JsonObject):
    primitives = [
        "LeaseID",
        "TenantID",
        "UnitID",
        "PropertyID",
        "MoveInDate",
        "SortOrder",
        "CreateDate",
        "UpdateDate",
        "CreateUserID",
        "UpdateUserID",
    ]

    json_objects = {}
    json_object_lists = {}


class UserDefinedValue(JsonObject):
    primitives = [
        "UserDefinedValueID",
        "UserDefinedFieldID",
        "ParentID",
        "Name",
        "Value",
        "UpdateDate",
        "FieldType",
        "UpdateUserID",
        "CreateUserID",
    ]

    json_objects = {}
    json_object_lists = {}


class PhoneNumber(JsonObject):
    primitives = [
        "PhoneNumberID",
        "PhoneNumberTypeID",
        "PhoneNumber",
        "Extension",
        "StrippedPhoneNumber",
        "IsPrimary",
        "ParentID",
        "ParentType",
    ]

    json_objects = {}
    json_object_lists = {}

class Contact(JsonObject):
    primitives = [
        "ContactID",
        "FirstName",
        "LastName",
        "MiddleName",
        "IsPrimary",
        "DateOfBirth",
        "FederalTaxID",
        "Comment",
        "Email",
        "License",
        "Vehicle",
        "IsShowOnBill",
        "Employer",
        "ApplicantType",
        "CreateDate",
        "CreateUserID",
        "UpdateDate",
        "AnnualIncome",
        "UpdateUserID",
        "ParentID",
        "ParentType",
    ]

    json_objects = {}
    json_object_lists = {
        "PhoneNumbers": PhoneNumber,
    }


class Tenant(JsonObject):
    primitives = [
        "TenantID",
        "TenantDisplayID",
        "Name",
        "FirstName",
        "LastName",
        "WebMessage",
        "Comment",
        "RentDueDay",
        "RentPeriod",
        "FirstContact",
        "PropertyID",
        "PostingStartDate",
        "CreateDate",
        "CreateUserID",
        "UpdateDate",
        "UpdateUserID",
        "OpenReceivables",  # Maybe this is also a nested Object? Not clear from your sample.
        "Status",
    ]

    json_object_lists = {
        "Contacts": Contact,
        "UserDefinedValues": UserDefinedValue,
        "Leases": Lease,
        "Addresses": Address,
    }

    json_objects = {}

您可以想象这种方法的“美”（至少：顺序），它在于以下几点：通过这种结构，我们可以处理API的JSON响应中的任何嵌套级别，而不会带来额外的麻烦—我们的代码不会加深缩进级别，因为我们已经将讨厌的嵌套分离到了JsonObjectsfrom_json方法的递归定义中。这就是为什么现在识别错误或对代码应用更改要容易得多的原因。你知道吗

要最终将JSON解析到我们的对象中，您可以执行以下操作：

import typing
import json


def tenants_from_json(json_string: str) -> typing.Iterable["Tenant"]:
    tenants = [
        Tenant.from_dict(tenant_dict)
        for tenant_dict in json.loads(json_string)
    ]
    return tenants

重要的最后一点：这只是基本原则

我的代码示例只是对使用对象和递归来处理结构的大量（讨厌的）嵌套的想法的一个非常简短的介绍。代码有一些缺陷。例如，应该避免定义可变的类变量。当然，整个代码应该验证它从API获得的数据。您还可能希望添加每个属性的类型，并在Python对象中正确地表示它们（例如，您的示例包含整数、datetimes和字符串）。你知道吗

我真的只想在这里向大家展示面向对象编程的原理。你知道吗

我没有花时间测试我的代码。所以可能还有虫子。再说一遍，我只是想证明一下这个原则。你知道吗

网友

2楼 · 编辑于 2024-10-03 17:21:28

这样的方法应该有用：

import pandas as pd 
pd.set_option('display.max_rows', 10000)
pd.set_option('display.max_columns', 100000)
TenantID = []
TenantDisplayID = []
Name = []
FirstName = []
LastName = []
WebMessage = []
Comment = []
RentDueDay = []
RentPeriod = []
FirstContact = []
PropertyID = []
PostingStartDate = []
CreateDate = []
CreateUserID = []
UpdateDate = []
UpdateUserID = []
Contacts = []
for rm_access in read_content:
    rm_data = rm_access

    print(rm_data)
    TenantID.append(rm_data["TenantID"])
    TenantDisplayID.append(rm_data["TenantDisplayID"])
    Name.append(rm_data["Name"])
    FirstName.append(rm_data["FirstName"])
    LastName.append(rm_data["LastName"])
    WebMessage.append(rm_data["WebMessage"])
    Comment.append(rm_data["Comment"])
    RentDueDay.append(rm_data["RentDueDay"])
    RentPeriod.append(rm_data["RentPeriod"])
    FirstContact.append(rm_data["FirstContact"])
    PropertyID.append(rm_data["PropertyID"])
    PostingStartDate.append(rm_data["PostingStartDate"])
    CreateDate.append(rm_data["CreateDate"])
    CreateUserID.append(rm_data["CreateUserID"])
    UpdateUserID.append(rm_data["UpdateUserID"])
    Contacts.append(rm_data["Contacts"])


df = pd.DataFrame({"TenantID":TenantID,"TenantDisplayID":TenantDisplayID, "Name": Name,
                   "FirstName":FirstName, "LastName": LastName,"WebMessage": WebMessage,
                   "Comment": Comment, "RentDueDay": RentDueDay, "RentPeriod": RentPeriod,
                   "FirstContact": FirstContact, "PropertyID": PropertyID, "PostingStartDate": PostingStartDate,
                   "CreateDate": CreateDate, "CreateUserID": CreateUserID,"UpdateUserID": UpdateUserID,
                   "Contacts": Contacts})

print(df)

一般问题

快速（最有可能：最佳）解决方案

主要解决方案：面向对象编程和递归

重要的最后一点：这只是基本原则

相关问题更多 >

编程相关推荐

热门问题

热门文章