如何使用Python将json文件数据提取到数据库中?

2024-06-29 01:10:17 发布

您现在位置:Python中文网/ 问答频道 /正文

我是使用JSON文件的新手,遇到了this data。你知道吗

我正在尝试将它作为列导入到pandas数据框中,以便进一步使用它。但是,我认为我遇到了一些数据列的嵌套列表,因此我的数据帧如下所示:

enter image description here

这是我使用的数据示例:

{"fraudulent":false,"customer":
        {"customerEmail":"josephhoward@yahoo.com","customerPhone":"400-108-5415",
        "customerDevice":"yyeiaxpltf82440jnb3v","customerIPAddress":"8.129.104.40",
        "customerBillingAddress":"5493 Jones Islands\nBrownside, CA 51896"},
        "orders":[{"orderId":"vjbdvd","orderAmount":18,"orderState":"pending",
        "orderShippingAddress":"5493 Jones Islands\nBrownside, CA 51896"},
        {"orderId":"yp6x27","orderAmount":26,"orderState":"fulfilled",
        "orderShippingAddress":"5493 Jones Islands\nBrownside, CA 51896"}],
        "paymentMethods":[{"paymentMethodId":"wt07xm68b",
        "paymentMethodRegistrationFailure":true,"paymentMethodType":"card",
        "paymentMethodProvider":"JCB 16 digit",
        "paymentMethodIssuer":"Citizens First Banks"}],"transactions":[
        {"transactionId":"a9lcj51r","orderId":"vjbdvd",
        "paymentMethodId":"wt07xm68b","transactionAmount":18,
        "transactionFailed":false},{"transactionId":"y4wcv03i",
        "orderId":"yp6x27","paymentMethodId":"wt07xm68b",
        "transactionAmount":26,"transactionFailed":false}]}

从上图中可以看到,像orders这样的列包含一个特性列表,比如orderAmountorderState等等。我希望这些值被拆分成它们唯一的列,这样我就得到了一个熊猫数据框,其中所有特性都作为单独的列,并带有它们相应的值。你知道吗

到目前为止,我已经尝试使用json_normalize,但这并没有解决我的问题。你知道吗

请帮忙。你知道吗


Tags: 数据false列表caordersjonesorderidislands
1条回答
网友
1楼 · 发布于 2024-06-29 01:10:17

好的,考虑到你所说的,你应该做的第一件事就是解开JSON这样的东西,让它更容易理解:

{
    "fraudulent":false,
    "customer": {
            "customerEmail":"josephhoward@yahoo.com",
            "customerPhone":"400-108-5415",
            "customerDevice":"yyeiaxpltf82440jnb3v",
            "customerIPAddress":"8.129.104.40",
            "customerBillingAddress":"5493 Jones Islands\nBrownside, CA 51896"
        },
    "orders":[
        {
            "orderId":"vjbdvd",
            "orderAmount":18,
            "orderState":"pending",
            "orderShippingAddress":"5493 Jones Islands\nBrownside, CA 51896"
        },
        {
            "orderId":"yp6x27",
            "orderAmount":26,
            "orderState":"fulfilled",
            "orderShippingAddress":"5493 Jones Islands\nBrownside, CA 51896"}
        ],
    "paymentMethods":[
            {
                "paymentMethodId":"wt07xm68b",
                "paymentMethodRegistrationFailure":true,"paymentMethodType":"card",
                "paymentMethodProvider":"JCB 16 digit",
                "paymentMethodIssuer":"Citizens First Banks"
            }
        ],
    "transactions":[
            {
                "transactionId":"a9lcj51r","orderId":"vjbdvd",
                "paymentMethodId":"wt07xm68b","transactionAmount":18,
                "transactionFailed":false
            },
            {
                "transactionId":"y4wcv03i",
                "orderId":"yp6x27","paymentMethodId":"wt07xm68b",
                "transactionAmount":26,"transactionFailed":false
            }
        ]
}

接下来要考虑的是如何构造数据帧。根据数据的外观,您可能需要将其拆分为多个数据帧,因为事务、付款和订单的长度可能是可变的。你知道吗

因此,您可以有一个数据框,其中每一行对应一个客户,每个客户都有一个对应的ID(或者只是使用一些独特的东西,比如他们的电子邮件),然后有一个列对应所有订单,其中一个列包含该ID,该ID将其与客户数据框关联起来。你知道吗

import json
import pandas as pd

customers = []
orders = []
paymentMethods = []
transactions = []

with open('data.json') as f:
    raw_json = json.load(f)

customer_id = 0
for i in raw_json:
    customers.append([
        customer_id,
        i['fraudulent'],
        i['customer']['customerEmail'],
        i['customer']['customerPhone'],
        i['customer']['customerDevice'],
        i['customer']['customerIPAddress'],
        i['customer']['customerBillingAddress'],
    ])

    for j in i['orders']:
        orders.append([
            customer_id,
            j['orderId'],
            j['orderAmount']
            # etc
        ])
    # etc
    customer_id += 1



customers_df = pd.DataFrame(data=customers, columns=['customer_id', 'fraudulent', 'customerEmail', 'customerPhone', 'customerDevice', 'customerIPAddress', 'customerBillingAddress'])

orders_df = pd.DataFrame(data=orders, columns=['customer_id', 'orderId', 'orderAmount'])

您需要扩展这个答案来完全解析JSON,但这应该是一个好的开始。请参阅customer_id用于将customers_df中的条目与orders_df关联的方式。你知道吗

我将查看Pandas文档以获取有关构建数据帧here的更多信息。你知道吗

相关问题 更多 >