Optimizing code to handle large amounts of data

Published 2024-10-02 16:21:19


I have the following code:

import json


data_sample = [{
    "name": "John",
    "age": 30,
    "cars": [{
        "temp": {"sum": "20", "for": 12},
        "id": 30,
        "element": [
            {"model": "Taurus1", "doors": {"id": "1", "id2": 101}},
            {"model": "T1", "doors": {"id": "2", "id2": 12}},
            {"model": "As", "doors": {"id": "Mo", "id2": 4}},
        ],
    }, {
        "temp": {"sum": "10", "for": 12},
        "id": 31,
        "element": [
            {"model": "Taurus2", "doors": {"id": "2", "id2": 102}},
            {"model": "T2", "doors": {"id": "5", "id2": 12}},
            {"model": "Thing", "doors": {"id": "Fo", "id2": 4}},
        ],
    }, {
        "temp": {"sum": "20", "for": 10},
        "id": 32,
        "element": [
            {"model": "Taurus3", "doors": {"id": "3", "id2": 103}},
            {"model": "T3", "doors": {"id": "15", "id2": 62}},
            {"model": "By", "doors": {"id": "Log", "id2": 4}},
        ],
    }],
}]

def flat_list(z):
    # Keep the list as a list, but flatten every dict (or nested list) inside it.
    x = []
    for data_obj in z:
        if isinstance(data_obj, (dict, list)):
            x.append(flatten_data(data_obj))
        else:
            x.append(data_obj)
    return x


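def flatten_data(y):
    out = {}

    def flatten(x, name=''):
        if isinstance(x, dict):
            # Nested dict: recurse, prefixing its keys with the parent key and "_".
            for a in x:
                flatten(x[a], name + a + '_')
        elif isinstance(x, list):
            # Lists are not flattened; only their elements are.
            out[name[:-1]] = flat_list(x)
        else:
            # Scalar: store it under the accumulated key (minus the trailing "_").
            out[name[:-1]] = x

    flatten(y)
    return out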

def generatejson(response2):
    # response2 is [(first data set), (second data set)];
    # convert it to a dict {0: (first data set), 1: (second data set)}
    sample_object = {i: data_response for i, data_response in enumerate(response2)}
    flat = {k: flatten_data(v) for k, v in sample_object.items()}
    return json.dumps(flat, sort_keys=True)


print(generatejson(data_sample))
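The first step of generatejson only re-keys the list by position. A minimal sketch of that step on its own (index_by_position is a hypothetical name) also shows a detail worth knowing: JSON object keys are always strings, so json.dumps writes the integer keys 0, 1, ... as "0", "1", ...

```python
import json


def index_by_position(data_sets):
    """Turn [first, second, ...] into {0: first, 1: second, ...}."""
    return dict(enumerate(data_sets))


indexed = index_by_position([{"a": 1}, {"b": 2}])
print(indexed)                               # {0: {'a': 1}, 1: {'b': 2}}
print(json.dumps(indexed, sort_keys=True))   # {"0": {"a": 1}, "1": {"b": 2}}
```

dict(enumerate(...)) builds the same dict as the comprehension in generatejson; it is just a shorter spelling.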

This code takes data in the following format:

[(first data set), (second data set)]

It then looks for nested dicts. When it detects one, it flattens it into the parent.

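For example, the code detects an element like this one:

{"model": "Taurus1", "doors": {"id": "1", "id2": 101}}

doors is a nested dict, so it converts the element to:

{"model": "Taurus1", "doors_id": "1", "doors_id2": 101}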

Note that it does not change lists/arrays: they are not flattened (only the dicts inside them are, each one separately).
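The same flattening rule (nested dict keys joined with "_", lists kept as lists with their elements flattened) can be written as two plain recursive functions that use isinstance and pass a single output dict down, instead of defining a new inner function on every flatten_data call. This is a minimal sketch, not a guaranteed fix for the slowdown; flatten_dict and flatten_value are hypothetical names. One deliberate difference from the question's code: a list nested directly inside a list comes back as a plain list, not wrapped in {'': [...]}.

```python
def flatten_value(value):
    """Flatten dicts (recursively); keep lists as lists of flattened items."""
    if isinstance(value, dict):
        return flatten_dict(value)
    if isinstance(value, list):
        return [flatten_value(item) for item in value]
    return value


def flatten_dict(obj, prefix="", out=None):
    """Merge nested dicts into one level, joining keys with "_"."""
    if out is None:
        out = {}
    for key, value in obj.items():
        full_key = prefix + key
        if isinstance(value, dict):
            # Nested dict: push its keys up to this level with a longer prefix.
            flatten_dict(value, full_key + "_", out)
        elif isinstance(value, list):
            # Lists stay lists; only their elements are flattened.
            out[full_key] = [flatten_value(item) for item in value]
        else:
            out[full_key] = value
    return out


element = {"model": "Taurus1", "doors": {"id": "1", "id2": 101}}
print(flatten_dict(element))
# {'model': 'Taurus1', 'doors_id': '1', 'doors_id2': 101}
```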

My problem:

For small amounts of data the code works fine, but with large collections (1000+) performance is very poor, and sometimes it even crashes.
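One possible contributor to the crashes (an assumption, since no error message is shown) is memory: generatejson keeps every flattened record and then the complete JSON string in memory at the same time. A minimal sketch that writes the same JSON object incrementally, one record at a time, so only one flattened record is held at once, and the input itself can be a generator. stream_json is a hypothetical name; flatten is whatever per-record function you use.

```python
import io
import json


def stream_json(records, fp, flatten):
    """Write {"0": <flat record 0>, "1": <flat record 1>, ...} to fp incrementally.

    records: any iterable of data sets (a list, or a generator reading from disk).
    flatten: per-record flattening function, e.g. flatten_data from the question.
    The text matches json.dumps({i: flatten(r) ...}, sort_keys=True).
    """
    fp.write("{")
    for i, record in enumerate(records):
        if i:
            fp.write(", ")
        # json.dumps(str(i)) yields the quoted key, e.g. "0".
        fp.write(json.dumps(str(i)))
        fp.write(": ")
        fp.write(json.dumps(flatten(record), sort_keys=True))
    fp.write("}")


buf = io.StringIO()
stream_json(({"id": n} for n in range(3)), buf, flatten=lambda r: r)
print(buf.getvalue())  # {"0": {"id": 0}, "1": {"id": 1}, "2": {"id": 2}}
```

With the question's code, the call would be stream_json(data_sample, fp, flatten_data) on a file opened for writing.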

How can I improve and optimize the performance of this code?

data_sample contains only one data set (I assume that is enough for testing).
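To check whether a change actually helps, it can be timed on a larger input built from the single data set. A minimal sketch using the standard timeit module; make_input and best_time are hypothetical helper names, the stand-in record is made up, and 1000 copies is an arbitrary size chosen to match the question.

```python
import copy
import json
import timeit


def make_input(one_data_set, n):
    """Return n independent (deep) copies of a single data set."""
    return [copy.deepcopy(one_data_set) for _ in range(n)]


def best_time(func, data, repeat=3):
    """Best wall-clock time in seconds of one func(data) call over `repeat` runs."""
    return min(timeit.repeat(lambda: func(data), number=1, repeat=repeat))


# Stand-in record; with the question's code you would use
# make_input(data_sample[0], 1000) and best_time(generatejson, ...).
big = make_input({"name": "John", "cars": [{"temp": {"sum": "20"}}]}, 1000)
print(len(big), best_time(json.dumps, big))
```

Deep copies are used instead of [record] * n so the input consists of separate objects, closer to real data.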

