Half of the records are lost when iterating over JSON

Posted 2024-10-01 17:27:58

[{"attributes": {"type": "Silo__c", "url": "/services/data/v38.0/sobjects/Silo__c/b0L36000007xRItEAM"}, "Id": "a0M36000007xRItEAM", "OwnerId": "00536000002yKlTAAU", "IsDeleted": false, "Name": "Fresh", "Landing_Stop_Date__c": null, "Service_Exit_Date__c": null},{"attributes": {"type": "Silo__c", "url": "/services/data/v38.0/sobjects/Silo__c/b0L36000007xRItEAM"}, "Id": "a0M36000007xRItEAM", "OwnerId": "00536000002yKlTAAU", "IsDeleted": false, "Name": "Fresh", "Landing_Stop_Date__c": null, "Service_Exit_Date__c": null},{"attributes": {"type": "Silo__c", "url": "/services/data/v38.0/sobjects/Silo__c/b0L36000007xRItEAM"}, "Id": "a0M36000007xRItEAM", "OwnerId": "00536000002yKlTAAU", "IsDeleted": false, "Name": "Fresh", "Landing_Stop_Date__c": null, "Service_Exit_Date__c": null},{"attributes": {"type": "Silo__c", "url": "/services/data/v38.0/sobjects/Silo__c/b0L36000007xRItEAM"}, "Id": "a0M36000007xRItEAM", "OwnerId": "00536000002yKlTAAU", "IsDeleted": false, "Name": "Fresh", "Landing_Stop_Date__c": null, "Service_Exit_Date__c": null},{"attributes": {"type": "Silo__c", "url": "/services/data/v38.0/sobjects/Silo__c/b0L36000007xRItEAM"}, "Id": "a0M36000007xRItEAM", "OwnerId": "00536000002yKlTAAU", "IsDeleted": false, "Name": "Fresh", "Landing_Stop_Date__c": null, "Service_Exit_Date__c": null}]

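The above closely resembles the JSON I get back from a simple-salesforce query.

The code below is supposed to convert it to JSONL while also fixing the datetime formatting.

The catch is that I have to strip out the attributes part, since it is not used. The code below is my latest attempt, but every result is the same record over and over (the sample data above is itself one record repeated, so seeing it over and over is expected there).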

for element in data :

        item = data.pop()
        item.pop('attributes', None)

        tempdict = OrderedDict({})
        for k,v in item.items() :
            if 'date' in k.lower() or 'stamp' in k.lower() :
                if not v is None :
                    d = d_parse(v)
                    v = d.strftime('%Y-%m-%d %I:%M:%S')
                    tempdict[k.lower()] = v
            else :
                tempdict[k.lower()] = v

        with open(localFilePath+fileName.format(nextObj,fileCount), 'a') as outfile :
            outfile.write(json.dumps(tempdict))
            outfile.write('\n')

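The problem is that, for some reason, half of the records go missing: only 384 of 767 records make it into the file. I suspect it has to do with pop and where it is called in the code. How can I remove the attributes part without losing half of the records to pop?

Edit:

The following code, based on a comment, raises an error: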

for element in data :
    data.pop('attributes', None)

    tempdict = OrderedDict({})
    for k,v in data.items() :
        if 'date' in k.lower() or 'stamp' in k.lower() :
            if not v is None :
                d = d_parse(v)
                v = d.strftime('%Y-%m-%d %I:%M:%S')
                tempdict[k.lower()] = v
        else :
            tempdict[k.lower()] = v

    with open(localFilePath+fileName.format(nextObj,fileCount), 'a') as outfile :
        outfile.write(json.dumps(tempdict))
        outfile.write('\n')


Traceback (most recent call last):
  File "child_sfdc_etl.py", line 417, in <module>
    sfToS3(fileCount, sf, nextObj)
  File "child_sfdc_etl.py", line 206, in sfToS3
    send_temp_jsonl_to_s3(data, nextObj, s3, s3Destination, fileCount, s3Path)
  File "child_sfdc_etl.py", line 254, in send_temp_jsonl_to_s3
    data.pop('attributes', None)
TypeError: pop() takes at most 1 argument (2 given)

The code without None raises an error as well:

for element in data :
    data.pop('attributes')

    tempdict = OrderedDict({})
    for k,v in data.items() :
        if 'date' in k.lower() or 'stamp' in k.lower() :
            if not v is None :
                d = d_parse(v)
                v = d.strftime('%Y-%m-%d %I:%M:%S')
                tempdict[k.lower()] = v
        else :
            tempdict[k.lower()] = v

    with open(localFilePath+fileName.format(nextObj,fileCount), 'a') as outfile :
        outfile.write(json.dumps(tempdict))
        outfile.write('\n')

Traceback (most recent call last):
  File "child_sfdc_etl.py", line 417, in <module>
    sfToS3(fileCount, sf, nextObj)
  File "child_sfdc_etl.py", line 206, in sfToS3
    send_temp_jsonl_to_s3(data, nextObj, s3, s3Destination, fileCount, s3Path)
  File "child_sfdc_etl.py", line 254, in send_temp_jsonl_to_s3
    data.pop('attributes')
TypeError: 'str' object cannot be interpreted as an integer

1 answer
User
#1 · Posted 2024-10-01 17:27:58

This comes down to how iteration is implemented in Python. As others have pointed out, the culprit is

for element in data :
    item = data.pop()
    <...>

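In Python, a sequence iterator keeps the index of the current element to decide what to return next. (In general, a sequence that is modified while it is being iterated cannot be iterated correctly, so this is not a bug.)

On each pass, the loop takes one item (starting from the beginning of the list) as element. You then remove the last item of the list as item and ignore element entirely. On the next iteration, element is the item after the previous element, and so on. The loop ends as soon as the iterator's index reaches the length of the shrinking list, so you process only the second half of the original list, in reverse order. That is why 384 of the 767 records are written.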
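The effect is easy to reproduce on a small list. The following is a minimal sketch, not the code from the question: count_processed is a hypothetical helper, and plain integers stand in for the Salesforce records.

```python
def count_processed(n):
    """Pop from a list while iterating over it, as the loop in the question does."""
    data = list(range(n))      # integers stand in for the records
    processed = []
    for element in data:       # the iterator advances its index from the front
        processed.append(data.pop())  # while pop() removes items from the back
    return processed, data


processed, remaining = count_processed(7)
print(processed)   # [6, 5, 4, 3]
print(remaining)   # [0, 1, 2]

# The counts from the question: 767 records in, 384 handled.
print(len(count_processed(767)[0]))  # 384
```

The front index and the shrinking end meet in the middle, so the loop body runs for only half of the items (rounded up), and the first half is left unprocessed in the list.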


Remove data.pop() and use element instead.
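A rough sketch of what that could look like follows. Several things here are assumptions rather than the asker's code: to_jsonl_lines is a hypothetical helper name, datetime.fromisoformat stands in for the d_parse function from the question (pass your own parser via parse_date), the sample records are invented, and the format string uses %H (24-hour clock) where the question used %I, which is a 12-hour clock without an AM/PM marker. File writing is left out so the example stays self-contained.

```python
import json
from collections import OrderedDict
from datetime import datetime


def to_jsonl_lines(data, parse_date=datetime.fromisoformat):
    """Return one JSON string per record without consuming `data`."""
    lines = []
    for element in data:  # iterate only; nothing is popped from the list
        tempdict = OrderedDict()
        for k, v in element.items():
            if k == 'attributes':
                continue  # skip the unused metadata entry
            if 'date' in k.lower() or 'stamp' in k.lower():
                if v is not None:  # as in the question, null dates are left out
                    tempdict[k.lower()] = parse_date(v).strftime('%Y-%m-%d %H:%M:%S')
            else:
                tempdict[k.lower()] = v
        lines.append(json.dumps(tempdict))
    return lines


# Invented sample records, shaped like the JSON in the question.
records = [
    {"attributes": {"type": "Silo__c"}, "Id": "a", "Name": "Fresh",
     "Landing_Stop_Date__c": "2017-01-05T13:45:00"},
    {"attributes": {"type": "Silo__c"}, "Id": "b", "Name": "Fresh",
     "Landing_Stop_Date__c": None},
]
lines = to_jsonl_lines(records)
print(len(lines))  # 2
```

The two TypeErrors in the edit are consistent with data being a list: list.pop takes at most one argument, an integer index, whereas the two-argument form pop('attributes', None) belongs to dict. The call therefore has to go on each element (a dict), or the key can simply be skipped as above.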
