将mongodb数据迁移到postgresql jsonb的python脚本失败,并出现CursorNotFound

2024-10-01 04:53:22 发布

您现在位置:Python中文网/ 问答频道 /正文

我编写了一个简单的脚本,打算将mongodb数据迁移到postgresql jsonb。附件是简化的脚本

import json
import psycopg2
from pymongo import MongoClient
from decimal import Decimal
from bson.decimal128 import Decimal128
from bson.objectid import ObjectId
from datetime import datetime
import time

class JSONEncoder(json.JSONEncoder):
    def default(self, o):
        if isinstance(o, ObjectId):
            return str(o)
        if isinstance(o, Decimal):
            return float(o)
        if isinstance(o, Decimal128):
            return float(str(o))
        if isinstance(o, datetime):
            return str(o)
        return json.JSONEncoder.default(self, o)

client = MongoClient("mongodb+srv://myserver.mongodb.net/mydb")
db=client.mydb
col=db.mycollection
mdbcur=col.find(no_cursor_timeout=True)
with psycopg2.connect(host="localhost",
    database="pgdb",
    user="postgres", 
    password="mypassword") as pgconn:
    pgcur=pgconn.cursor()

    for doc in mdbcur:
        data=json.dumps(doc, cls=JSONEncoder)
        query_sql = """insert into tbl (data) VALUES ('""" + data + """'::jsonb)"""
        pgcur.execute(query_sql)
        pgconn.commit()
        time.sleep(1)
    pgcur.close()
    mdbcur.close()

脚本运行得很好,但有>;200万条记录,通过某些方式总是因错误而中断:-

CursorNotFound: cursor id 6836832851964097072 not found, full error: {'operationTime': Timestamp(1618918232, 1), 'ok': 0.0, 'errmsg': 'cursor id 6836832851964097072 not found', 'code': 43, 'codeName': 'CursorNotFound', '$clusterTime': {'clusterTime': Timestamp(1618918232, 1), 'signature': {'hash': b'\x08\x0e8\xff\x89\x114s\x9c:\x82z\xb0j\x05\xe9\xacv\x1c4', 'keyId': 6897522357011742722}}}

尝试添加time.sleep,添加no\u cursor\u timeout,它总是在20000条记录之后的某个地方失败(不修复哪个记录),没有数据从mongodb中删除,但总是有新记录

有什么建议吗


Tags: fromimport脚本jsondatetimereturniftime