How can I make my Postgres query run faster? Can I use Python to iterate more quickly?

with dupe as (
  select
    json_document->'Firstname'->0->'Content' as first_name,
    json_document->'Lastname'->0->'Content' as last_name,
    identifiers->'RecordID' as record_id
  from (
    select *, jsonb_array_elements(json_document->'Identifiers') as identifiers
    from staging
  ) sub
  group by record_id, json_document
  order by last_name
)
select *
from dupe da
where (
  select count(*) from dupe db where db.record_id = da.record_id
) > 1;

{ "Firstname": "Bobb", "Lastname": "Smith", "Identifiers": [ { "Content": "123", "RecordID": "123", "SystemID": "Test", "LastUpdated": "2017-09-12T02:23:30.817Z" }, { "Content": "abc", "RecordID": "abc", "SystemID": "Test", "LastUpdated": "2017-09-13T10:10:21.598Z" }, { "Content": "def", "RecordID": "def", "SystemID": "Test", "LastUpdated": "2017-09-13T10:10:21.598Z" } ] }

{ "Firstname": "Bob", "Lastname": "Smith", "Identifiers": [ { "Content": "abc", "RecordID": "abc", "SystemID": "Test", "LastUpdated": "2017-09-13T10:10:26.020Z" } ] }

select
  json_document->'Firstname'->0->'Content' as first_name,
  json_document->'Lastname'->0->'Content' as last_name,
  identifiers->'RecordID' as record_id
from (
  select *, jsonb_array_elements(json_document->'Identifiers') as identifiers
  from staging
) sub
order by last_name;

2 Answers

User

Answer 1 · edited 2024-10-01 00:18:33

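Consider reading the raw, unqueried values of the Postgres json column and using pandas json_normalize() to bind them into a flat dataframe. From there, use pandas drop_duplicates().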

To demonstrate, the following parses a single json document into a three-row dataframe, one row per entry in Identifiers:

import json
import pandas as pd

json_str = '''
{
        "Firstname": "Bobb",
        "Lastname": "Smith",
        "Identifiers": [
            {
                "Content": "123",
                "RecordID": "123",
                "SystemID": "Test",
                "LastUpdated": "2017-09-12T02:23:30.817Z"
            },
            {
                "Content": "abc",
                "RecordID": "abc",
                "SystemID": "Test",
                "LastUpdated": "2017-09-13T10:10:21.598Z"
            },
            {
                "Content": "def",
                "RecordID": "def",
                "SystemID": "Test",
                "LastUpdated": "2017-09-13T10:10:21.598Z"
            }
        ]
}
'''

data = json.loads(json_str)
# pd.json_normalize replaces the deprecated pd.io.json.json_normalize (pandas >= 1.0)
df = pd.json_normalize(data, 'Identifiers', ['Firstname', 'Lastname'])

print(df)
# Column order may differ across pandas versions
#   Content RecordID SystemID               LastUpdated Firstname Lastname
# 0     123      123     Test  2017-09-12T02:23:30.817Z      Bobb    Smith
# 1     abc      abc     Test  2017-09-13T10:10:21.598Z      Bobb    Smith
# 2     def      def     Test  2017-09-13T10:10:21.598Z      Bobb    Smith

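For your database, consider connecting through a DB-API driver such as psycopg2, or through SQLAlchemy, and parsing each json value as a string. Admittedly, there may be other ways to handle json, as shown in the psycopg2 docs, but the code below receives the data as text and parses it on the Python side: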

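import json

import pandas as pd
import psycopg2

conn = psycopg2.connect("dbname=test user=postgres")
cur = conn.cursor()

# Cast to text so the driver returns raw JSON strings to parse in Python
cur.execute("SELECT json_document::text FROM staging;")

# One row per Identifiers entry across all documents
df = pd.json_normalize([json.loads(row[0]) for row in cur.fetchall()],
                       'Identifiers', ['Firstname', 'Lastname'])

# Keep only the first row seen for each RecordID
df = df.drop_duplicates(['RecordID'])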

cur.close()
conn.close()
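
Since the question asks for the rows that share a RecordID rather than a de-duplicated table, here is a minimal sketch of the reverse operation using DataFrame.duplicated with keep=False. It assumes the documents are already parsed into dicts; the sample data is the two documents from the question, trimmed to the RecordID field, and the names docs, flat, and dupes are only illustrative.

```python
import pandas as pd

# The two sample documents from the question, trimmed to the fields used here.
docs = [
    {"Firstname": "Bobb", "Lastname": "Smith",
     "Identifiers": [{"RecordID": "123"}, {"RecordID": "abc"}, {"RecordID": "def"}]},
    {"Firstname": "Bob", "Lastname": "Smith",
     "Identifiers": [{"RecordID": "abc"}]},
]

# One row per identifier, carrying the names along as metadata columns.
flat = pd.json_normalize(docs, "Identifiers", ["Firstname", "Lastname"])

# keep=False marks every member of a duplicate group, not just the repeats.
dupes = flat[flat.duplicated("RecordID", keep=False)].sort_values("Lastname")

print(dupes)
```

With the sample data this leaves two rows, one from each document, both with RecordID abc. Replacing drop_duplicates in the database example with the duplicated filter should give the same kind of result for the staging table.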

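User

Answer 2 · edited 2024-10-01 00:18:33

Try the following, which eliminates count(*) and uses exists instead. It relies on staging having an id column, so that a row is not matched against itself.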

 with dupe as ( 
   select id, 
     json_document->'Firstname'->0->'Content' as first_name, 
     json_document->'Lastname'->0->'Content' as last_name, 
     identifiers->'RecordID' as record_id 
   from 
     (select 
       *, 
       jsonb_array_elements(json_document->'Identifiers') as identifiers 
      from staging ) sub 
      group by
        id,
        record_id, 
        json_document 
      order by last_name ) 
 select * from dupe da 
   where exists 
     (select * 
       from dupe db 
       where 
         db.record_id = da.record_id 
         and db.id != da.id
     )
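
To compare the two predicates without a Postgres instance, here is a small sketch that runs both forms against Python's built-in sqlite3 module as a stand-in. The table flat and its rows are hypothetical, modelled on the already-flattened sample data (one row per identifier), and this checks only that the results agree, not how fast either form is on Postgres.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical pre-flattened table: one row per (document id, record_id).
conn.execute("CREATE TABLE flat (id INTEGER, first_name TEXT, record_id TEXT)")
conn.executemany(
    "INSERT INTO flat VALUES (?, ?, ?)",
    [(1, "Bobb", "123"), (1, "Bobb", "abc"), (1, "Bobb", "def"), (2, "Bob", "abc")],
)

# Form used in the question: count all matches, then compare with 1.
count_sql = """
    SELECT id, record_id FROM flat a
    WHERE (SELECT count(*) FROM flat b WHERE b.record_id = a.record_id) > 1
    ORDER BY id
"""

# Form used in this answer: succeed as soon as one other row matches.
exists_sql = """
    SELECT id, record_id FROM flat a
    WHERE EXISTS (SELECT 1 FROM flat b
                  WHERE b.record_id = a.record_id AND b.id != a.id)
    ORDER BY id
"""

count_rows = conn.execute(count_sql).fetchall()
exists_rows = conn.execute(exists_sql).fetchall()
print(count_rows)
print(exists_rows)
conn.close()
```

Both queries return the two rows that share record_id abc. The forms are not equivalent in every case: if one document lists the same RecordID twice, the count(*) form reports it, while the exists form with id != does not.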
