<p>Consider reading the raw, unqueried values of the Postgres json column and using pandas <a href="https://pandas.pydata.org/pandas-docs/stable/generated/pandas.io.json.json_normalize.html" rel="nofollow noreferrer"><code>json_normalize()</code></a> to bind them into a flat dataframe. From there, use pandas <a href="http://pandas.pydata.org/pandas-docs/version/0.17.1/generated/pandas.DataFrame.drop_duplicates.html" rel="nofollow noreferrer"><code>drop_duplicates</code></a>.</p>
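<p>To demonstrate, the following parses your single json document into a three-row dataframe, one row for each corresponding <em>Identifiers</em> record:</p>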
<pre><code>import json
import pandas as pd
json_str = '''
{
"Firstname": "Bobb",
"Lastname": "Smith",
"Identifiers": [
{
"Content": "123",
"RecordID": "123",
"SystemID": "Test",
"LastUpdated": "2017-09-12T02:23:30.817Z"
},
{
"Content": "abc",
"RecordID": "abc",
"SystemID": "Test",
"LastUpdated": "2017-09-13T10:10:21.598Z"
},
{
"Content": "def",
"RecordID": "def",
"SystemID": "Test",
"LastUpdated": "2017-09-13T10:10:21.598Z"
}
]
}
'''
data = json.loads(json_str)
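# One row per entry in "Identifiers"; Firstname and Lastname are repeated on each row.
# pd.json_normalize is the top-level name since pandas 1.0 (older: pd.io.json.json_normalize).
df = pd.json_normalize(data, 'Identifiers', ['Firstname', 'Lastname'])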
print(df)
# Content LastUpdated RecordID SystemID Lastname Firstname
# 0 123 2017-09-12T02:23:30.817Z 123 Test Smith Bobb
# 1 abc 2017-09-13T10:10:21.598Z abc Test Smith Bobb
# 2 def 2017-09-13T10:10:21.598Z def Test Smith Bobb
</code></pre>
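<p>The deduplication step can be sketched without a database. The two documents below are made up for illustration (they are not from your data) and share the identifier <code>abc</code>; the sketch assumes pandas 1.0 or later for <code>pd.json_normalize</code>.</p>

```python
import pandas as pd

# Two hypothetical documents that both carry the identifier "abc".
docs = [
    {"Firstname": "Bobb", "Lastname": "Smith",
     "Identifiers": [{"RecordID": "123", "SystemID": "Test"},
                     {"RecordID": "abc", "SystemID": "Test"}]},
    {"Firstname": "Bob", "Lastname": "Smith",
     "Identifiers": [{"RecordID": "abc", "SystemID": "Test"},
                     {"RecordID": "def", "SystemID": "Test"}]},
]

# Flatten: one row per identifier, across all documents.
flat = pd.json_normalize(docs, 'Identifiers', ['Firstname', 'Lastname'])

# Drop repeated RecordID values; by default the first occurrence is kept.
deduped = flat.drop_duplicates(['RecordID'])
print(deduped)
```

<p>Because <code>drop_duplicates</code> keeps the first occurrence by default, the surviving <code>abc</code> row comes from the first document. Pass <code>keep='last'</code> if the later one should win.</p>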
<hr/>
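<p>For your database, consider connecting with a DB-API driver such as <code>psycopg2</code>, or with <a href="http://docs.sqlalchemy.org/en/latest/dialects/postgresql.html" rel="nofollow noreferrer">SQLAlchemy</a>, and parse each json as a string accordingly. Admittedly, there may be other ways to handle json, as seen in the <a href="http://initd.org/psycopg/docs/extras.html#additional-data-types" rel="nofollow noreferrer">psycopg2 docs</a>, but the following receives the data as text and parses it on the Python side:</p>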
<pre><code>import json
import pandas as pd
import psycopg2

conn = psycopg2.connect("dbname=test user=postgres")
cur = conn.cursor()
# Cast to text so each document arrives as a string to parse on the Python side
cur.execute("SELECT json_document::text FROM staging;")
df = pd.json_normalize([json.loads(row[0]) for row in cur.fetchall()],
                       'Identifiers', ['Firstname', 'Lastname'])
# Keep the first row seen for each RecordID
df = df.drop_duplicates(['RecordID'])
cur.close()
conn.close()
</code></pre>
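<p>If you drop the <code>::text</code> cast, psycopg2 by default hands json columns back as already-parsed Python objects, so <code>json.loads</code> would fail on them. A minimal sketch of a helper that tolerates both cases follows; <code>rows_to_frame</code> is a hypothetical name, the sample rows are made up to stand in for <code>cur.fetchall()</code>, and pandas 1.0 or later is assumed.</p>

```python
import json
import pandas as pd

def rows_to_frame(rows):
    """Flatten fetched rows (one json document per row) into a deduplicated frame.

    Each row is a 1-tuple whose value is either a json string
    (column cast to text) or an already-parsed dict.
    """
    docs = [row[0] if isinstance(row[0], dict) else json.loads(row[0])
            for row in rows]
    df = pd.json_normalize(docs, 'Identifiers', ['Firstname', 'Lastname'])
    return df.drop_duplicates(['RecordID'])

# Stand-in for cur.fetchall(): one document as text, one already parsed.
sample_rows = [
    ('{"Firstname": "Bobb", "Lastname": "Smith", '
     '"Identifiers": [{"RecordID": "123"}, {"RecordID": "abc"}]}',),
    ({"Firstname": "Bobb", "Lastname": "Smith",
      "Identifiers": [{"RecordID": "abc"}, {"RecordID": "def"}]},),
]

result = rows_to_frame(sample_rows)
print(result)
```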