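<p>You may need <code>json</code> to read the data and <code>csv</code> (or another module) to write it, but the rest needs no extra modules, only <code>for</code> loops and the standard dict/list functions.</p>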
<hr/>
<p>A complete working example is at the end.</p>
<hr/>
<p>I use <code>json</code> to convert the JSON string into Python data:</p>
<pre><code>import json
data = json.loads(text)
</code></pre>
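<p>If the JSON is stored in a file rather than in a string, <code>json.load()</code> can read it from an open file object. A minimal sketch, assuming a hypothetical file name <code>result.json</code>; the sample content is written first only so the example runs on its own:</p>

```python
import json
import os
import tempfile

# Hypothetical file name; a tiny sample is written first so the sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), 'result.json')
with open(path, 'w', encoding='utf-8') as fh:
    fh.write('{"messages": [{"id": 20482, "date": "2020-12-04T16:34:40"}]}')

# json.load() takes a file object, json.loads() takes a string.
with open(path, encoding='utf-8') as fh:
    data_from_file = json.load(fh)

print(data_from_file['messages'][0]['id'])
```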
<p>Then I can use a <code>for</code> loop to process the data and convert it into a list of rows, where every row is a dictionary:</p>
<pre><code>{
    'Id': ...,
    'Date': ...,
    'Budget': ...,
    'Category': ...,
    'Country': ...,
    'Posted On': ...,
    'Hourly Range': ...
}
</code></pre>
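<p>I use a <code>for</code> loop to process every item in <code>messages</code> separately, and create a row with default values for the elements that may be missing from the list <code>text</code>:</p>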
<pre><code>all_rows = []
for msg in data['messages']:
    row = {
        'Id': msg['id'],
        'Date': msg['date'],
        'Budget': None,        # default value if missing from msg
        'Category': None,      # default value if missing from msg
        'Country': None,       # default value if missing from msg
        'Posted On': None,     # default value if missing from msg
        'Hourly Range': None,  # default value if missing from msg
    }
</code></pre>
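<p>As an aside, the default row can also be built from the list of column names with <code>dict.fromkeys()</code>, which sets every key to <code>None</code>. This is only a sketch of an alternative; the trimmed <code>msg</code> below is sample data for illustration:</p>

```python
headers = ['Id', 'Date', 'Budget', 'Category', 'Country', 'Posted On', 'Hourly Range']
msg = {'id': 20482, 'date': '2020-12-04T16:34:40'}  # trimmed sample message

default_row = dict.fromkeys(headers)  # every key starts as None
default_row['Id'] = msg['id']
default_row['Date'] = msg['date']
print(default_row)
```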
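<p>I use <code>iter()</code> to convert the list <code>text</code> into an iterator, which I can then use with <code>zip(it, it)</code> to create pairs such as a first element <code>{"type": "bold", "text": "Budget"}</code> and a second element <code>": $500\n"</code>. From each pair I can build the key <code>"Budget"</code> and the value <code>$500</code> and keep them in <code>row</code>:</p>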
<pre><code>    # still inside `for msg in data['messages']:`
    text = msg['text']
    it = iter(text)   # to create pairs with `zip`
    next(it)          # skip first element: "Random job description.\n\n"
    for x, y in zip(it, it):  # work with pairs
        key = x['text']                        # e.g. "Budget"
        value = y.strip().replace(': ', '')    # e.g. "$500"
        row[key] = value
    all_rows.append(row)  # keep this row in the list
</code></pre>
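<p>The pairing trick works because both arguments of <code>zip(it, it)</code> pull from the same iterator, so each step consumes two consecutive elements. A small sketch with made-up values shows this, and also shows that a trailing unpaired element (like the final <code>link</code> entry in the messages) is dropped:</p>

```python
parts = ['intro', 'a', 1, 'b', 2, 'leftover']  # made-up values for illustration

it = iter(parts)
next(it)                    # drop 'intro'
pairs = list(zip(it, it))   # both arguments advance the same iterator
print(pairs)                # [('a', 1), ('b', 2)]
```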
<p>After that I have a list with all rows:</p>
<pre><code>[
{'Id': 20482, 'Date': '2020-12-04T16:34:40', 'Budget': '$500', 'Category': 'UX/UI Design', 'Country': None, 'Posted On': 'December 04, 2020 13:28 UTC', 'Hourly Range': None},
{'Id': 21144, 'Date': '2020-12-06T01:04:50', 'Budget': None, 'Category': None, 'Country': 'Serbia', 'Posted On': 'December 05, 2020 21:31 UTC', 'Hourly Range': '$13.00-$35.00'}
]
</code></pre>
<p>Now I can use <code>csv</code> to write it to a CSV file, which I can open in <code>Excel</code>, <code>LibreOffice</code>, or any similar program (or read with the Python module <code>pandas</code>):</p>
<pre><code>import csv
headers = ['Id', 'Date', 'Budget', 'Category', 'Country', 'Posted On', 'Hourly Range']
# newline='' is recommended by the csv docs to avoid blank lines on Windows
with open('output.csv', 'w', newline='') as fh:
    csv_writer = csv.DictWriter(fh, headers)
    csv_writer.writeheader()
    csv_writer.writerows(all_rows)
</code></pre>
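<p>To check what ends up in the file, the CSV can be read back with <code>csv.DictReader</code>. The sketch below uses an in-memory <code>io.StringIO</code> buffer as a stand-in for the file and a shortened set of columns; note that <code>None</code> is written as an empty field and every value comes back as a string:</p>

```python
import csv
import io

headers = ['Id', 'Budget', 'Country']  # shortened column list for illustration
rows = [
    {'Id': 20482, 'Budget': '$500', 'Country': None},
    {'Id': 21144, 'Budget': None, 'Country': 'Serbia'},
]

buffer = io.StringIO(newline='')  # in-memory stand-in for output.csv
writer = csv.DictWriter(buffer, headers)
writer.writeheader()
writer.writerows(rows)

buffer.seek(0)                    # rewind before reading
read_back = list(csv.DictReader(buffer))
print(read_back[0])
```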
<hr/>
<p>Minimal working example:</p>
<pre><code>text = '''
{
  "name": "Messenger_group",
  "id": 85648902334,
  "messages": [
    {
      "id": 20482,
      "type": "message",
      "date": "2020-12-04T16:34:40",
      "from": "IFTTT",
      "from_id": 4535011322,
      "text": [
        "Random job description.\\n\\n",
        {
          "type": "bold",
          "text": "Budget"
        },
        ": $500\\n",
        {
          "type": "bold",
          "text": "Posted On"
        },
        ": December 04, 2020 13:28 UTC\\n",
        {
          "type": "bold",
          "text": "Category"
        },
        ": UX/UI Design\\n",
        {
          "type": "link",
          "text": "https://url.com/"
        }
      ]
    },
    {
      "id": 21144,
      "type": "message",
      "date": "2020-12-06T01:04:50",
      "from": "IFTTT",
      "from_id": 4535011322,
      "text": [
        "Random job description.\\n\\n",
        {
          "type": "bold",
          "text": "Hourly Range"
        },
        ": $13.00-$35.00\\n",
        {
          "type": "bold",
          "text": "Posted On"
        },
        ": December 05, 2020 21:31 UTC\\n",
        {
          "type": "bold",
          "text": "Country"
        },
        ": Serbia\\n",
        {
          "type": "link",
          "text": "https://url.com"
        }
      ]
    }
  ]
}
'''
# - read -
import json
data = json.loads(text)
# - process -
all_rows = []
headers = ['Id', 'Date', 'Budget', 'Category', 'Country', 'Posted On', 'Hourly Range']
for msg in data['messages']:
    row = {
        'Id': msg['id'],
        'Date': msg['date'],
        'Budget': None,        # default value if missing from msg
        'Category': None,      # default value if missing from msg
        'Country': None,       # default value if missing from msg
        'Posted On': None,     # default value if missing from msg
        'Hourly Range': None,  # default value if missing from msg
    }
    text = msg['text']
    it = iter(text)   # to create pairs with `zip`
    next(it)          # skip first element (the job description)
    for x, y in zip(it, it):
        key = x['text']                        # e.g. "Budget"
        value = y.strip().replace(': ', '')    # e.g. "$500"
        row[key] = value
    all_rows.append(row)
    # show the row that was just built
    for key, value in row.items():
        print(key, ':', value)
    print(' -')
# - write -
import csv
# newline='' is recommended by the csv docs to avoid blank lines on Windows
with open('output.csv', 'w', newline='') as fh:
    csv_writer = csv.DictWriter(fh, headers)
    csv_writer.writeheader()
    csv_writer.writerows(all_rows)
</code></pre>
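<p>The example above assumes that the first element of <code>text</code> is always the description and that keys and values strictly alternate after it. If the real data is less regular, a somewhat more defensive variant could look like the sketch below. This is not the code from the example but an alternative under stated assumptions: a key is any <code>bold</code> element immediately followed by a plain string, keys outside <code>headers</code> are ignored, and <code>parse_message</code> is a hypothetical helper name.</p>

```python
def parse_message(msg, headers):
    """Build one row dict for a message (hypothetical helper, see assumptions above)."""
    row = dict.fromkeys(headers)   # every column defaults to None
    row['Id'] = msg['id']
    row['Date'] = msg['date']
    parts = msg['text']
    # look at every pair of neighbouring elements
    for x, y in zip(parts, parts[1:]):
        is_key = isinstance(x, dict) and x.get('type') == 'bold'
        if is_key and isinstance(y, str) and x['text'] in row:
            # drop surrounding whitespace, then the leading ": "
            row[x['text']] = y.strip().lstrip(': ')
    return row


headers = ['Id', 'Date', 'Budget', 'Category', 'Country', 'Posted On', 'Hourly Range']
sample = {
    'id': 20482,
    'date': '2020-12-04T16:34:40',
    'text': [
        'Random job description.\n\n',
        {'type': 'bold', 'text': 'Budget'},
        ': $500\n',
        {'type': 'link', 'text': 'https://url.com/'},
    ],
}
parsed = parse_message(sample, headers)
print(parsed)
```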