<p>Try using a regular expression; it makes it easy to pick out more complex patterns. Here is an extended version of @Kozubi's answer:</p>
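<pre><code>import json
import re

# Matches lines such as "image_id 123 caption Some caption text".
# re.X (verbose mode) ignores the whitespace and newlines inside the pattern itself.
pattern = re.compile(r"""image_id\s+(?P<image_id>[0-9]+)\s+
                         caption\s+(?P<caption>.*)$
                      """, re.X)

json_data = []
with open("test.txt") as f:
    for line in f:  # iterate lazily instead of reading the whole file with readlines()
        m = pattern.match(line.strip())
        if m:
            json_data.append({
                "image_id": int(m.group('image_id')),
                "caption": m.group('caption')
            })

print(json.dumps(json_data, indent=4))

# Use a context manager so the output file is flushed and closed.
with open("json_dump.json", "w") as out:
    json.dump(json_data, out, indent=4)
</code></pre>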
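<p>To try the pattern without a <code>test.txt</code> file, the parsing loop can be pulled into a function and fed an in-memory file. This is only a sketch: the function name <code>parse_captions</code> and the sample lines are hypothetical, and it uses numbered groups instead of named ones, but the matching logic is the same as in the code above.</p>

<pre><code>import io
import json
import re

# Same idea as the pattern above, written on one line with numbered groups:
# group 1 is the image id, group 2 is the rest of the line (the caption).
PATTERN = re.compile(r"image_id\s+([0-9]+)\s+caption\s+(.*)$")


def parse_captions(lines):
    """Return a list of {"image_id": int, "caption": str} dicts, one per matching line."""
    records = []
    for line in lines:
        m = PATTERN.match(line.strip())
        if m:  # lines that do not match are silently skipped
            records.append({"image_id": int(m.group(1)), "caption": m.group(2)})
    return records


# Hypothetical sample input; io.StringIO stands in for open("test.txt").
sample = io.StringIO(
    "image_id 1 caption A dog running on the beach\n"
    "some unrelated line\n"
    "image_id 42 caption Two people riding bikes\n"
)

records = parse_captions(sample)
print(json.dumps(records, indent=4))
</code></pre>

<p>Lines that do not match, such as the second sample line, are skipped, so stray headers or blank lines in the input do not break the conversion.</p>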