<p><em>I found several alternatives and tried to judge whether each one is really a viable option. So far, Pyrefine is the only genuine Python solution.</em></p>
<h2><strong>Alternatives</strong></h2>
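<p>I. A partial solution, <a href="https://medium.com/optima-blog/semi-automated-text-cleaning-in-r-68054a9491da" rel="nofollow noreferrer">here</a>, uses Python to build a dictionary-style R function that performs the conversions in R. It does not implement GREL, Jython/Python, or Clojure edits.</p>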
<blockquote>
<pre><code>#!/usr/bin/env python3
#
# Description
# Uses Python to turn OpenRefine JSON edits into a dictionary-style R
# function that replays them on other data. Only `Cluster-edit` is supported.
#
# The original source is (1); I have added some remarks.
#
# Further reading
#
# (1) Original source of the code, https://medium.com/optima-blog/semi-automated-text-cleaning-in-r-68054a9491da
import json
import os
import sys

if len(sys.argv) &lt; 2:
    print("USAGE: ./utils/open_refine_to_R.py [edits.json] &gt; r_file.R")
    sys.exit(1)

json_file = sys.argv[-1]
# conversions = json.load(open("state_clustering.json"))
with open(json_file) as f:
    conversions = json.load(f)

# Name the R function after the JSON file, e.g. edits.json gives edits()
function_name = os.path.splitext(os.path.basename(json_file))[0]

print("%s = function(df) {" % function_name)
for conv in conversions:
    # Operations without an "edits" list (e.g. GREL or regex transforms)
    # would raise a KeyError, so skip them instead of crashing.
    if "edits" not in conv:
        continue
    column_name = str(conv["columnName"])
    for edit in conv["edits"]:
        to = str(edit["to"])
        for source in edit["from"]:
            # One R assignment per original value: cells equal to source become to
            print("  df[df[, %s] == %s, %s] = %s"
                  % (repr(column_name), repr(str(source)), repr(column_name), repr(to)))
print("  df")
print("}")
</code></pre>
</blockquote>
<p>The output could just as well be generated in Python format instead of R.</p>
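<p>A minimal sketch of that idea, assuming the same JSON layout the R script above relies on (a list of operations where cluster edits carry <code>columnName</code> and <code>edits</code> with <code>from</code>/<code>to</code> values): the loop stays the same, but it emits pandas <code>df.loc[mask, column] = value</code> assignments instead of R. The names <code>openrefine_to_pandas</code> and <code>fix_states</code> are hypothetical, and the sample operations are made up for illustration.</p>
<pre><code>import json


def openrefine_to_pandas(conversions, function_name):
    """Return Python source for a function that replays cluster edits on a pandas DataFrame.

    Assumes the JSON layout used by the R script: a list of operations,
    where cluster/mass edits carry "columnName" and "edits" (from/to values).
    """
    lines = ["def %s(df):" % function_name, "    df = df.copy()"]
    for conv in conversions:
        if "edits" not in conv:
            continue  # GREL/Jython/Clojure transforms are skipped, as in the R version
        column = str(conv["columnName"])
        for edit in conv["edits"]:
            to = str(edit["to"])
            for source in edit["from"]:
                # One pandas assignment per original value, mirroring the generated R lines
                lines.append("    df.loc[df[%r] == %r, %r] = %r" % (column, str(source), column, to))
    lines.append("    return df")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical sample: one cluster edit plus one GREL transform that gets skipped
    sample = json.loads("""[
      {"op": "core/mass-edit", "columnName": "state",
       "edits": [{"from": ["N.Y.", "new york"], "to": "New York"}]},
      {"op": "core/text-transform", "columnName": "state",
       "expression": "grel:value.trim()"}
    ]""")
    print(openrefine_to_pandas(sample, "fix_states"))
</code></pre>
<p>For the sample, this prints a <code>fix_states(df)</code> function with one <code>df.loc</code> assignment per original value; the generated code only needs pandas when you actually run it on a DataFrame.</p>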
<p>II. <a href="https://github.com/fusepoolP3/p3-batchrefine" rel="nofollow noreferrer">P3-batchrefine</a> is written mostly in Java, with some Python. It lets you run a transformation as follows (not really a Python solution, unless you are happy calling an external Java tool):</p>
<blockquote>
<pre><code>./bin/batchrefine remote input.csv transform.json &gt; output.csv</code></pre>
</blockquote>
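<p>If you do go that route, the call can at least be driven from Python. This is a minimal sketch using the standard <code>subprocess</code> module, assuming you run it from a p3-batchrefine checkout and that the CLI behaves as in the command above (writing the result to standard output). The helper names and the <code>mode</code>/<code>binary</code> parameters are hypothetical.</p>
<pre><code>import subprocess


def batchrefine_command(input_csv, transform_json, mode="remote", binary="./bin/batchrefine"):
    """Build the argument list for the batchrefine call shown above.

    binary and mode are hypothetical parameters; the defaults reproduce
    ./bin/batchrefine remote input.csv transform.json
    """
    return [binary, mode, input_csv, transform_json]


def run_batchrefine(input_csv, transform_json, output_csv, **kwargs):
    """Run batchrefine and save its standard output to output_csv,
    which is what the shell redirect does in the command above."""
    with open(output_csv, "wb") as out:
        # check=True raises CalledProcessError if batchrefine exits with an error
        subprocess.run(batchrefine_command(input_csv, transform_json, **kwargs),
                       stdout=out, check=True)


if __name__ == "__main__":
    print(batchrefine_command("input.csv", "transform.json"))
</code></pre>
<p>Usage would then be <code>run_batchrefine("input.csv", "transform.json", "output.csv")</code>. Passing a list instead of a shell string avoids quoting problems with file names, at the cost of doing the output redirect in Python.</p>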
<p>III. <a href="https://github.com/jezcope/pyrefine" rel="nofollow noreferrer">Pyrefine</a> is a genuine Python solution. It aims to work as follows (example copied from its documentation):</p>
<pre><code>import pyrefine
import pandas as pd

with open('script.json') as script_file:
    script = pyrefine.parse(script_file)

input_data = pd.read_csv('input.csv')
output_data = script.execute(input_data)
</code></pre>
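<p>Since each of these tools covers only part of OpenRefine (the R converter above, for example, handles only cluster edits), it can help to check which operation types an exported <code>script.json</code> contains before choosing one. A minimal sketch with the standard library, assuming each operation in the exported list is a JSON object with an <code>"op"</code> key such as <code>"core/mass-edit"</code>; the function name and sample data are hypothetical.</p>
<pre><code>import json
from collections import Counter


def operation_summary(operations):
    """Count OpenRefine operation types in a parsed operation history.

    Assumes each operation is a dict with an "op" key; entries without one
    are counted as "unknown".
    """
    return Counter(op.get("op", "unknown") for op in operations)


if __name__ == "__main__":
    # Hypothetical sample export with two cluster edits and one GREL transform
    sample = json.loads("""[
      {"op": "core/mass-edit", "columnName": "state", "edits": []},
      {"op": "core/text-transform", "columnName": "state", "expression": "grel:value.trim()"},
      {"op": "core/mass-edit", "columnName": "city", "edits": []}
    ]""")
    for op_type, count in operation_summary(sample).most_common():
        print(op_type, count)
</code></pre>
<p>For a real export, open <code>script.json</code> and pass <code>json.load(script_file)</code> to <code>operation_summary</code> instead of the sample.</p>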
<h2><strong>More on parsing OpenRefine JSON</strong></h2>
<blockquote>
<ol>
<li><p><a href="https://stackoverflow.com/questions/18444834/trying-to-parse-a-json-with-open-refine-grel">Trying to parse a Json with Open Refine GREL</a></p></li>
<li><p><a href="https://stackoverflow.com/questions/10304238/parse-json-in-google-refine">Parse JSON in Google Refine</a></p></li>
<li><p><a href="https://stackoverflow.com/questions/40715596/best-way-to-parse-a-big-and-intricated-json-file-with-openrefine-or-r">Best way to parse a big and intricated Json file with OpenRefine (or R)</a></p></li>
</ol>
</blockquote>