擅长:python、mysql、java
<p>例如,如果<code>input.map(f)</code>将输入作为RDD,这可能会起作用,因为您无法访问执行器中RDD映射函数的JVM变量(附加到spark上下文中)(据我所知,pyspark中没有<code>@transient lazy val</code>的等价物)。在</p>
<pre><code>def pythonGatewayIterator(iterator):
results = []
jvm = py4j.java_gateway.JavaGateway().jvm
mygw = jvm.myPackage.MyPythonGateway()
for value in iterator:
results.append(mygw.findMyNum(value))
return results
inputs.mapPartitions(pythonGatewayIterator)
</code></pre>