I'm filtering a DataFrame in a loop, as shown below (calculations and statuses are arrays of strings):
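result_df = None
for c in calculations:
    for s in statuses:
        # Keep the filtered rows in their own variable so df itself is not overwritten
        filtered = df.filter(f"""
            ...
        """)
        # Compare against None explicitly rather than relying on the DataFrame's truthiness
        if result_df is None:
            result_df = filtered
        else:
            result_df = result_df.union(filtered)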
My code works with a small amount of data, but when I run it on a large amount of data I see the following error in stdout:
py4j.protocol.Py4JJavaErrorERROR:root:Exception while sending command.
Traceback (most recent call last):
  File "/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1159, in send_command
    raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 985, in send_command
    response = connection.send_command(command)
  File "/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1164, in send_command
    "Error while receiving", e, proto.ERROR_ON_RECEIVE)
py4j.protocol.Py4JNetworkError: Error while receiving
: <exception str() failed>
This appears to be caused by the error I see in stderr:
java.lang.StackOverflowError
at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:14)
at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:46)
at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:39)
at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:32)
at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:46)
at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:39)
at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:32)
at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:46)
at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:39)
at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:32)
at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:46)
at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:39)
at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:32)
at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:46)
at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:39)
at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:32)
at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:46)
at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:39)
at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:32)
at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:46)
at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:39)
at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:32)
at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:46)
at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:39)
at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:32)
at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:46)
at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:39)
at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:32)
at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:46)
at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:39)
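I've seen some posts saying this may be caused by recursion in Spark, but I don't know how to fix it. What exactly do I need to change in my code or configuration to resolve this? Is there a better way to do this kind of filtering? Do I need to make a call somewhere? Do I need to call sc.setCheckpointDir("/")? If so, is there anything else I need to do to get checkpointing to work? Thanks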
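On the "better way to filter" part, one possible approach is to avoid the chain of unions altogether by OR-ing the per-pair predicates into a single expression and filtering once. The sketch below only illustrates that idea: the column names `calculation` and `status`, the `TEMPLATE` string, and both helper functions are hypothetical, because the real filter expression is elided above. It also assumes the values contain no single quotes. Note one behavioural difference: `union` keeps a row once per matching pair, while a single OR filter returns each row at most once.

```python
from itertools import product


def build_combined_filter(calculations, statuses, template):
    """Return one SQL predicate that ORs together a clause per (c, s) pair.

    `template` is a hypothetical stand-in for the elided filter expression;
    it is formatted with the named fields {c} and {s}.
    """
    clauses = [
        "(" + template.format(c=c, s=s) + ")"
        for c, s in product(calculations, statuses)
    ]
    if not clauses:
        raise ValueError("need at least one calculation and one status")
    return " OR ".join(clauses)


def filter_once(df, calculations, statuses, template):
    """Apply the combined predicate in a single pass.

    `df` is assumed to be a PySpark DataFrame, whose filter() accepts
    a SQL expression string.
    """
    return df.filter(build_combined_filter(calculations, statuses, template))


# Hypothetical column names; substitute the real filter expression.
TEMPLATE = "calculation = '{c}' AND status = '{s}'"
print(build_combined_filter(["a", "b"], ["x"], TEMPLATE))
```

With this shape the query plan contains one filter node instead of one union per (calculation, status) pair, so the plan depth no longer grows with the size of the two arrays.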
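I also tried increasing the memory of my executors and driver, but nothing changed. I've also looked at this post here, but apart from the initialization call, the answer doesn't explain how checkpointing is actually used.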
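For reference, a minimal sketch of how checkpointing is typically wired into a loop like this one, assuming the deep chain of unions really is what overflows the stack. The helper name `union_with_checkpoints`, the `every` interval, and the checkpoint path in the comments are all hypothetical. The two points that matter are that the checkpoint directory must be set before the first `checkpoint()` call, and that `checkpoint()` returns a new DataFrame that has to be assigned back.

```python
def union_with_checkpoints(frames, every=20):
    """Union an iterable of DataFrames, truncating lineage every `every` unions.

    Each item is assumed to be a PySpark DataFrame. checkpoint() is eager by
    default: it materializes the data in the checkpoint directory and returns
    a new DataFrame whose plan no longer includes the earlier unions.
    """
    result = None
    for i, frame in enumerate(frames, start=1):
        result = frame if result is None else result.union(frame)
        if i % every == 0:
            # The returned DataFrame must be kept; the receiver is unchanged.
            result = result.checkpoint()
    return result


# Usage sketch (needs a running SparkSession named `spark`; the path is hypothetical
# and should point at a dedicated directory on storage all executors can reach):
#
# spark.sparkContext.setCheckpointDir("hdfs:///tmp/spark-checkpoints")
# frames = (df.filter(...) for c in calculations for s in statuses)
# result_df = union_with_checkpoints(frames, every=20)
```

A smaller `every` keeps the plan shallower at the cost of more writes to the checkpoint directory; `localCheckpoint()` is an alternative that skips the reliable-storage write but is less fault tolerant.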