Spark: filtering in a static loop causes java.lang.StackOverflowError

Posted 2024-09-29 19:15:53


I am filtering a DataFrame in a loop, as shown below (where calculations and statuses are arrays of strings):

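result_df = None
for c in calculations:
    for s in statuses:
        # df.filter() returns a new DataFrame, so keep the result
        filtered = df.filter(f"""
            ...
        """)
        # Compare with None explicitly rather than relying on truthiness
        if result_df is None:
            result_df = filtered
        else:
            result_df = result_df.union(filtered)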

My code works with a small amount of data, but when I run it on a large amount of data I see the following error in stdout:

py4j.protocol.Py4JJavaErrorERROR:root:Exception while sending command.
Traceback (most recent call last):
  File "/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1159, in send_command
    raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 985, in send_command
    response = connection.send_command(command)
  File "/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1164, in send_command
    "Error while receiving", e, proto.ERROR_ON_RECEIVE)
py4j.protocol.Py4JNetworkError: Error while receiving
: <exception str() failed>

This seems to be caused by the error I see in stderr:

java.lang.StackOverflowError
    at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:14)
    at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:46)
    at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:39)
    at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:32)
    at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:46)
    at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:39)
    at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:32)
    at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:46)
    at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:39)
    at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:32)
    at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:46)
    at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:39)
    at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:32)
    at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:46)
    at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:39)
    at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:32)
    at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:46)
    at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:39)
    at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:32)
    at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:46)
    at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:39)
    at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:32)
    at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:46)
    at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:39)
    at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:32)
    at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:46)
    at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:39)
    at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:32)
    at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:46)
    at org.json4s.jackson.JValueDeserializer.deserialize(JValueDeserializer.scala:39)

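I have seen some posts saying this may be caused by recursion inside Spark, but I don't know how to fix it. What exactly do I need to change in my code or configuration to solve this? Is there a better way to do this kind of filtering? Do I need to call something somewhere? Do I need to call sc.setCheckpointDir("/")? If so, is there anything else I need to do to make checkpointing work? Thanks.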
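On the question of a better way to filter: if each elided filter only compares two columns against the loop values, the nested loops could probably be collapsed into one predicate, which avoids the long chain of unions altogether. The sketch below is only an illustration under that assumption. The column names `calculation` and `status` and the helpers `sql_literal` and `build_predicate` are hypothetical, because the real filter body is not shown.

```python
def sql_literal(value: str) -> str:
    """Quote a string as a SQL string literal, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def build_predicate(calculations, statuses):
    """Build one filter string that covers every (calculation, status) pair.

    Assumes hypothetical columns named `calculation` and `status`.
    """
    if not calculations or not statuses:
        # "IN ()" is not valid SQL, so refuse empty lists up front
        raise ValueError("calculations and statuses must be non-empty")
    calc_list = ", ".join(sql_literal(c) for c in calculations)
    status_list = ", ".join(sql_literal(s) for s in statuses)
    return f"calculation IN ({calc_list}) AND status IN ({status_list})"


# Usage with an existing DataFrame `df` (not executed here):
# result_df = df.filter(build_predicate(calculations, statuses))
```

A single filter keeps the query plan flat no matter how many values the two arrays hold. If the real filters are more complex than two equality checks, this shortcut may not apply.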

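In addition, I tried increasing the memory of my executors and driver, but nothing changed. I have also looked at this post here, but apart from the initialization call, the answer does not explain how checkpointing is used.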
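As for how checkpointing is used beyond the initialization call, here is a minimal sketch, not a confirmed fix for this job. It assumes the filtered DataFrames are available as an iterable. The helper name `union_with_checkpoints`, the `every` parameter, and the directory path in the usage comment are all hypothetical. The idea is that `DataFrame.checkpoint()` returns a new DataFrame with truncated lineage, so the plan stops growing with each `union`.

```python
def union_with_checkpoints(frames, every=20):
    """Union DataFrames one by one, checkpointing after every `every` frames.

    Each checkpoint materializes the intermediate result and cuts its
    lineage, so the logical plan does not keep nesting deeper.
    Returns None if `frames` is empty.
    """
    result = None
    for i, frame in enumerate(frames, start=1):
        result = frame if result is None else result.union(frame)
        if i % every == 0:
            # checkpoint() returns a NEW DataFrame; the old one is unchanged,
            # so the return value must be kept and used from here on
            result = result.checkpoint(eager=True)
    return result


# Usage (not executed here; `spark` and `filtered_frames` are placeholders):
# spark.sparkContext.setCheckpointDir("/tmp/spark-checkpoints")  # hypothetical path
# result_df = union_with_checkpoints(filtered_frames, every=20)
```

The checkpoint directory has to be set before the first `checkpoint()` call, and on a cluster it should normally point to shared storage such as HDFS rather than a local path. The value of `every` is a guess to tune: smaller values keep the plan shallower at the cost of more writes.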

