Python: pandas merge out of memory, bus error corrupts the Python environment

Edit/Update: Running without Spark (a plain pandas merge only) also produces the bus error and reproduces the problem.

I am running PySpark with a local context, together with pandas. After extracting some columns from parquet files, I merge them one by one into a dataframe. Because of a date issue, my program runs out of memory on a many-to-many merge (this is expected). Afterwards I see a long Java call stack. Surprisingly, my Anaconda installation ends up corrupted, differently each time, as if "someone" were writing random bytes into the Anaconda Python files. I have reinstalled and re-run about ten times, and every run fails the same way. If I stop the program before memory is exhausted, the Python environment stays fully functional. Any ideas?
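
For reference, a minimal pandas-only sketch of the blow-up pattern described above (the tiny in-memory frames stand in for my real parquet columns; names are made up for illustration):

import pandas as pd

# Tiny stand-in frames; in the real run these columns come from parquet files.
left = pd.DataFrame({"date": ["2019-12-02"] * 1000, "x": range(1000)})
right = pd.DataFrame({"date": ["2019-12-02"] * 1000, "y": range(1000)})

# Every left row matches every right row on the duplicated key, so the
# result has 1000 * 1000 = 1,000,000 rows: memory grows quadratically
# in the number of duplicates per key.
merged = left.merge(right, on="date", how="inner")
print(len(merged))  # 1000000

# pandas can assert the expected cardinality up front and raise a
# MergeError instead of silently exploding:
# left.merge(right, on="date", validate="one_to_one")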

One of the Python errors:

Fatal Python error: initsite: Failed to import the site module
Traceback (most recent call last):
  File "/opt/anaconda3/lib/python3.7/site.py", line 579, in <module>
    main()
  File "/opt/anaconda3/lib/python3.7/site.py", line 566, in main
    known_paths = addsitepackages(known_paths)
  File "/opt/anaconda3/lib/python3.7/site.py", line 349, in addsitepackages
    addsitedir(sitedir, known_paths)
  File "/opt/anaconda3/lib/python3.7/site.py", line 207, in addsitedir
    addpackage(sitedir, name, known_paths)
  File "/opt/anaconda3/lib/python3.7/site.py", line 163, in addpackage
    for n, line in enumerate(f):
  File "/opt/anaconda3/lib/python3.7/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe8 in position 1: invalid continuation byte
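
The trace shows site.py choking while reading a .pth file in site-packages that is no longer valid UTF-8. A small diagnostic sketch (the path is assumed to match my install) that scans for such corrupted files:

from pathlib import Path

# Path assumed from the traceback above; adjust for your install.
site = Path("/opt/anaconda3/lib/python3.7/site-packages")
for pth in site.glob("*.pth"):
    try:
        # site.py reads .pth files as UTF-8 at startup, so any file
        # that fails here reproduces the error above.
        pth.read_text(encoding="utf-8")
    except UnicodeDecodeError as exc:
        print(f"corrupted: {pth}: {exc}")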

The PySpark error:

org.apache.spark.rpc.RpcTimeoutException: Futures timed out after [10000 milliseconds]. This timeout is controlled by spark.executor.heartbeatInterval
    at org.apache.spark.rpc.RpcTimeout.org$apache$spark$rpc$RpcTimeout$$createRpcTimeoutException(RpcTimeout.scala:47)
    at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:62)
    at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:58)
    at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
    at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:76)
    at org.apache.spark.rpc.RpcEndpointRef.askSync(RpcEndpointRef.scala:92)
    at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$reportHeartBeat(Executor.scala:841)
    at org.apache.spark.executor.Executor$$anon$2$$anonfun$run$1.apply$mcV$sp(Executor.scala:870)
    at org.apache.spark.executor.Executor$$anon$2$$anonfun$run$1.apply(Executor.scala:870)
    at org.apache.spark.executor.Executor$$anon$2$$anonfun$run$1.apply(Executor.scala:870)
    at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1945)
    at org.apache.spark.executor.Executor$$anon$2.run(Executor.scala:870)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException: Futures timed out after [10000 milliseconds]
    at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:223)
    at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:227)
    at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:220)
    at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
    ... 14 more
19/12/02 21:09:18 WARN SparkContext: Ignoring Exception while stopping SparkContext from shutdown hook
java.lang.NoClassDefFoundError: io/netty/channel/AbstractChannelHandlerContext$13
    at io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHandlerContext.java:610)
    at io.netty.channel.AbstractChannelHandlerContext.close(AbstractChannelHandlerContext.java:465)
    at io.netty.channel.DefaultChannelPipeline.close(DefaultChannelPipeline.java:973)
    at io.netty.channel.AbstractChannel.close(AbstractChannel.java:238)
    at org.apache.spark.network.server.TransportServer.close(TransportServer.java:153)
    at org.apache.spark.network.netty.NettyBlockTransferService.close(NettyBlockTransferService.scala:180)
    at org.apache.spark.storage.BlockManager.stop(BlockManager.scala:1615)
    at org.apache.spark.SparkEnv.stop(SparkEnv.scala:90)
    at org.apache.spark.SparkContext$$anonfun$stop$11.apply$mcV$sp(SparkContext.scala:1974)
    at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1340)
    at org.apache.spark.SparkContext.stop(SparkContext.scala:1973)
    at org.apache.spark.SparkContext$$anonfun$2.apply$mcV$sp(SparkContext.scala:575)
    at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:216)
    at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:188)
    at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
    at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
    at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1945)
    at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:188)
    at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
    at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
    at scala.util.Try$.apply(Try.scala:192)
    at org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188)
    at org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178)
    at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
Caused by: java.lang.ClassNotFoundException: io.netty.channel.AbstractChannelHandlerContext$13
    at java.net.URLClassLoader$1.run(URLClassLoader.java:370)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 24 more
Caused by: java.util.zip.ZipException: invalid LOC header (bad signature)
    at java.util.zip.ZipFile.read(Native Method)
    at java.util.zip.ZipFile.access$1400(ZipFile.java:60)
    at java.util.zip.ZipFile$ZipFileInputStream.read(ZipFile.java:734)
    at java.util.zip.ZipFile$ZipFileInflaterInputStream.fill(ZipFile.java:434)
    at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:158)
    at sun.misc.Resource.getBytes(Resource.java:124)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:462)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
    ... 30 more
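
The first trace says the 10000 ms timeout is governed by spark.executor.heartbeatInterval. A hedged local-mode configuration sketch (the values are guesses, not a confirmed fix) that caps driver memory and relaxes the heartbeat so the JVM degrades less violently under memory pressure:

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .master("local[*]")
    # Cap the JVM heap so the process hits Spark's own limits before
    # the OS starts killing or corrupting things.
    .config("spark.driver.memory", "8g")
    # The heartbeat interval the timeout message refers to; it must
    # stay below spark.network.timeout.
    .config("spark.executor.heartbeatInterval", "60s")
    .config("spark.network.timeout", "300s")
    .getOrCreate()
)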
