Is there any way to prevent memory errors when opening multiple gz files in a loop?

import gzip

def clearall():
    not_variables=[var for var in globals() if (var[:2],var[-2:])==("__","__")]
    white_list=["files","gzip","clearall"]
    black_list=[var for var in globals() if var not in white_list+not_variables]
    for var in black_list:
        del globals()[var]

files=['thing1.gz', 'thing2.gz']
for current_file in files:
    x=list(gzip.open(current_file,"r"))
    clearall()

import os

files=['thing1.gz', 'thing2.gz']
for current_file in files:
    temporary_file=open("temp.txt","w")
    temporary_file.write(current_file)
    temporary_file.close()
    execfile("file_open_and_process.py")
    os.remove("temp.txt")

2 answers

User

Answer 1 · edited 2024-10-01 09:25:39

psc_aaa.gz is 1,718,317,178 bytes uncompressed. If possible, process the file line by line instead of holding all of it in memory at once:

import gzip

files=['psc_aaa.gz']
for current_file in files:
    with gzip.open(current_file,'rt') as f:
        for line in f:
            print(line,end='')

Output (first few lines): [not preserved]
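Printing every line is only a demonstration. As a minimal sketch of what line-by-line processing might look like when each file has to be reduced to a small result, the helper below streams a file and keeps only a running count. The name count_matching_lines and the idea of searching for a substring are assumptions made for illustration; they do not come from the question.

```python
import gzip


def count_matching_lines(path, needle):
    """Stream a gzip file and count the lines that contain `needle`.

    Hypothetical helper for illustration. Only the running count is kept,
    so memory use does not grow with the size of the file.
    """
    count = 0
    with gzip.open(path, "rt") as f:
        for line in f:  # the file object yields one decoded line at a time
            if needle in line:
                count += 1
    return count
```

It would be called once per file inside the loop, for example count_matching_lines(current_file, "some text"), so that nothing large survives from one iteration to the next.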

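User

Answer 2 · edited 2024-10-01 09:25:39

Your processing must be so fast that the garbage collector never runs unless you force it (or it never reaches its collection threshold).

I can't test your example with your data, but your last snippet, the one that forces a collection (which is the right thing to do), uses the garbage collector incorrectly: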

import gzip
import gc

files=['thing1.gz', 'thing2.gz']
for current_file in files:
    x=list(gzip.open(current_file,"r"))
    gc.collect()

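When you call gc.collect(), x still refers to the current file's list, so that list cannot be freed; the only thing that could be collected is the previous one. The old list also stays alive while list(...) builds the next one, because x is only rebound once the new list is complete. You must del x before calling the garbage collector, because the two files cannot both fit in memory at the same time.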
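A minimal sketch of that ordering follows. The answer's own corrected snippet did not survive in this copy, so this is a reconstruction under assumptions: the process function is a hypothetical stand-in for the real per-file work, and summarize is just a wrapper so the loop can be called and checked.

```python
import gc
import gzip


def process(lines):
    # Hypothetical stand-in for the real work: reduce the data to something small.
    return len(lines)


def summarize(files):
    results = []
    for current_file in files:
        with gzip.open(current_file, "rb") as f:
            x = list(f)              # the whole file is in memory here
        results.append(process(x))   # keep only the small result
        del x                        # drop the reference before the next file is read
        gc.collect()                 # then force a collection pass
    return results
```

With del x placed at the end of each iteration, the previous file's list is already released when list(f) starts building the next one.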

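Now, if for some (weird) reason that still doesn't work, use two scripts: a master that runs the other one as a separate process for each file, passing the file name as an argument. Each process returns all of its memory to the operating system when it exits.
master.py contains:
import subprocess

files=['thing1.gz', 'thing2.gz']
for current_file in files:
    subprocess.check_call(["python","other_script.py",current_file])
other_script.py will contain the processing:
import sys,gzip

with gzip.open(sys.argv[1]) as f:
    x = list(f)
# rest of your processing
At the end, store the result of the processing (which must be smaller than the input) in a result file.
After all the processes have run, collect the data in the master.py script and continue.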
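The result-file step is not shown in the answer, so here is one hedged way it could be wired up. Everything specific in it is an assumption: the JSON format, the ".result.json" suffix, the line count as the "result", and the run_all name. To keep the sketch in a single file, the worker code is passed to the interpreter with -c instead of living in other_script.py.

```python
import json
import subprocess
import sys

# Worker code. In the answer's layout this would be the body of other_script.py;
# it is a string here only so that the sketch is self-contained.
WORKER_SOURCE = '''
import gzip, json, sys
src, dst = sys.argv[1], sys.argv[2]
with gzip.open(src, "rt") as f:
    x = list(f)
# Hypothetical processing: the result is just the number of lines.
with open(dst, "w") as out:
    json.dump({"file": src, "lines": len(x)}, out)
'''


def run_all(files):
    results = []
    for current_file in files:
        result_path = current_file + ".result.json"  # assumed naming scheme
        # One child process per file; its memory is released when it exits.
        subprocess.check_call(
            [sys.executable, "-c", WORKER_SOURCE, current_file, result_path]
        )
        with open(result_path) as fh:
            results.append(json.load(fh))  # the master only loads the small result
    return results
```

sys.executable is used instead of the literal "python" so that the child runs under the same interpreter as the master; check_call raises CalledProcessError if a worker fails, which stops the loop early.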
