Python memory leak, lists in a for loop

splitted_sentences = []
splitted_pos = []
with open("my_path", "r") as tagged_sentences:
    for sentence in tagged_sentences:
        curr_sentence = []
        curr_pos = []
        for tag in sentence.strip().split(" "):
            splitted_tag = tag.split("/")
            word = splitted_tag[0]
            pos = splitted_tag[1]
            curr_sentence.append(word)
            curr_pos.append(pos)
        splitted_sentences.append(curr_sentence)
        splitted_pos.append(curr_pos)

2 answers

User

#1 · edited 2024-09-27 07:33:24

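splitted_sentences is a list of lists of strings. A list has a memory overhead of about 70 bytes and a string about 40 bytes. Assuming an average word or POS tag is 5 bytes and an average sentence has 10 word/POS pairs, a 100 MB file holds about 1M sentences. For splitted_sentences alone that is roughly 1M * 70 bytes of list overhead plus 1M * 10 * 40 bytes of string overhead, about 470 MB before counting the characters themselves and the 8-byte references each list stores on a 64-bit build, and splitted_pos costs about the same again (if all strings are unique). Obviously many of them are not unique, but memory use of several times the file size can be explained without a memory leak.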
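The per-object overheads can be checked rather than guessed. The following is a rough sketch using `sys.getsizeof`; the helper name `estimate_nested_list_bytes` and the sample data are made up for illustration, and the exact numbers depend on the Python version and build.

```python
import sys


def estimate_nested_list_bytes(sentences):
    """Rough size of a list of lists of strings.

    Assumes every string is a distinct object, so shared strings are
    counted more than once and the result is an upper-bound style estimate.
    """
    total = sys.getsizeof(sentences)  # outer list and its references
    for sentence in sentences:
        total += sys.getsizeof(sentence)  # inner list and its references
        total += sum(sys.getsizeof(word) for word in sentence)
    return total


# Hypothetical sample: 1000 sentences of 10 five-character words each.
sample = [["word%d" % i for i in range(10)] for _ in range(1000)]
raw_chars = sum(len(word) for sentence in sample for word in sentence)
estimated = estimate_nested_list_bytes(sample)

print("characters:", raw_chars)
print("estimated bytes:", estimated)
print("ratio:", estimated / raw_chars)
```

The ratio printed at the end gives a feel for how much larger the in-memory structure is than the text it was built from.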

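My approach to this problem would be sequential processing. I doubt you really need all of this data in memory at the same time. Replacing the main loop with a generator could be a game changer: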

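def sentence_gen(fname):
    # Yield one sentence at a time as a list of [word, pos] pairs.
    with open(fname, "r") as tagged_sentences:
        for sentence in tagged_sentences:
            yield [pair.split("/", 1) for pair in sentence.strip().split()]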
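As a sketch of how a generator like this might be consumed, the example below counts POS tags while holding only one sentence in memory at a time. The `count_pos_tags` helper and the temporary sample file are assumptions for illustration, and it assumes every token contains a "/" separator.

```python
import os
import tempfile
from collections import Counter


def sentence_gen(fname):
    # Yield one sentence at a time as a list of [word, pos] pairs.
    with open(fname, "r") as tagged_sentences:
        for sentence in tagged_sentences:
            yield [pair.split("/", 1) for pair in sentence.strip().split()]


def count_pos_tags(fname):
    # Hypothetical consumer: aggregates tags without storing the sentences.
    counts = Counter()
    for sentence in sentence_gen(fname):
        for word, pos in sentence:
            counts[pos] += 1
    return counts


# Write a tiny sample file so the example runs on its own.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("the/DT cat/NN sat/VBD\na/DT dog/NN\n")
    sample_path = tmp.name

tag_counts = count_pos_tags(sample_path)
os.remove(sample_path)
print(tag_counts)
```

Only the running `Counter` grows with the input, so peak memory depends on the number of distinct tags rather than on the file size.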

User

#2 · edited 2024-09-27 07:33:24

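Move curr_sentence and curr_pos outside the for loop. Then you can clear them instead of creating new ones. My guess is that, for some reason, the curr_sentence and curr_pos lists are not being deleted at the end of each for loop iteration.

By moving these lists outside the for loop, you avoid creating new lists on every iteration.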
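One caveat is worth checking before trying this: appending a list stores a reference to it, so clearing and reusing a single list also changes what was appended earlier. The sketch below uses two hypothetical helpers and made-up input to show the difference. Keeping each sentence then requires a copy, which is a new list per iteration again.

```python
def collect_with_shared_list(lines):
    collected = []
    curr = []  # created once, outside the loop
    for line in lines:
        curr.clear()
        curr.extend(line.split())
        collected.append(curr)  # appends the same list object every time
    return collected


def collect_with_copies(lines):
    collected = []
    curr = []
    for line in lines:
        curr.clear()
        curr.extend(line.split())
        collected.append(list(curr))  # copy: a new list per iteration
    return collected


lines = ["a b", "c d e"]
shared = collect_with_shared_list(lines)
copied = collect_with_copies(lines)

print(shared)  # every entry is the same object holding the last sentence
print(copied)  # each entry keeps its own sentence
```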
