Python garbage collector for a large data-formatting program

Posted 2024-09-28 20:52:52


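I have written a program that reads a folder of Excel files and loads each file in turn. For each file it takes the data and creates an array of zeros of size (3001, 2001). It then iterates over the data and sets the array entry at each corresponding coordinate from the Excel file to 1. The array is then reshaped to size (1, 6005001). I use TensorFlow to reshape the array because the program thinks it is a tuple, but the final values are stored in a NumPy array. Finally, I save the formatted array to a CSV file named "filename"_Array.csv, and the program moves on to the next Excel file. I am running Python in Eclipse, with TensorFlow installed.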
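For comparison, the steps described above (blank array, set points to 1, flatten to one row) can be sketched with NumPy alone. This is a minimal sketch under assumptions, not the program from the question: `build_flat_image` is a hypothetical name, the `uint8` dtype is a swapped-in choice to reduce memory, and the `np.flip` applied before saving in the original is left out. It may also be relevant that the original creates a new reshape op and a new `InteractiveSession` on every call and never closes the session, which could be one contributor to the growth; the question does not confirm this.

```python
import numpy as np

MAX_G = 3000  # largest row index, taken from the question's code
MAX_H = 2000  # largest column index, taken from the question's code


def build_flat_image(g, h):
    """Return a (1, 6005001) array with 1 at each (g*1000, h) coordinate.

    build_flat_image is a hypothetical name, not from the original program.
    """
    rows = (np.asarray(g, dtype=float) * 1000).astype(int)
    cols = np.asarray(h, dtype=float).astype(int)

    # Assumption: uint8 is enough for 0/1 values and keeps the array near 6 MB.
    image = np.zeros((MAX_G + 1, MAX_H + 1), dtype=np.uint8)
    image[rows, cols] = 1  # vectorized replacement for the per-point loop

    # reshape works on the same buffer; no TensorFlow graph or session is used.
    return image.reshape(1, -1)


flat = build_flat_image([0.001, 1.5, 3.0], [0, 1000, 2000])
print(flat.shape)
print(int(flat.sum()))
```

The result can be passed to `np.savetxt` in the same way as the array in the question.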

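The problem I am running into is that something is being kept in memory, but I cannot work out what. I have tried explicitly deleting the large variables that get reinitialized, and calling gc.collect() to clear inactive memory. I still see memory usage rise steadily until about 25 files have been formatted, at which point the computer starts to freeze because all of its memory (12 GB) is in use. I know Python automatically frees memory for values the program can no longer reach, so I am not sure whether this is RAM fragmentation or something else. Sorry for the wall of text, I just wanted to give as much information about the problem as possible.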
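One way to narrow down what is growing is to compare `tracemalloc` snapshots taken before and after a few files. The sketch below is illustrative only: `process_file` is a hypothetical stand-in that deliberately keeps a reference to its data, to show that `gc.collect()` cannot free objects that are still reachable. Note that `tracemalloc` only sees allocations made through Python's allocators, so memory held by native code may not show up.

```python
import gc
import tracemalloc


def process_file(index, leak):
    """Hypothetical stand-in for the per-file work; it keeps a reference on purpose."""
    leak.append(bytearray(1_000_000))  # simulates data that is never released


tracemalloc.start()
leak = []
before = tracemalloc.take_snapshot()

for index in range(5):
    process_file(index, leak)
    gc.collect()  # has no effect on objects that are still referenced

after = tracemalloc.take_snapshot()
diffs = after.compare_to(before, "lineno")  # largest changes come first
top = diffs[:3]
for stat in top:
    print(stat)  # source line plus the size difference attributed to it

grown_bytes = sum(stat.size_diff for stat in diffs)
tracemalloc.stop()
print("net growth in bytes:", grown_bytes)
```

If the top entries point at a line inside the per-file function, that line is a reasonable place to start looking.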

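Here is a link to a screenshot of my Performance tab, taken while the program was running. It got through about 24 files before I had to terminate it because the computer froze.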

Here is my code:

from __future__ import print_function
import os
os.environ['TF_CPP_MIN_LOG_LEVEL']='2'
import tensorflow as tf
import numpy as np
import csv

import gc

path = r'C:\Users\jeremy.desforges\Desktop\Eclipse\NN_MNIST\VAM SLIJ-II 4.500'

def create_array(g,h,trainingdata,filename):
    # Multiplying by factors of 10 to keep precision of data
    g = g*1000
    h = h*1
    max_g = 3000
    max_h = 2000

    # Initializes an array with zeros to represent a blank graph
    image = np.zeros((max_g+1,max_h+1),dtype=int)
    shape = ((max_g+1)*(max_h+1))

    # Fills the blank graph with the input data points
    for i in range(len(h)):
        image[g[i].astype('int'),h[i].astype('int')] = 1

    trainingdata.close()
    image = tf.reshape(image,[-1,shape])

    # Converts tensor objects to numpy arrays to feed into network
    sess = tf.InteractiveSession()
    image = sess.run(image)

    np.savetxt((filename + "_Array.csv"), np.flip(image,1).astype(int), fmt = '%i' ,delimiter=",")

    print(filename, "appended")
    print("size",image.shape)
    print(image,"=  output array")
    del image,shape,g,h,filename,sess
    return

# Initializing variables

image = []
shape = 1
g = 1.0
h = 1.0
f = 1
specials = '.csv'
folder = os.listdir(path)

for filename in folder:

    trainingdata = open(filename, "r+")
    filename = str(filename.replace(specials, ''))
    data_read = csv.reader(trainingdata)

    for row in data_read:
        in1 = float(row[0])
        in2 = float(row[1])    
        if (f==0):
            z_ = np.array([in1])
            g = np.hstack((g,z_))
            q = np.array([in2])
            h = np.hstack((h,q))
        if (f == 1):
            g = np.array([in1])
            h = np.array([in2])
            f = 0

    create_array(g,h,trainingdata,filename)

    gc.collect()
    image = []
    shape = 1
    g = 1.0
    h = 1.0
    f = 1
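The reading loop above grows `g` and `h` with `np.hstack` one row at a time, which copies the arrays on every row, and the file is only closed inside `create_array`. A minimal alternative sketch, assuming every data row has at least two numeric columns: `read_points` is a hypothetical helper, and the temporary folder only stands in for the real data directory.

```python
import csv
import os
import tempfile


def read_points(csv_path):
    """Read the first two columns of a CSV file into two lists of floats.

    read_points is a hypothetical helper, not part of the original program.
    """
    g_values, h_values = [], []
    # newline="" is what the csv module documentation recommends for file objects.
    with open(csv_path, "r", newline="") as handle:
        for row in csv.reader(handle):
            if len(row) < 2:
                continue  # assumption: short or blank rows can be skipped
            g_values.append(float(row[0]))
            h_values.append(float(row[1]))
    return g_values, h_values  # the with block has already closed the file


# Usage with a throwaway folder standing in for the real data directory.
with tempfile.TemporaryDirectory() as folder:
    sample = os.path.join(folder, "sample.csv")
    with open(sample, "w", newline="") as handle:
        csv.writer(handle).writerows([[0.001, 0], [1.5, 1000], [3.0, 2000]])

    results = {}
    for name in sorted(os.listdir(folder)):
        if not name.endswith(".csv"):
            continue
        # os.listdir returns bare names, so join them with the folder path.
        results[name] = read_points(os.path.join(folder, name))

print(results["sample.csv"])
```

The two lists can be converted once per file with `np.asarray`, so no intermediate arrays are created while reading.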
