How do I train a model using numpy arrays larger than 6 GB?

Posted 2024-10-01 09:21:25


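I have several huge training files that I plan to train on. The validation data is fine and I don't expect any problem there, but the size is huge: I'm talking about 20 GB+. Loading even a single file crashes Python with a memory error.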

I tried combining the files into one, but the result is too large.

X = np.load('X150.npy')
Y = np.load('Y150.npy')

Error:

[traceback not preserved in the source; the question describes it as a memory error]

I need a solution that lets me train on large datasets.


1 answer
User
#1 · Posted 2024-10-01 09:21:25

Important: First make sure that your Python is 64-bit. The methods below only support files up to 2 GB on 32-bit Python versions.
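
One quick way to check this is sketched below using only the standard library. The helper name `is_64bit_python` is made up for illustration and is not part of the original answer.

```python
import struct
import sys


def is_64bit_python() -> bool:
    """Return True when the running interpreter is a 64-bit build."""
    # A pointer ("P") occupies 8 bytes on a 64-bit build and 4 bytes on a 32-bit one.
    return struct.calcsize("P") * 8 == 64


print("64-bit Python:", is_64bit_python())
# sys.maxsize is another common indicator: it exceeds 2**32 only on 64-bit builds.
print("sys.maxsize > 2**32:", sys.maxsize > 2**32)
```

If the first line prints `False`, installing a 64-bit Python build is the first step before trying anything else here.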

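In general, you should use np.memmap() to work with an array without loading it into RAM. From the numpy docs: "Memory-mapped files are used for accessing small segments of large files on disk, without reading the entire file into memory."

Usage example: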

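import numpy as np

x_file = "X_150.npy"

# mode='w+' creates the file or overwrites an existing one; use mode='r' to read existing raw data
X = np.memmap(x_file, dtype='int', mode='w+', shape=(300000, 1000))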
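
To make the round trip concrete, here is a minimal sketch that writes a small raw memory-mapped file and reads part of it back. The file name `demo_raw.dat`, the shape, and the dtype are assumptions chosen so the example runs quickly; a real training array would be far larger.

```python
import os
import tempfile

import numpy as np

# Hypothetical small file in a temporary directory (not one of the real training files).
path = os.path.join(tempfile.mkdtemp(), "demo_raw.dat")
shape = (1000, 20)

# mode='w+' creates the file (or overwrites an existing one), initially filled with zeros.
writer = np.memmap(path, dtype=np.float32, mode="w+", shape=shape)
writer[:10] = 1.0  # only the pages that are touched get loaded
writer.flush()     # push pending changes to disk
del writer

# A raw memmap stores no header, so dtype and shape must be supplied again when reading.
reader = np.memmap(path, dtype=np.float32, mode="r", shape=shape)
first_rows = np.array(reader[:10])  # copy just this slice into RAM
print(first_rows.shape, float(first_rows.sum()))
```

Because a raw memmap carries no metadata, pointing it at a `.npy` file would treat the `.npy` header bytes as data unless an offset is given, which is one reason a `.npy`-aware function is more convenient for files saved with `np.save`.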

However, since your files are already stored as .npy files, I came across np.lib.format.open_memmap(), which creates or loads a memory-mapped .npy file.

Its usage is the same as that of np.memmap():

[code sample not preserved in the source]
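
The answer's own snippet is not available, so the following is only a sketch of how such a call might look, not a reconstruction of it. The file `X_demo.npy`, the helper `iter_batches`, and the batch size are all assumptions made for illustration; the small file is written first so the example can run.

```python
import os
import tempfile

import numpy as np
from numpy.lib.format import open_memmap

# Stand-in for a real file such as 'X150.npy'; a small array is saved here instead.
x_path = os.path.join(tempfile.mkdtemp(), "X_demo.npy")
np.save(x_path, np.arange(1000 * 8, dtype=np.float32).reshape(1000, 8))

# mode='r' maps the existing file read-only; dtype and shape come from the .npy header.
X = open_memmap(x_path, mode="r")


def iter_batches(array, batch_size):
    """Yield consecutive in-memory batches from a (possibly memory-mapped) array."""
    for start in range(0, array.shape[0], batch_size):
        # np.array(...) copies only this slice from disk into RAM.
        yield np.array(array[start:start + batch_size])


n_rows = 0
for batch in iter_batches(X, batch_size=256):
    n_rows += batch.shape[0]  # a real loop would pass `batch` to the model's training step
print(X.shape, X.dtype, n_rows)
```

`np.load(x_path, mmap_mode="r")` is another documented way to obtain a read-only memory map of a `.npy` file. Either way, only the slices that are actually indexed get read from disk.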

Here is the documentation for the second function (from this answer):

>>> import numpy
>>> print(numpy.lib.format.open_memmap.__doc__)

"""
Open a .npy file as a memory-mapped array.

This may be used to read an existing file or create a new one.

Parameters
----------
filename : str
    The name of the file on disk. This may not be a filelike object.
mode : str, optional
    The mode to open the file with. In addition to the standard file modes,
    'c' is also accepted to mean "copy on write". See `numpy.memmap` for
    the available mode strings.
dtype : dtype, optional
    The data type of the array if we are creating a new file in "write"
    mode.
shape : tuple of int, optional
    The shape of the array if we are creating a new file in "write"
    mode.
fortran_order : bool, optional
    Whether the array should be Fortran-contiguous (True) or
    C-contiguous (False) if we are creating a new file in "write" mode.
version : tuple of int (major, minor)
    If the mode is a "write" mode, then this is the version of the file
    format used to create the file.

Returns
-------
marray : numpy.memmap
    The memory-mapped array.

Raises
------
ValueError
    If the data or the mode is invalid.
IOError
    If the file is not found or cannot be opened correctly.

See Also
--------
numpy.memmap
"""

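The docstring notes that the function can also create a new file in "write" mode. As a hedged sketch of how that could help with combining several files into one without holding everything in RAM, the example below merges a few small part files. The part files, their sizes, and the name `merged.npy` are hypothetical and exist only so the code runs.

```python
import os
import tempfile

import numpy as np
from numpy.lib.format import open_memmap

tmp_dir = tempfile.mkdtemp()

# Hypothetical part files standing in for several smaller training files.
part_paths = []
for i in range(3):
    part_path = os.path.join(tmp_dir, f"part_{i}.npy")
    np.save(part_path, np.full((100, 4), i, dtype=np.float32))
    part_paths.append(part_path)

# Map each part read-only; shapes are taken from the .npy headers.
parts = [open_memmap(p, mode="r") for p in part_paths]
total_rows = sum(p.shape[0] for p in parts)

# mode='w+' creates a new .npy file; dtype and shape must be given in write mode.
merged_path = os.path.join(tmp_dir, "merged.npy")
merged = open_memmap(merged_path, mode="w+", dtype=np.float32, shape=(total_rows, 4))

row = 0
for part in parts:
    merged[row:row + part.shape[0]] = part  # copied one part at a time
    row += part.shape[0]
merged.flush()
del merged

check = np.load(merged_path, mmap_mode="r")
print(check.shape, float(check[250, 0]))
```

With real data, each part could itself be copied in smaller row chunks if a single part is too large to handle comfortably.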