How do I save a TensorFlow dataset to a file?

3 answers

User

Answer 1 · edited 2024-09-27 09:30:31

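Adding to Yoan's answer:

The tf.data.experimental.save() and load() APIs work well. You also need to save ds.element_spec to disk manually so that you can call load() later or in a different context.

Pickling worked well for me:

1- Saving: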

import pickle
import tensorflow as tf

# ds is the dataset to save; tf_data_path is the target directory
tf.data.experimental.save(
    ds, tf_data_path, compression='GZIP'
)
with open(tf_data_path + '/element_spec', 'wb') as out_:  # also save the element_spec to disk for future loading
    pickle.dump(ds.element_spec, out_)

2- Loading: you need the path to the folder containing the tf shards, plus the element_spec that we pickled manually:

with open(tf_data_path + '/element_spec', 'rb') as in_:
    es = pickle.load(in_)

loaded = tf.data.experimental.load(
    tf_data_path, es, compression='GZIP'
)
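As a side note, newer TensorFlow releases expose the same functionality as non-experimental methods on the dataset itself. The sketch below assumes TensorFlow 2.10 or later running in eager mode, where `tf.data.Dataset.load` can read the element spec that was stored alongside the data, so the manual pickling step may not be needed. The toy dataset and the temporary directory are illustrative only.

```python
import os
import tempfile

import tensorflow as tf

# Toy dataset (illustrative): 100 elements, each a (512,) vector plus a scalar.
ds = tf.data.Dataset.from_tensor_slices(
    (tf.zeros((100, 512), tf.int32), tf.range(100))
)

# Hypothetical target directory; any writable path works.
path = os.path.join(tempfile.mkdtemp(), 'saved_ds')

# Assumes TF >= 2.10, where save/load are methods of tf.data.Dataset.
ds.save(path, compression='GZIP')

# The compression argument must match the one used when saving.
# element_spec is omitted here and read back from the saved metadata.
loaded = tf.data.Dataset.load(path, compression='GZIP')
print(loaded.element_spec)
```

On older versions (for example TF 2.3, where the API first appeared), passing the pickled element_spec explicitly as shown above remains the safer route.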

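User

Answer 2 · edited 2024-09-27 09:30:31

TFRecordWriter seems to be the most convenient option, but unfortunately it can only write datasets with a single tensor per element. Here are a couple of workarounds you can use. First, since all your tensors have the same type and similar shapes, you can concatenate them into a single tensor and split it back apart after loading: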

import tensorflow as tf

# Write
a = tf.zeros((100, 512), tf.int32)
ds = tf.data.Dataset.from_tensor_slices((a, a, a, a[:, 0]))
print(ds)
# <TensorSliceDataset shapes: ((512,), (512,), (512,), ()), types: (tf.int32, tf.int32, tf.int32, tf.int32)>
def write_map_fn(x1, x2, x3, x4):
    return tf.io.serialize_tensor(tf.concat([x1, x2, x3, tf.expand_dims(x4, -1)], -1))
ds = ds.map(write_map_fn)
writer = tf.data.experimental.TFRecordWriter('mydata.tfrecord')
writer.write(ds)

# Read
def read_map_fn(x):
    xp = tf.io.parse_tensor(x, tf.int32)
    # Optionally set shape
    xp.set_shape([1537])  # Do `xp.set_shape([None, 1537])` if using batches
    # Use `xp[:, :512], ...` if using batches
    return xp[:512], xp[512:1024], xp[1024:1536], xp[-1]
ds = tf.data.TFRecordDataset('mydata.tfrecord').map(read_map_fn)
print(ds)
# <MapDataset shapes: ((512,), (512,), (512,), ()), types: (tf.int32, tf.int32, tf.int32, tf.int32)>
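The hard-coded 1537 and the slice indices in read_map_fn follow directly from the component lengths (512 + 512 + 512 + 1). A minimal pure-Python sketch of that arithmetic is shown below; the helper name `slice_bounds` is hypothetical and not part of TensorFlow.

```python
from itertools import accumulate


def slice_bounds(lengths):
    """Return (start, end) index pairs for components concatenated in order."""
    ends = list(accumulate(lengths))   # running totals: the end of each component
    starts = [0] + ends[:-1]           # each component starts where the previous ended
    return list(zip(starts, ends))


# Three vectors of length 512 plus one scalar expanded to length 1.
bounds = slice_bounds([512, 512, 512, 1])
total = bounds[-1][1]  # length of the concatenated tensor

print(bounds)
print(total)
```

If the component shapes change, recomputing the bounds this way keeps the `set_shape` call and the slices in read_map_fn consistent with each other.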

More generally, though, you can simply create a separate file for each tensor and then read them all back together:

import tensorflow as tf

# Write
a = tf.zeros((100, 512), tf.int32)
ds = tf.data.Dataset.from_tensor_slices((a, a, a, a[:, 0]))
for i, _ in enumerate(ds.element_spec):
    ds_i = ds.map(lambda *args: args[i]).map(tf.io.serialize_tensor)
    writer = tf.data.experimental.TFRecordWriter(f'mydata.{i}.tfrecord')
    writer.write(ds_i)

# Read
NUM_PARTS = 4
parts = []
def read_map_fn(x):
    return tf.io.parse_tensor(x, tf.int32)
for i in range(NUM_PARTS):
    parts.append(tf.data.TFRecordDataset(f'mydata.{i}.tfrecord').map(read_map_fn))
ds = tf.data.Dataset.zip(tuple(parts))
print(ds)
# <ZipDataset shapes: (<unknown>, <unknown>, <unknown>, <unknown>), types: (tf.int32, tf.int32, tf.int32, tf.int32)>

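It is possible to keep the whole dataset in a single file with several separate tensors per element, namely as a TFRecord file containing tf.train.Example protos, but I don't know of a way to create those within TensorFlow, that is, without pulling the data out of the dataset into Python and then writing it to the records file.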
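For reference, the Python-side route mentioned above (iterating over the dataset eagerly and writing one tf.train.Example per element) could look roughly like the sketch below. It assumes eager execution; the feature names `x1` to `x4`, the helper names, and the temporary file path are made up for illustration. Integer features are stored as int64 in tf.train.Example, so they are cast back to int32 on read.

```python
import os
import tempfile

import tensorflow as tf

a = tf.zeros((100, 512), tf.int32)
ds = tf.data.Dataset.from_tensor_slices((a, a, a, a[:, 0]))


def int_feature(values):
    # tf.train.Example stores integers as an int64 list.
    return tf.train.Feature(int64_list=tf.train.Int64List(value=values))


def to_example(x1, x2, x3, x4):
    # Feature names are arbitrary labels chosen for this sketch.
    return tf.train.Example(features=tf.train.Features(feature={
        'x1': int_feature(x1.numpy().tolist()),
        'x2': int_feature(x2.numpy().tolist()),
        'x3': int_feature(x3.numpy().tolist()),
        'x4': int_feature([int(x4.numpy())]),
    }))


# Write: the data passes through Python here, one element at a time.
path = os.path.join(tempfile.mkdtemp(), 'mydata.tfrecord')
with tf.io.TFRecordWriter(path) as writer:
    for x1, x2, x3, x4 in ds:
        writer.write(to_example(x1, x2, x3, x4).SerializeToString())

# Read: parse each record and cast back to the original dtype.
feature_spec = {
    'x1': tf.io.FixedLenFeature([512], tf.int64),
    'x2': tf.io.FixedLenFeature([512], tf.int64),
    'x3': tf.io.FixedLenFeature([512], tf.int64),
    'x4': tf.io.FixedLenFeature([], tf.int64),
}


def parse_fn(record):
    p = tf.io.parse_single_example(record, feature_spec)
    return (tf.cast(p['x1'], tf.int32), tf.cast(p['x2'], tf.int32),
            tf.cast(p['x3'], tf.int32), tf.cast(p['x4'], tf.int32))


restored = tf.data.TFRecordDataset(path).map(parse_fn)
print(restored.element_spec)
```

This keeps everything in one file with fully known shapes, at the cost of a Python loop during writing, which may be slow for large datasets.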

User

Answer 3 · edited 2024-09-27 09:30:31

An issue was raised on GitHub, and it seems that TF 2.3 has a new feature for writing datasets to disk:

https://www.tensorflow.org/api_docs/python/tf/data/experimental/save
https://www.tensorflow.org/api_docs/python/tf/data/experimental/load

I haven't tested this feature yet, but it seems to do what you want.
