<p>Check out the <a href="https://ray.readthedocs.io/en/latest/" rel="nofollow noreferrer">ray</a> project, a distributed execution framework that uses <a href="https://arrow.apache.org/" rel="nofollow noreferrer">apache arrow</a> for serialization. It works especially well with numpy arrays, which makes it a good fit for ML workflows.</p>
<p>Here is a snippet from the docs on <a href="https://ray.readthedocs.io/en/latest/serialization.html" rel="nofollow noreferrer">object serialization</a>:</p>
<blockquote>
<p>In Ray, we optimize for numpy arrays by using the Apache Arrow data
format. When we deserialize a list of numpy arrays from the object
store, we still create a Python list of numpy array objects. However,
rather than copy each numpy array, each numpy array object holds a
pointer to the relevant array held in shared memory. There are some
advantages to this form of serialization.</p>
<ul>
<li>Deserialization can be very fast. </li>
<li>Memory is shared between processes
so worker processes can all read the same data without having to copy
it.</li>
</ul>
</blockquote>
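<p>In my opinion it is even easier to use than the multiprocessing library for parallel execution, especially when you want shared memory. The <a href="https://ray.readthedocs.io/en/latest/tutorial.html" rel="nofollow noreferrer">tutorial</a> has an introduction to usage.</p>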