理解多处理：Python中的共享内存管理、锁和队列

1条回答

网友

1楼 · 发布于 2024-05-08 23:40:01

multiprocessing.Lock是使用操作系统提供的信号量对象实现的。在Linux上，子对象只是通过os.fork从父对象继承信号量的句柄。这不是信号量的副本；它实际上继承了父进程拥有的相同句柄，继承文件描述符的方式也一样。另一方面，Windows不支持os.fork，因此它必须对Lock进行pickle。它通过使用Windows ^{}API为multiprocessing.Lock对象在内部使用的Windows信号量创建一个重复的句柄来实现，该API声明：

The duplicate handle refers to the same object as the original handle. Therefore, any changes to the object are reflected through both handles

通过DuplicateHandleAPI，您可以将复制句柄的所有权授予子进程，以便子进程在取消对其的绑定后可以实际使用它。通过创建子对象拥有的重复句柄，可以有效地“共享”锁对象。

这里是multiprocessing/synchronize.py中的信号量对象

class SemLock(object):

    def __init__(self, kind, value, maxvalue):
        sl = self._semlock = _multiprocessing.SemLock(kind, value, maxvalue)
        debug('created semlock with handle %s' % sl.handle)
        self._make_methods()

        if sys.platform != 'win32':
            def _after_fork(obj):
                obj._semlock._after_fork()
            register_after_fork(self, _after_fork)

    def _make_methods(self):
        self.acquire = self._semlock.acquire
        self.release = self._semlock.release
        self.__enter__ = self._semlock.__enter__
        self.__exit__ = self._semlock.__exit__

    def __getstate__(self):  # This is called when you try to pickle the `Lock`.
        assert_spawning(self)
        sl = self._semlock
        return (Popen.duplicate_for_child(sl.handle), sl.kind, sl.maxvalue)

    def __setstate__(self, state): # This is called when unpickling a `Lock`
        self._semlock = _multiprocessing.SemLock._rebuild(*state)
        debug('recreated blocker with handle %r' % state[0])
        self._make_methods()

注意assert_spawning调用中的__getstate__，它在拾取对象时被调用。这是如何实现的：

#
# Check that the current thread is spawning a child process
#

def assert_spawning(self):
    if not Popen.thread_is_spawning():
        raise RuntimeError(
            '%s objects should only be shared between processes'
            ' through inheritance' % type(self).__name__
            )

这个函数通过调用thread_is_spawning，确保您“继承”了Lock。在Linux上，该方法只返回False：

@staticmethod
def thread_is_spawning():
    return False

这是因为Linux不需要pickle来继承Lock，所以如果__getstate__实际上是在Linux上调用的，我们就不能继承。在Windows上，还有更多：

def dump(obj, file, protocol=None):
    ForkingPickler(file, protocol).dump(obj)

class Popen(object):
    '''
    Start a subprocess to run the code of a process object
    '''
    _tls = thread._local()

    def __init__(self, process_obj):
        ...
        # send information to child
        prep_data = get_preparation_data(process_obj._name)
        to_child = os.fdopen(wfd, 'wb')
        Popen._tls.process_handle = int(hp)
        try:
            dump(prep_data, to_child, HIGHEST_PROTOCOL)
            dump(process_obj, to_child, HIGHEST_PROTOCOL)
        finally:
            del Popen._tls.process_handle
            to_child.close()


    @staticmethod
    def thread_is_spawning():
        return getattr(Popen._tls, 'process_handle', None) is not None

这里，thread_is_spawning如果Popen._tls对象具有process_handle属性，则返回True。我们可以看到在__init__中创建了process_handle属性，然后使用dump将要继承的数据从父级传递到子级，然后删除该属性。因此thread_is_spawning将只在__init__期间是True。根据this python-ideas mailing list thread，这实际上是一个人为的限制，用于模拟Linux上的os.fork行为。Windows实际上可以在任何时候支持传递Lock，因为DuplicateHandle可以在任何时候运行。

以上所有内容都适用于Queue对象，因为它在内部使用Lock。

我想说，继承Lock对象比使用Manager.Lock()对象更好，因为当您使用Manager.Lock时，您对Lock的每个调用都必须通过IPC发送到Manager进程，这将比使用位于调用进程内的共享Lock进程慢得多。不过，这两种方法都是完全有效的。

最后，可以不使用Manager，使用initializer/initargs关键字参数，将Lock传递给Pool的所有成员：

lock = None
def initialize_lock(l):
   global lock
   lock = l

def scenario_1_pool_no_manager(jobfunc, args, ncores):
    """Runs a pool of processes WITHOUT a Manager for the lock and queue.

    """
    lock = mp.Lock()
    mypool = mp.Pool(ncores, initializer=initialize_lock, initargs=(lock,))
    queue = mp.Queue()

    iterator = make_iterator(args, queue)

    mypool.imap(jobfunc, iterator) # Don't pass lock. It has to be used as a global in the child. (This means `jobfunc` would need to be re-written slightly.

    mypool.close()
    mypool.join()

return read_queue(queue)

这是因为传递给initargs的参数被传递给在Pool内运行的Process对象的__init__方法，所以它们最终被继承，而不是被pickle。

工作

条件

结果

完整脚本

相关问题更多 >

编程相关推荐

热门问题

热门文章