python struct.error: 'i' format requires -2147483648 <= number <= 2147483647

Published 2024-06-26 14:09:32


Question

I want to use the multiprocessing module (multiprocessing.Pool.starmap()) for feature engineering. However, it raises the error message below. I suspect the error is related to the size of the input (2147483647 = 2^31 - 1?), because the same code runs fine on a small fraction (frac=0.05) of the input DataFrames (train_scala, test, ts). I converted the DataFrames' dtypes to the smallest possible types, but it didn't help.

The Anaconda version is 4.3.30 and the Python version is 3.6 (64-bit). The machine has more than 128GB of memory and more than 20 cores. Do you have any suggestions for solving this problem? If it is caused by passing large data to the multiprocessing module, how much smaller does the data need to be in order to use multiprocessing on Python 3?

Code:

from multiprocessing import Pool, cpu_count
from itertools import repeat    
p = Pool(8)
is_train_seq = [True]*len(historyCutoffs)+[False]
config_zip = zip(historyCutoffs, repeat(train_scala), repeat(test), repeat(ts), ul_parts_path, repeat(members), is_train_seq)
p.starmap(multiprocess_FE, config_zip)

Error message:

Traceback (most recent call last):
  File "main_1210_FE_scala_multiprocessing.py", line 705, in <module>
    print('----Pool starmap start----')
  File "/home/dmlab/ksedm1/anaconda3/envs/py36/lib/python3.6/multiprocessing/pool.py", line 274, in starmap
    return self._map_async(func, iterable, starmapstar, chunksize).get()
  File "/home/dmlab/ksedm1/anaconda3/envs/py36/lib/python3.6/multiprocessing/pool.py", line 644, in get
    raise self._value
  File "/home/dmlab/ksedm1/anaconda3/envs/py36/lib/python3.6/multiprocessing/pool.py", line 424, in _handle_tasks
    put(task)
  File "/home/dmlab/ksedm1/anaconda3/envs/py36/lib/python3.6/multiprocessing/connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/home/dmlab/ksedm1/anaconda3/envs/py36/lib/python3.6/multiprocessing/connection.py", line 393, in _send_bytes
    header = struct.pack("!i", n)
struct.error: 'i' format requires -2147483648 <= number <= 2147483647

Additional information

  • historyCutoffs is a list of integers
  • train_scala is a pandas DataFrame (377MB)
  • test is a pandas DataFrame (15MB)
  • ts is a pandas DataFrame (547MB)
  • ul_parts_path is a list of directory paths (strings)
  • is_train_seq is a list of booleans

Additional code: the method multiprocess_FE

import gzip

import pandas as pd

def multiprocess_FE(historyCutoff, train_scala, test, ts, ul_part_path, members, is_train):
    train_dict = {}
    ts_dict = {}
    msno_dict = {}
    ul_dict = {}
    if is_train:
        train_dict[historyCutoff] = train_scala[train_scala.historyCutoff == historyCutoff]
    else:
        train_dict[historyCutoff] = test
    msno_dict[historyCutoff] = set(train_dict[historyCutoff].msno)
    print('length of msno is {:d} in cutoff {:d}'.format(len(msno_dict[historyCutoff]), historyCutoff))
    ts_dict[historyCutoff] = ts[(ts.transaction_date <= historyCutoff) & (ts.msno.isin(msno_dict[historyCutoff]))]
    print('length of transaction is {:d} in cutoff {:d}'.format(len(ts_dict[historyCutoff]), historyCutoff))
    ul_part = pd.read_csv(gzip.open(ul_part_path, mode="rt"))  ##.sample(frac=0.01, replace=False)
    ul_dict[historyCutoff] = ul_part[ul_part.msno.isin(msno_dict[historyCutoff])]
    train_dict[historyCutoff] = enrich_by_features(historyCutoff, train_dict[historyCutoff], ts_dict[historyCutoff], ul_dict[historyCutoff], members, is_train)

2 Answers

This issue was fixed in a recent Python PR: https://github.com/python/cpython/pull/10305

If you want, you can apply this change locally so it works for you right away, without waiting for a new Python and Anaconda release.

The communication protocol between processes uses pickling, and the pickled data is prefixed with its size. For your method, all the arguments are pickled together as a single object.

You produced an object which, when pickled, is larger than fits in the 'i' struct format (a four-byte signed integer), which breaks the assumptions the code makes.
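To see the limit concretely, here is a minimal sketch reproducing the overflow with the same `"!i"` format the connection layer uses, plus a quick way to measure how large one task tuple would be on the wire (the `args` tuple below is a hypothetical stand-in for one of your starmap tasks):

```python
import pickle
import struct

# multiprocessing's connection layer prefixes each payload with
# struct.pack("!i", n): a 4-byte signed big-endian length field.
# Any pickle longer than 2**31 - 1 bytes therefore cannot be framed.
try:
    struct.pack("!i", 2**31)  # one byte past the maximum
except struct.error as e:
    print(e)  # 'i' format requires -2147483648 <= number <= 2147483647

# A quick check of how big one task would be when pickled:
args = ([20170101, 20170201], "parts/ul_part_0.csv.gz", True)  # hypothetical task
print(len(pickle.dumps(args)))
```

If `len(pickle.dumps(...))` of your real argument tuple (which includes the three DataFrames) exceeds 2147483647 bytes, you get exactly the traceback above.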

You could instead delegate reading the DataFrames to the child processes, sending only the metadata needed to load them. Their combined size is close to 1GB, which is far too much data to share between processes over a pipe.
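A minimal sketch of that idea: each worker receives only a short path string and loads the file itself, so nothing large is ever pickled. This uses the stdlib `csv` module and a temporary file to stay self-contained; in your code the worker would call `pd.read_csv` on the real paths, and `load_and_count` is a hypothetical stand-in for `multiprocess_FE`:

```python
import csv
import os
import tempfile
from multiprocessing import Pool

def load_and_count(path):
    # Only the small path string crosses the pipe; the worker
    # reads the (potentially huge) file on its own side.
    with open(path, newline="") as f:
        rows = list(csv.DictReader(f))
    return len(rows)

def run_demo():
    with tempfile.NamedTemporaryFile("w", suffix=".csv",
                                     delete=False, newline="") as f:
        f.write("msno,historyCutoff\na,1\nb,2\nc,3\n")
        path = f.name
    try:
        with Pool(2) as p:
            counts = p.map(load_and_count, [path, path])
    finally:
        os.unlink(path)
    return counts

if __name__ == "__main__":
    print(run_demo())  # [3, 3]
```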

Quoting the Programming guidelines section:

Better to inherit than pickle/unpickle

When using the spawn or forkserver start methods many types from multiprocessing need to be picklable so that child processes can use them. However, one should generally avoid sending shared objects to other processes using pipes or queues. Instead you should arrange the program so that a process which needs access to a shared resource created elsewhere can inherit it from an ancestor process.

If you are not running on Windows and don't use the spawn or forkserver start methods, you can load the DataFrames as globals before starting the child processes; the children will then "inherit" the data via the OS's normal copy-on-write memory page sharing mechanism.
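The inherit-via-globals pattern above can be sketched as follows; `BIG_DATA` is a hypothetical stand-in for train_scala/ts/members, and in your code it would be created with `pd.read_csv` before the Pool is constructed:

```python
from multiprocessing import Pool

# Loaded at module level, BEFORE the Pool exists, so forked workers
# inherit it via copy-on-write instead of receiving it over the pipe.
BIG_DATA = list(range(1_000_000))  # stand-in for the large DataFrames

def worker(cutoff):
    # Reads the inherited global; only the small `cutoff` integer
    # is pickled and sent to the worker.
    return cutoff + BIG_DATA[-1]

def run_demo():
    with Pool(2) as p:
        return p.map(worker, [1, 2, 3])

if __name__ == "__main__":
    print(run_demo())  # [1000000, 1000001, 1000002]
```

Applied to the question's code, `config_zip` would then carry only `historyCutoffs`, `ul_parts_path`, and `is_train_seq`, with the `repeat(...)` DataFrame arguments dropped.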
