How to deserialize a RecordBatch from a pyarrow Buffer

Posted 2024-10-01 05:02:20


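My goal is to serialize a RecordBatch, send it over a websocket channel, and deserialize it on the receiving end.

On the receiving end, after receiving the packets and reconstructing a pyarrow.lib.Buffer object with pa.py_buffer, I am unable to deserialize it back into a RecordBatch.

Websocket boilerplate aside, here is a snippet summarizing what I am trying to do: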

import pyarrow as pa

indicators = [(1, 'A'), (2, 'B')]

id = pa.int16()
name = pa.string()

data = pa.array(indicators, type=pa.struct([('id', id), ('name', name)]))

batch = pa.RecordBatch.from_arrays([data], ['indicators'])

buffer = batch.serialize()

# How to get back a RecordBatch from buffer?
#
# ???

1 answer
User
#1 · Posted 2024-10-01 05:02:20

When you use the serialize method like this, you can read the batch back with the read_record_batch function, provided the schema is known:

>>> pa.ipc.read_record_batch(buffer, batch.schema)
<pyarrow.lib.RecordBatch at 0x7ff412257278>

But this means the receiving side has to know the schema. To have the schema encapsulated in the serialized data, use RecordBatchStreamWriter instead:

>>> sink = pa.BufferOutputStream()
>>> writer = pa.RecordBatchStreamWriter(sink, batch.schema)
>>> writer.write_batch(batch)
>>> writer.close()
>>> buf = sink.getvalue()
>>> reader = pa.ipc.open_stream(buf)
>>> reader.read_all()
pyarrow.Table
indicators: struct<id: int16, name: string>
  child 0, id: int16
  child 1, name: string

See the documentation at https://arrow.apache.org/docs/python/ipc.html
