Apache Arrow bus error / segfault when using Python bindings

Posted 2024-09-28 19:34:37


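I'm writing data to a Parquet file. Apache Arrow provides a simple example, parquet-arrow, in which the data flow is essentially: data => arrow::ArrayBuilder => arrow::Array => arrow::Table => Parquet file. This works as standalone C++, but when I bind the same code into a Python module and call it from Python (I'm using Python 3.8), a bus error 10 (or segmentation fault 11) consistently occurs while creating the arrow::Arrays, i.e. inside ArrayBuilder::Finish. Does anyone know why this happens, or how to fix it?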

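To work around it, I tried a few variations: static vs. dynamic linking of the libraries, different overloads of ArrayBuilder::Finish, and different tools for building the Python module/.so (I tried both pybind11 and Boost.Python), but the error persists. It always crashes in arrow::ArrayBuilder::Finish(std::shared_ptr<arrow::Array>*). I'm running macOS. This simple .py and .cc code is enough to reproduce the error: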

import pybindtest
pybindtest.python_bind_test()

#include <iostream>
#include <arrow/api.h>
#include <arrow/io/api.h>
#include <parquet/arrow/writer.h>
#include <pybind11/pybind11.h>

// Build an Int64 column and a UTF-8 string column, then combine them into a Table.
std::shared_ptr<arrow::Table> generate_table() {
  arrow::Int64Builder i64builder;
  std::shared_ptr<arrow::Array> i64array;
  PARQUET_THROW_NOT_OK(i64builder.AppendValues({2, 4}));
  PARQUET_THROW_NOT_OK(i64builder.Finish(&i64array));

  arrow::StringBuilder strbuilder;
  std::shared_ptr<arrow::Array> strarray;
  PARQUET_THROW_NOT_OK(strbuilder.Append("some"));
  PARQUET_THROW_NOT_OK(strbuilder.Append("content"));
  PARQUET_THROW_NOT_OK(strbuilder.Finish(&strarray));

  std::shared_ptr<arrow::Schema> schema = arrow::schema(
      {arrow::field("int", arrow::int64()), 
       arrow::field("str", arrow::utf8())});

  return arrow::Table::Make(schema, {i64array, strarray});
}

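// Write the table to pybindtest.parquet; the last WriteTable argument (3) is the
// chunk size, i.e. the maximum number of rows per row group.
void write_parquet_file(const arrow::Table& table) {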
  std::shared_ptr<arrow::io::FileOutputStream> outfile;
  PARQUET_ASSIGN_OR_THROW(outfile,arrow::io::FileOutputStream::Open("pybindtest.parquet"));
  PARQUET_THROW_NOT_OK(parquet::arrow::WriteTable(table, arrow::default_memory_pool(), outfile, 3));
}

void python_bind_test() {
  std::shared_ptr<arrow::Table> table = generate_table();
  write_parquet_file(*table);
}

// Expose python_bind_test() to Python as pybindtest.python_bind_test().
PYBIND11_MODULE(pybindtest, m) {
  m.def("python_bind_test", &python_bind_test);
}

Here is the backtrace from one of the core dumps:

$ lldb -c core.84103 
(lldb) target create --core "core.84103"
Core file '/cores/core.84103' (x86_64) was loaded.

(lldb) bt
* thread #1, stop reason = signal SIGSTOP
  * frame #0: 0x00007fff91b52a58 libc++abi.dylib`vtable for __cxxabiv1::__si_class_type_info + 16
    frame #1: 0x0000000103b1f4c8 libarrow.300.0.0.dylib`arrow::ArrayBuilder::Finish(std::__1::shared_ptr<arrow::Array>*) + 40
    frame #2: 0x0000000103a0c492 pybindtest.cpython-38-darwin.so`generate_table() + 642
    frame #3: 0x0000000103a0e298 pybindtest.cpython-38-darwin.so`python_bind_test() + 24
    frame #4: 0x0000000103a4425f pybindtest.cpython-38-darwin.so`void pybind11::detail::argument_loader<>::call_impl<void, void (*&)(), pybind11::detail::void_type>(void (*&)(), pybind11::detail::index_sequence<>, pybind11::detail::void_type&&) && + 31
    frame #5: 0x0000000103a44136 pybindtest.cpython-38-darwin.so`std::__1::enable_if<std::is_void<void>::value, pybind11::detail::void_type>::type pybind11::detail::argument_loader<>::call<void, pybind11::detail::void_type, void (*&)()>(void (*&)()) && + 54
    frame #6: 0x0000000103a43ff2 pybindtest.cpython-38-darwin.so`void pybind11::cpp_function::initialize<void (*&)(), void, pybind11::name, pybind11::scope, pybind11::sibling>(void (*&)(), void (*)(), pybind11::name const&, pybind11::scope const&, pybind11::sibling const&)::'lambda'(pybind11::detail::function_call&)::operator()(pybind11::detail::function_call&) const + 130
    frame #7: 0x0000000103a43f55 pybindtest.cpython-38-darwin.so`void pybind11::cpp_function::initialize<void (*&)(), void, pybind11::name, pybind11::scope, pybind11::sibling>(void (*&)(), void (*)(), pybind11::name const&, pybind11::scope const&, pybind11::sibling const&)::'lambda'(pybind11::detail::function_call&)::__invoke(pybind11::detail::function_call&) + 21
    frame #8: 0x0000000103a2cb62 pybindtest.cpython-38-darwin.so`pybind11::cpp_function::dispatcher(_object*, _object*, _object*) + 4818
    frame #9: 0x00000001035cf164 python`cfunction_call_varargs + 68
    frame #10: 0x00000001035ce3a7 python`_PyObject_MakeTpCall + 167
    frame #11: 0x0000000103713228 python`_PyEval_EvalFrameDefault + 45944
    frame #12: 0x0000000103706060 python`_PyEval_EvalCodeWithName + 560
    frame #13: 0x0000000103780a7c python`PyRun_FileExFlags + 364
    frame #14: 0x0000000103780171 python`PyRun_SimpleFileExFlags + 529
    frame #15: 0x00000001037a8c5a python`pymain_run_file + 394
    frame #16: 0x00000001037a81b6 python`pymain_run_python + 486
    frame #17: 0x00000001037a7f88 python`Py_RunMain + 24
    frame #18: 0x00000001037a9670 python`pymain_main + 32
    frame #19: 0x00000001035a1cb9 python`main + 57
    frame #20: 0x00007fff6b8b7cc9 libdyld.dylib`start + 1
    frame #21: 0x00007fff6b8b7cc9 libdyld.dylib`start + 1
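One way to check which Arrow build the extension is bound to (frame #1 above resolves to libarrow.300.0.0.dylib) is to list the module's linked libraries with macOS's `otool -L`. The sketch below is a hypothetical helper, not part of the original post: it runs `otool -L` on a compiled module and keeps only the libarrow/libparquet entries, assuming otool's usual `<path> (compatibility version ...)` line format and install names that contain no spaces.

```python
import re
import subprocess
import sys
from typing import List

# Matches the install name at the start of each dependency line of `otool -L` output,
# e.g. "\t@rpath/libarrow.300.dylib (compatibility version 300.0.0, ...)".
_DEP_LINE = re.compile(r"^\s+(\S+)\s+\(compatibility version")


def arrow_deps_from_otool(otool_output: str) -> List[str]:
    """Return dependency paths whose file name starts with libarrow or libparquet."""
    deps = []
    for line in otool_output.splitlines():
        m = _DEP_LINE.match(line)
        if not m:
            continue  # header line ("module.so:") or unrelated text
        path = m.group(1)
        name = path.rsplit("/", 1)[-1]
        if name.startswith(("libarrow", "libparquet")):
            deps.append(path)
    return deps


def arrow_deps_of(module_path: str) -> List[str]:
    """Run `otool -L` (macOS only) on a compiled module and filter the Arrow deps."""
    out = subprocess.run(
        ["otool", "-L", module_path], capture_output=True, text=True, check=True
    ).stdout
    return arrow_deps_from_otool(out)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python check_arrow_deps.py pybindtest.cpython-38-darwin.so
    for dep in arrow_deps_of(sys.argv[1]):
        print(dep)
```

Keep in mind that `otool -L` only shows install names such as `@rpath/...`; which file is actually loaded at runtime also depends on the rpath and any `DYLD_*` environment variables, so treat the output as a starting point rather than proof.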

1 answer
User
#1 · Posted 2024-09-28 19:34:37

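After further investigation, the error appears to be triggered by a conflict between the Arrow C++ library I had built from source and the pyarrow package I had installed from conda-forge. I fixed it by installing pyarrow into my conda env with pip instead of from the conda-forge channel (in my case I did the same for another package, because it depends on pyarrow).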
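If you want to confirm how pyarrow ended up in an environment before and after switching channels, a minimal sketch using the standard library's `importlib.metadata` (Python 3.8+, matching the version in the question) can read the `INSTALLER` file from the package's metadata. Assumption: pip writes `pip` into that file; conda-forge packages may record something else or omit the file, so any other value (or `unknown`) only means "not installed by pip". The helper name `install_source` is hypothetical.

```python
from importlib import metadata  # Python 3.8+
from typing import Optional, Tuple


def install_source(dist_name: str) -> Optional[Tuple[str, str]]:
    """Return (installer, location) for an installed distribution, or None if absent.

    `installer` is the stripped contents of the dist-info INSTALLER file
    (pip writes "pip"); it is "unknown" when that file is missing or empty.
    `location` is the directory the distribution was installed into.
    """
    try:
        dist = metadata.distribution(dist_name)
    except metadata.PackageNotFoundError:
        return None
    installer = (dist.read_text("INSTALLER") or "").strip() or "unknown"
    location = str(dist.locate_file(""))
    return installer, location


if __name__ == "__main__":
    # Prints None if pyarrow is not importable from this interpreter's environment.
    print("pyarrow:", install_source("pyarrow"))
```

This only tells you where the Python package came from; it does not inspect which libarrow your own extension links against.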

I don't know the exact cause of the incompatibility, but it may be related to the current macOS caveat mentioned in the Arrow Python documentation:

Using conda to build Arrow on macOS is complicated by the fact that the conda-forge compilers require an older macOS SDK. Conda offers some installation instructions; the alternative would be to use Homebrew and pip instead.
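To tell whether you are in the situation the quote describes (a conda-provided interpreter next to a from-source Arrow build), a small check could look like the sketch below. It assumes a conda environment is recognizable by a `conda-meta/` directory under `sys.prefix`, and reads the `MACOSX_DEPLOYMENT_TARGET` recorded in the interpreter's build configuration via `sysconfig` as one rough signal of the SDK/toolchain it was built with. Comparing that with the target used for your own Arrow build is left to you; both helper names are hypothetical.

```python
import os
import sys
import sysconfig
from typing import Optional


def is_conda_python(prefix: str = sys.prefix) -> bool:
    """True if the interpreter prefix looks like a conda environment (has conda-meta/)."""
    return os.path.isdir(os.path.join(prefix, "conda-meta"))


def python_macos_target() -> Optional[str]:
    """MACOSX_DEPLOYMENT_TARGET this interpreter was configured with; None if unset.

    sysconfig may parse purely numeric values (e.g. "11") into ints, so the
    result is normalized to str. Off macOS the variable is normally empty or absent.
    """
    value = sysconfig.get_config_var("MACOSX_DEPLOYMENT_TARGET")
    return str(value) if value else None


if __name__ == "__main__":
    print("conda interpreter:", is_conda_python())
    print("interpreter MACOSX_DEPLOYMENT_TARGET:", python_macos_target())
```

If the first line prints `True`, you are on the conda path the Arrow documentation warns about; the documented alternative is Homebrew plus pip.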
