I am trying to use Vaex to read some data from a parquet file into Python.
This is the output I get from the vaex.open function:
>>> import vaex
>>> trade = vaex.open('trade.parquet')
>>> trade
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/userman/.local/lib/python3.6/site-packages/vaex/dataframe.py", line 3703, in __repr__
    return self._head_and_tail_table(format='plain')
  File "/home/userman/.local/lib/python3.6/site-packages/vaex/dataframe.py", line 3464, in _head_and_tail_table
    return self._as_table(0, n, N - n, N, format=format)
  File "/home/userman/.local/lib/python3.6/site-packages/vaex/dataframe.py", line 3599, in _as_table
    parts = table_part(i1, i2, parts)
  File "/home/userman/.local/lib/python3.6/site-packages/vaex/dataframe.py", line 3573, in table_part
    df = self[k1:k2]
  File "/home/userman/.local/lib/python3.6/site-packages/vaex/dataframe.py", line 4626, in __getitem__
    df = self.trim()
  File "/home/userman/.local/lib/python3.6/site-packages/vaex/dataframe.py", line 3859, in trim
    df = self if inplace else self.copy()
  File "/home/userman/.local/lib/python3.6/site-packages/vaex/dataframe.py", line 5036, in copy
    df.add_column(name, column, dtype=self._dtypes_override.get(name))
  File "/home/userman/.local/lib/python3.6/site-packages/vaex/dataframe.py", line 6053, in add_column
    super(DataFrameArrays, self).add_column(name, data, dtype=dtype)
  File "/home/userman/.local/lib/python3.6/site-packages/vaex/dataframe.py", line 2942, in add_column
    raise ValueError("array is of length %s, while the length of the DataFrame is %s" % (len(ar), self.length_original()))
ValueError: array is of length 1048576, while the length of the DataFrame is 34421587
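The length of the DataFrame is correct, but I don't understand what 1048576 relates to. I found a previous answer about reading hdf5 files, but it does not seem to be related to my problem. The data was originally read from a csv file and then exported to parquet using pyarrow.
Can anyone elaborate on what this problem is and how to fix it?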
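A small observation about the two numbers in the error message may help narrow things down. The sketch below uses only the values from the ValueError above and shows that 1048576 is exactly 2**20. Reading it as a chunk or batch size rather than a property of the data is an assumption; nothing in this thread confirms where the value comes from.

```python
# Values copied from the ValueError in the traceback above.
array_length = 1_048_576
dataframe_length = 34_421_587

# A positive integer is a power of two if it has exactly one bit set.
is_power_of_two = array_length & (array_length - 1) == 0
exponent = array_length.bit_length() - 1

# How many pieces of that size would fit into the full DataFrame
# (assumption: the array is one fixed-size chunk of the data).
full_chunks, remainder = divmod(dataframe_length, array_length)

print("power of two:", is_power_of_two, "exponent:", exponent)
print("full chunks:", full_chunks, "rows left over:", remainder)
```

If the chunk interpretation holds, the error would mean that one column arrived as a single chunk instead of the full 34421587 rows.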
I had the same problem. Assuming you are using vaex 3.x, try the latest alpha, 4.0.0a13, preferably in a fresh virtual environment:
pip install vaex==4.0.0a13
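To check which version is actually installed before and after upgrading, something like the following may be useful. It is a minimal sketch, assuming Python 3.8 or newer (for importlib.metadata) and assuming the package is installed under the distribution name "vaex". The helper names major_version, needs_upgrade and installed_vaex_version are made up for this example and are not part of vaex.

```python
from importlib import metadata  # standard library, Python 3.8+


def major_version(version):
    """Return the leading integer of a version string such as '4.0.0a13'."""
    return int(version.split(".")[0])


def needs_upgrade(version, minimum_major=4):
    """Return True if the version is older than the given major release."""
    return major_version(version) < minimum_major


def installed_vaex_version():
    """Return the installed vaex version string, or None if it is not installed."""
    try:
        return metadata.version("vaex")
    except metadata.PackageNotFoundError:
        return None


version = installed_vaex_version()
if version is None:
    print("vaex is not installed in this environment")
elif needs_upgrade(version):
    print("vaex", version, "is older than 4.x, consider upgrading")
else:
    print("vaex", version, "is 4.x or newer")
```

Running this inside the fresh virtual environment confirms that the interpreter is picking up the newly installed release rather than an older copy under ~/.local.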
Update: As of March 9, 2021, vaex 4 has been released and is marked as the default version on PyPI, so specifying the version is no longer necessary:
pip install vaex