<p>Pandas' <code>DataFrame.to_parquet</code> is a thin wrapper around <code>table = pa.Table.from_pandas(...)</code> and <code>pq.write_table(table, ...)</code> (see the <a href="https://github.com/pandas-dev/pandas/blob/bdb7a1603f1e0948ca0cab011987f616e7296167/pandas/io/parquet.py#L120" rel="nofollow noreferrer">pandas source</a>), and <a href="https://arrow.apache.org/docs/python/generated/pyarrow.parquet.write_table.html" rel="nofollow noreferrer"><code>write_table</code></a> does not support writing partitioned datasets. Use <code>pq.write_to_dataset</code> instead.</p>
<pre><code>import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

df = pd.DataFrame(yourData)
table = pa.Table.from_pandas(df)

# root_path is treated as a directory; one subdirectory is created
# per distinct value of each partition column.
pq.write_to_dataset(
    table,
    root_path='output.parquet',
    partition_cols=['partone', 'parttwo'],
)
</code></pre>
<p>For more details, see the <a href="https://arrow.apache.org/docs/python/parquet.html?highlight=partition_cols#writing-to-partitioned-datasets" rel="nofollow noreferrer">pyarrow documentation</a>.</p>
<p>In general, I always use the PyArrow API directly when reading and writing parquet files, since the Pandas wrapper's functionality is fairly limited.</p>