Answers to: Writing a DataFrame to a gzip CSV without a timestamp in the archive


1 answer

Anonymous, 1 day ago


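After looking through the Pandas <a href="https://github.com/pandas-dev/pandas/blob/v0.23.4/pandas/io/formats/csvs.py#L123" rel="nofollow noreferrer">CSV writing</a> code, the best I can suggest is to use the <code>gzip</code> module directly. That way you can set the <a href="https://docs.python.org/3/library/gzip.html#gzip.GzipFile" rel="nofollow noreferrer"><code>mtime</code> attribute</a> yourself, which seems to be what you want:

<pre><code>import filecmp
import os
from gzip import GzipFile
from io import TextIOWrapper

import pandas as pd


def to_gzip_csv_no_timestamp(df, f, **kwargs):
    """Write a pandas DataFrame to a .csv.gz file, without a timestamp in the
    archive header, using GzipFile and TextIOWrapper.

    Args:
        df: pandas DataFrame.
        f: Output file path, e.g. 'df.csv.gz' (used as given).
        **kwargs: Other keyword arguments passed to to_csv().

    Returns:
        Nothing.
    """
    # mtime=0 fixes the timestamp field in the gzip header.
    with TextIOWrapper(GzipFile(f, 'w', mtime=0), encoding='utf-8') as fd:
        df.to_csv(fd, **kwargs)


# Small sample frame, only for illustration.
df = pd.DataFrame({'a': [1, 2, 3], 'b': ['x', 'y', 'z']})

# Use the same base name for both files: gzip also stores the file name
# in the header, so different names would produce different bytes.
os.makedirs('run1', exist_ok=True)
os.makedirs('run2', exist_ok=True)
to_gzip_csv_no_timestamp(df, 'run1/df.csv.gz')
to_gzip_csv_no_timestamp(df, 'run2/df.csv.gz')
filecmp.cmp('run1/df.csv.gz', 'run2/df.csv.gz', shallow=False)
# True
</code></pre>

For this tiny dataset, this outperforms the two-step <code>subprocess</code> approach:

<pre><code>%timeit to_gzip_csv_no_timestamp(df, 'df.csv')
693 us +- 14.6 us per loop (mean +- std. dev. of 7 runs, 1000 loops each)

%timeit to_gzip_csv_no_timestamp_subprocess(df, 'df.csv')
10.2 ms +- 167 us per loop (mean +- std. dev. of 7 runs, 100 loops each)
</code></pre>

I use <code>TextIOWrapper()</code> to convert the strings to bytes, as <a href="https://github.com/pandas-dev/pandas/blob/v0.23.4/pandas/io/common.py#L298" rel="nofollow noreferrer">Pandas does</a>, but if you know you will not be saving much data you could also do this:

<pre><code>with GzipFile('df.csv.gz', 'w', mtime=0) as fd:
    fd.write(df.to_csv().encode('utf-8'))
</code></pre>

Note that <code>gzip -lv df.csv.gz</code> still shows the "current time", but it only pulls that value from the inode's mtime. Dumping the file with <code>hexdump -C</code> shows that the value is saved in the file, and changing the file's mtime (with <code>touch -mt 0711171533 df.csv.gz</code>) causes <code>gzip</code> to display a different value.

Also note that the original "filename" is part of the gzipped file as well, so you need to write to the same name (or override the stored name) to make the output deterministic.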
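The two closing notes (the timestamp and the file name both live in the gzip header) can be checked from Python without <code>hexdump</code>. The following is a minimal sketch, not part of the original answer: the helper names <code>gzip_bytes_deterministic</code> and <code>read_gzip_header</code> are hypothetical, and it assumes the documented <code>GzipFile</code> behaviour that an empty <code>filename</code> combined with an explicit <code>fileobj</code> leaves the name out of the header. It compresses to memory so the example needs only the standard library.

```python
import gzip
import io
import struct


def gzip_bytes_deterministic(data: bytes) -> bytes:
    """Compress data with a zero timestamp and no stored file name."""
    buf = io.BytesIO()
    # filename='' plus an explicit fileobj: no FNAME field is written.
    # mtime=0: the MTIME field in the header is set to zero.
    with gzip.GzipFile(filename='', mode='wb', fileobj=buf, mtime=0) as gz:
        gz.write(data)
    return buf.getvalue()


def read_gzip_header(blob: bytes):
    """Return (mtime, has_fname) parsed from the start of a gzip stream."""
    # Layout per RFC 1952: 2-byte magic, 1-byte method, 1-byte flags,
    # 4-byte little-endian MTIME.
    magic, _method, flags, mtime = struct.unpack('<HBBI', blob[:8])
    if magic != 0x8B1F:
        raise ValueError('not a gzip stream')
    return mtime, bool(flags & 0x08)  # 0x08 is the FNAME flag


csv_bytes = "a,b\n1,x\n2,y\n".encode('utf-8')
first = gzip_bytes_deterministic(csv_bytes)
second = gzip_bytes_deterministic(csv_bytes)
print(first == second)          # identical bytes on repeated runs
print(read_gzip_header(first))  # (mtime, has_fname)
```

With a DataFrame, the input bytes could come from <code>df.to_csv().encode('utf-8')</code> as in the short variant above, and the result could be written to any path with <code>open(path, 'wb')</code>. Because no name is stored, the output no longer depends on the file name chosen.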
