<blockquote>
<p>without uncompressing the entire file which might be in 100+ GBs</p>
</blockquote>
<p>I take that to mean without first extracting the archive to disk. Here is a Python approach that does this:</p>
<pre><code>import tarfile as tf
import gzip as gz

infile = '/path/to/mysql-2016-06-16.tar.gz'

def linecount(infile, member):
    """Count the lines of one archive member without extracting it to disk."""
    lc = 0
    # Decompress the gzip stream on the fly and read the tar archive from it.
    with gz.GzipFile(infile) as zipf:
        with tf.TarFile(fileobj=zipf) as tarf:
            dataf = tarf.extractfile(member)
            while dataf.readline():
                lc += 1
            dataf.close()
    return lc

print(linecount(infile, 'test.csv'))
</code></pre>
<blockquote>
<p>it say's "filename 'test.csv' not found".</p>
</blockquote>
<p>To find out which members the tar file contains:</p>
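<p>A minimal sketch of such a helper, assuming Python 3. The name <code>listmembers</code> is taken from the loop further down; the body here is an assumption about what it does, not the original code. It returns the full member names, which often include a directory prefix, and that is a common reason a bare <code>'test.csv'</code> is not found.</p>

```python
import gzip
import tarfile


def listmembers(infile):
    """Return the names of the regular files stored in a .tar.gz archive."""
    with gzip.GzipFile(infile) as zipf:
        with tarfile.TarFile(fileobj=zipf) as tarf:
            # getmembers() reads the archive headers; nothing is written to disk.
            return [m.name for m in tarf.getmembers() if m.isfile()]
```

<p>Pass one of the returned names, exactly as printed, as the <code>member</code> argument of <code>linecount</code>.</p>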
<p>To count the lines of every file in the tar archive:</p>
<pre><code>for member in listmembers(infile):
    print(member, linecount(infile, member))
</code></pre>
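<p>Note that calling <code>linecount</code> once per member decompresses the archive from the start each time, which adds up on a 100+ GB file. A possible single-pass variant is sketched below, assuming Python 3; <code>linecounts</code> is a hypothetical name, not part of the code above. It opens the archive in <code>tarfile</code>'s stream mode (<code>'r|gz'</code>) and reads each member as it is encountered.</p>

```python
import tarfile


def linecounts(infile):
    """Yield (member name, line count) for every regular file, in one pass."""
    # Mode 'r|gz' reads the compressed archive as a forward-only stream.
    with tarfile.open(infile, 'r|gz') as tarf:
        for member in tarf:
            if not member.isfile():
                continue  # skip directories, links, and other non-file entries
            # In stream mode the data must be read before moving to the next member.
            dataf = tarf.extractfile(member)
            yield member.name, sum(1 for _ in dataf)
```

<p>For example, <code>for name, count in linecounts(infile): print(name, count)</code> prints one line per file while decompressing the archive only once.</p>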
<p>Before you start, it is useful to <a href="https://en.wikipedia.org/wiki/Tar_(computing)#File_format" rel="nofollow">know how tar files are structured</a>.</p>