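<p>I am trying to develop a recursive extractor. The problem is that it recurses too many times (once every time it finds an archive type), and performance suffers.</p>
<p>So how can I improve the code below?</p>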
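<p>My idea 1:</p>
<p>First build a dict of the files in the directories, keyed by file type. Go through the file types and, when an archive is found, extract only that one. Then regenerate the archive dict.</p>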
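<p>A rough sketch of what I mean by idea 1. The names <code>build_type_index</code> and <code>classify_by_extension</code> are hypothetical, and the extension-based classifier is only a stand-in for the <code>m.file(...)</code> lookup so that the snippet runs without libmagic:</p>

```python
import os
from collections import defaultdict


def build_type_index(root, classify):
    """Walk root once and group file paths by the type classify() reports."""
    index = defaultdict(list)
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            index[classify(path)].append(path)
    return index


def classify_by_extension(path):
    # Hypothetical stand-in for m.file(path).split(",")[0].
    return os.path.splitext(path)[1].lower() or "(none)"
```

<p>With the real classifier, the archives would be the entries whose key is in <code>archive_type</code>. The open question is how to refresh the dict after an extraction without walking everything again.</p>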
<p>My idea 2:</p>
<p><code>os.walk</code> returns a generator. Is there something I can do with generators? I am not familiar with them.</p>
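<p>From what I have read so far, idea 2 might look something like the sketch below: a generator that only yields files it has not seen yet, so each new pass skips everything already checked. <code>iter_new_files</code>, <code>extract_all</code>, <code>is_archive</code> and <code>extract</code> are all hypothetical names. The two callbacks stand in for the magic type check and for <code>arcExtract</code>:</p>

```python
import os


def iter_new_files(root, seen):
    """Lazily yield file paths under root that are not yet in seen."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if path not in seen:
                seen.add(path)
                yield path


def extract_all(root, is_archive, extract):
    """Repeat passes until a pass finds no new archive; return the count."""
    seen = set()
    extracted = 0
    found = True
    while found:
        found = False
        # Materialise the pass first so extraction does not change the
        # tree while os.walk is still iterating over it.
        for path in list(iter_new_files(root, seen)):
            if is_archive(path):
                extract(path)
                extracted += 1
                found = True
    return extracted
```

<p>This still walks the tree once per pass, but each file is type-checked only once, and an archive that fails to extract is not retried forever.</p>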
<p>Here is the current code:</p>
<pre><code>import os
import subprocess

import magic

m = magic.open(magic.MAGIC_NONE)
m.load()

# Types handled by readpst instead of 7z. pst_types was not defined in the
# original listing; these two entries are assumed from its use in arcExtract.
pst_types = ['Microsoft Outlook email folder (>=2003)',
             'Microsoft Outlook email folder']

archive_type = ['gzip compressed data',
                '7-zip archive data',
                'Zip archive data',
                'bzip2 compressed data',
                'tar archive',
                'POSIX tar archive',
                'POSIX tar archive (GNU)',
                'RAR archive data'] + pst_types


def extractRecursive(path, archives):
    i = 0
    for dirpath, dirnames, filenames in os.walk(path):
        for f in filenames:
            fp = os.path.join(dirpath, f)
            i += 1
            print(i)
            # Keep only the leading description, e.g. "Zip archive data".
            file_type = m.file(fp).split(",")[0]
            if file_type in archives:
                arcExtract(fp, file_type, path, True)
                # Restart the whole walk to pick up newly extracted files.
                # This restart per archive is the slow part.
                extractRecursive(path, archives)
    return "Done"


def arcExtract(file_path, file_type,
               extracted_path="/home/v3ss/Downloads/extracted", unlink=False):
    # Build the argument list directly so paths with spaces or quotes survive.
    if file_type in pst_types:
        args = ["readpst", "-o", extracted_path, "-S", file_path]
    else:
        args = ["7z", "-y", "-r", "-o" + extracted_path, "x", file_path]
    print(args)
    try:
        sp = subprocess.Popen(args, shell=False,
                              stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        out, err = sp.communicate()
        print("%s %s" % (out, err))
        ret = sp.returncode
    except OSError as e:
        # Raised when the external tool cannot be started at all.
        print("Error no %s Message %s" % (e.errno, e.strerror))
        return "Failed"
    if ret == 0:
        if unlink:
            os.unlink(file_path)
        return "OK!"
    return "Failed"


if __name__ == '__main__':
    extractRecursive('Path/To/Archives', archive_type)
</code></pre>