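<p><strong>Note:</strong> Starting with Python 2.7.4, this is a non-issue for ZIP archives. Details are at the bottom of the answer. This answer focuses on tar archives.</p>
<p>To figure out where a path really points, use <code>os.path.abspath()</code> (but note the caveat about symlinks as path components). If you normalize a path from your zipfile with <code>abspath</code> and it does <em>not</em> contain the current directory as a prefix, it points outside it.</p>
<p>But you also need to check the <em>value</em> of any symlink extracted from your archive (both tarfiles and unix zipfiles can store symlinks). This matters if you are worried about a proverbial "malicious user" who would intentionally bypass your security, rather than an application that simply installs itself in system libraries.</p>
<p>That's the caveat mentioned above: <code>abspath</code> will be misled if your sandbox already contains a symlink that points to a directory. Even a symlink that points inside the sandbox can be dangerous: the symlink <code>sandbox/subdir/foo -> ..</code> points to <code>sandbox</code>, so the path <code>sandbox/subdir/foo/../.bashrc</code> should be disallowed. The easiest way to do that is to wait until the preceding files have been extracted and then use <code>os.path.realpath()</code>. Fortunately, <code>extractall()</code> accepts a generator, so this is easy to do.</p>
<p>Since you asked for code, here's a bit that spells out the algorithm. It prohibits not only extracting files to locations outside the sandbox (which is what was requested), but also creating links inside the sandbox that point to locations outside it. I'm curious to hear whether anyone can sneak any stray files or links past it.</p>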
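<p>As a rough sketch of that prefix test (the helper name <code>escapes_lexically</code> is mine, not part of any library), something like the following works for the purely lexical case. It compares against the base plus a path separator, so that a sibling such as <code>sandbox2</code> is not mistaken for being inside <code>sandbox</code>. It does not follow symlinks, which is exactly the caveat discussed in this answer.</p>

```python
import os

def escapes_lexically(member_name, base):
    """Return True if member_name, joined onto base, normalizes to a path outside base.

    Purely lexical: symlinks that exist on disk are NOT followed.
    """
    base = os.path.abspath(base)
    # os.path.join discards base when member_name is absolute
    target = os.path.abspath(os.path.join(base, member_name))
    # Compare against base + separator so "sandbox2" does not match "sandbox"
    prefix = base if base.endswith(os.sep) else base + os.sep
    return not (target == base or target.startswith(prefix))

for name in ("docs/readme.txt", "../../etc/passwd", "/etc/passwd", "../sandbox2/x"):
    print(name, "escapes" if escapes_lexically(name, "sandbox") else "stays inside")
```

<p>Only the first of the four sample names stays inside; the others are flagged.</p>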
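<p>To see the difference concretely, here is a small demonstration that recreates the <code>subdir/foo -> ..</code> situation in a throwaway temporary directory. It assumes a POSIX system where creating symlinks is allowed; the directory layout is invented purely for illustration.</p>

```python
import os
import tempfile

# Throwaway sandbox containing the symlink  subdir/foo -> ..
sandbox = os.path.realpath(tempfile.mkdtemp())
os.mkdir(os.path.join(sandbox, "subdir"))
os.symlink("..", os.path.join(sandbox, "subdir", "foo"))

path = os.path.join(sandbox, "subdir", "foo", "..", ".bashrc")

lexical = os.path.abspath(path)   # cancels "foo/.." textually, never looks at the disk
actual = os.path.realpath(path)   # follows the symlink first, then applies ".."

print("abspath :", lexical)   # still looks like it is inside the sandbox
print("realpath:", actual)    # really lands next to the sandbox, not in it
```

<p>Here <code>abspath</code> reports a location under <code>subdir</code>, while <code>realpath</code> shows that the file would end up in the sandbox's parent directory.</p>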
<pre><code>from __future__ import print_function  # keeps the print calls valid on Python 2.7 too
import tarfile
from os.path import abspath, realpath, dirname, join as joinpath
from sys import stderr

def resolved(path):
    # Absolute path with any symlinks already on disk followed
    return realpath(abspath(path))

def badpath(path, base):
    # joinpath will ignore base if path is absolute
    return not resolved(joinpath(base, path)).startswith(base)

def badlink(info, base):
    # Links are interpreted relative to the directory containing the link
    tip = resolved(joinpath(base, dirname(info.name)))
    return badpath(info.linkname, base=tip)

def safemembers(members, dest="."):
    # dest must be the directory passed to extractall(), so that paths
    # are resolved where the files will actually be written
    base = resolved(dest)

    for finfo in members:
        if badpath(finfo.name, base):
            print(finfo.name, "is blocked (illegal path)", file=stderr)
        elif finfo.issym() and badlink(finfo, base):
            print(finfo.name, "is blocked: Symlink to", finfo.linkname, file=stderr)
        elif finfo.islnk() and badlink(finfo, base):
            print(finfo.name, "is blocked: Hard link to", finfo.linkname, file=stderr)
        else:
            yield finfo

ar = tarfile.open("testtar.tar")
ar.extractall(path="./sandbox", members=safemembers(ar, "./sandbox"))
ar.close()
</code></pre>
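<p><strong>Edit:</strong> Starting with Python 2.7.4, this is a non-issue for ZIP archives: the method <a href="http://docs.python.org/2/library/zipfile#zipfile.ZipFile.extract" rel="noreferrer"><code>ZipFile.extract()</code></a> prohibits the creation of files outside the sandbox:</p>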
<blockquote>
<p><strong>Note:</strong> If a member filename is an absolute path, a drive/UNC sharepoint and leading (back)slashes will be stripped, e.g.: <code>///foo/bar</code> becomes <code>foo/bar</code> on Unix, and <code>C:\foo\bar</code> becomes <code>foo\bar</code> on Windows. And all <code>".."</code> components in a member filename will be removed, e.g.: <code>../../foo../../ba..r</code> becomes <code>foo../ba..r</code>. On Windows, illegal characters (<code>:</code>, <code>&lt;</code>, <code>&gt;</code>, <code>|</code>, <code>"</code>, <code>?</code>, and <code>*</code>) [are] replaced by underscore (_).</p>
</blockquote>
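<p>If you want to convince yourself of this behavior, a quick experiment along these lines should do. It builds a ZIP archive in memory with a member named <code>../../evil.txt</code> and extracts it into a temporary directory; the file names and directory layout are made up for the test.</p>

```python
import io
import os
import tempfile
import zipfile

# In-memory archive whose only member tries to climb out of the target directory
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("../../evil.txt", "payload")
buf.seek(0)

parent = tempfile.mkdtemp()
dest = os.path.join(parent, "a", "sandbox")
os.makedirs(dest)

with zipfile.ZipFile(buf) as zf:
    zf.extractall(dest)

# The ".." components are dropped, so the file lands inside dest
inside = os.path.exists(os.path.join(dest, "evil.txt"))
outside = os.path.exists(os.path.join(parent, "evil.txt"))
print("inside sandbox:", inside, "| escaped:", outside)
```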
<p>The <code>tarfile</code> class has not been sanitized in a similar way, so the answer above still applies.</p>
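<p>For readers on newer interpreters: Python 3.12 added extraction filters to <code>tarfile</code> (and, as far as I know, some security releases of older 3.x branches received them as well). Passing <code>filter="data"</code> to <code>extractall()</code> makes the library itself refuse members that would land outside the destination. The sketch below checks for the feature first and uses an invented in-memory archive; where filters are not available, the generator approach above is still the way to go.</p>

```python
import io
import os
import tarfile
import tempfile

# In-memory tar archive with one member that tries to escape upward
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w") as ar:
    data = b"payload"
    info = tarfile.TarInfo(name="../evil.txt")
    info.size = len(data)
    ar.addfile(info, io.BytesIO(data))
buf.seek(0)

parent = tempfile.mkdtemp()
dest = os.path.join(parent, "sandbox")
os.mkdir(dest)

blocked = None  # stays None if this interpreter has no extraction filters
if hasattr(tarfile, "data_filter"):
    with tarfile.open(fileobj=buf) as ar:
        try:
            ar.extractall(path=dest, filter="data")
            blocked = False
        except tarfile.FilterError as exc:
            # Raised for members the filter rejects, e.g. paths outside dest
            blocked = True
            print("refused:", exc)
else:
    print("no extraction filters here; fall back to a members generator")
```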