Different directory structure when unzipping the same file on Ubuntu and Windows

Posted 2024-10-05 14:31:58


I am trying to extract the contents of a zip file, which is available here:

https://www.geoboundaries.org/data/geoBoundaries-2_0_0/NGA/ADM1/geoBoundaries-2_0_0-NGA-ADM1-all.zip

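On Ubuntu 18.04.04, using the "Extract" option in the right-click menu, I get a folder structure out of the zip file that includes various empty folders and directories, as well as a different parent folder. If I unzip the same file with 7Zip (on Windows or on the same Linux machine), I get the expected result of 6 files.

So, what is the difference here?

(Note: I already have a solution, shutil archive works. I am just trying to understand the differing behavior.)

This is the code currently used to build the ZIP in question (Python):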

import os
import zipfile

def zipdir(dirPath=None, zipFilePath=None, includeDirInZip=False, citeUsePath=False):
  # Default the archive name to "<dirPath>.zip"
  if not zipFilePath:
    zipFilePath = dirPath + ".zip"
  if not os.path.isdir(dirPath):
    raise OSError("dirPath argument must point to a directory. "
            "'%s' does not." % dirPath)
  parentDir, dirToZip = os.path.split(dirPath)

  def trimPath(path):
    # Turn an on-disk path into the name stored inside the archive
    archivePath = path.replace(parentDir, "", 1)
    if parentDir:
      archivePath = archivePath.replace(os.path.sep, "", 1)
    if not includeDirInZip:
      archivePath = archivePath.replace(dirToZip + os.path.sep, "", 1)
    return os.path.normcase(archivePath)

  outFile = zipfile.ZipFile(zipFilePath, "w", compression=zipfile.ZIP_DEFLATED)
  for (archiveDirPath, dirNames, fileNames) in os.walk(dirPath):
    for fileName in fileNames:
      # Skip the output archive itself if it lives inside dirPath
      if fileName != zipFilePath.split("/")[-1]:
        filePath = os.path.join(archiveDirPath, fileName)
        outFile.write(filePath, trimPath(filePath))

  # Add the citation file at the archive root (only when a path was given)
  if citeUsePath:
    outFile.write(citeUsePath, os.path.basename(citeUsePath))
  outFile.close()

1 answer
User
#1 · Posted 2024-10-05 14:31:58

The zip file geoBoundaries-2_0_0-NGA-ADM1-all.zip is non-standard.

On Linux, unzip thinks it contains 5 files, none of which has a path component:

$ unzip -l geoBoundaries-2_0_0-NGA-ADM1-all.zip
Archive:  geoBoundaries-2_0_0-NGA-ADM1-all.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
   374953  2020-01-15 21:04   geoBoundaries-2_0_0-NGA-ADM1-shp.zip
  1512980  2020-01-15 21:04   geoBoundaries-2_0_0-NGA-ADM1.geojson
      804  2020-01-15 21:04   geoBoundaries-2_0_0-NGA-ADM1-metaData.json
      750  2020-01-15 21:04   geoBoundaries-2_0_0-NGA-ADM1-metaData.txt
     4656  2020-01-15 21:04   CITATION-AND-USE-geoBoundaries-2_0_0.txt
---------                     -------
  1894143                     5 files

If I try to extract the contents, I get a lot of warnings:

$ unzip  geoBoundaries-2_0_0-NGA-ADM1-all.zip
Archive:  geoBoundaries-2_0_0-NGA-ADM1-all.zip
geoBoundaries-2_0_0-NGA-ADM1-shp.zip:  mismatching "local" filename (release/geoBoundaries-2_0_0/NGA/ADM1/geoBoundaries-2_0_0-NGA-ADM1-shp.zip),
         continuing with "central" filename version
  inflating: geoBoundaries-2_0_0-NGA-ADM1-shp.zip
geoBoundaries-2_0_0-NGA-ADM1.geojson:  mismatching "local" filename (release/geoBoundaries-2_0_0/NGA/ADM1/geoBoundaries-2_0_0-NGA-ADM1.geojson),
         continuing with "central" filename version
  inflating: geoBoundaries-2_0_0-NGA-ADM1.geojson
geoBoundaries-2_0_0-NGA-ADM1-metaData.json:  mismatching "local" filename (release/geoBoundaries-2_0_0/NGA/ADM1/geoBoundaries-2_0_0-NGA-ADM1-metaData.json),
         continuing with "central" filename version
  inflating: geoBoundaries-2_0_0-NGA-ADM1-metaData.json
geoBoundaries-2_0_0-NGA-ADM1-metaData.txt:  mismatching "local" filename (release/geoBoundaries-2_0_0/NGA/ADM1/geoBoundaries-2_0_0-NGA-ADM1-metaData.txt),
         continuing with "central" filename version
  inflating: geoBoundaries-2_0_0-NGA-ADM1-metaData.txt
CITATION-AND-USE-geoBoundaries-2_0_0.txt:  mismatching "local" filename (tmp/CITATION-AND-USE-geoBoundaries-2_0_0.txt),
         continuing with "central" filename version
  inflating: CITATION-AND-USE-geoBoundaries-2_0_0.txt

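Analysis

The details of each entry in a zip file (including its filename) are stored twice: once in a local-header placed just before the compressed data, and again in a central-header at the end of the file. So for every file stored in a zip file there is a local-header/central-header pair. The data in the two fields of each pair should be (mostly) identical.

In this case, they are not.

For example, consider the central-header entry for geoBoundaries-2_0_0-NGA-ADM1-shp.zip. The matching local-header has release/geoBoundaries-2_0_0/NGA/ADM1/geoBoundaries-2_0_0-NGA-ADM1-shp.zip

The same is true for every entry in this zip file.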
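To see the mismatch without relying on unzip's warnings, the two names can be read directly. The following is only a minimal sketch: the function name local_vs_central_names is made up for illustration, it assumes an unencrypted archive, and it parses only the fixed 30-byte part of each local header as laid out in APPNOTE.TXT.

```python
import struct
import zipfile

# Fixed 30-byte part of a local file header: signature, version, flags,
# compression, mod time, mod date, CRC-32, compressed size,
# uncompressed size, filename length, extra field length.
LOCAL_HEADER = struct.Struct("<4s5H3L2H")
LOCAL_SIGNATURE = b"PK\x03\x04"


def local_vs_central_names(zip_path):
    """Return a list of (central_name, local_name) pairs, one per entry."""
    pairs = []
    with zipfile.ZipFile(zip_path) as zf, open(zip_path, "rb") as raw:
        # infolist() is built from the central directory only.
        for info in zf.infolist():
            raw.seek(info.header_offset)
            fields = LOCAL_HEADER.unpack(raw.read(LOCAL_HEADER.size))
            signature, flags, name_len = fields[0], fields[2], fields[9]
            if signature != LOCAL_SIGNATURE:
                raise zipfile.BadZipFile(
                    "no local header at offset %d" % info.header_offset)
            # Bit 11 of the flags marks a UTF-8 name; otherwise assume CP437.
            encoding = "utf-8" if flags & 0x800 else "cp437"
            local_name = raw.read(name_len).decode(encoding)
            pairs.append((info.filename, local_name))
    return pairs


if __name__ == "__main__":
    import sys
    for central, local in local_vs_central_names(sys.argv[1]):
        marker = "ok" if central == local else "MISMATCH"
        print("%-8s central=%r local=%r" % (marker, central, local))
```

Run against the archive from the question, a script like this would be expected to flag every entry, since unzip reports a differing "local" filename for each of them.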

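Given that this is a non-standard/invalid zip file, the behavior when unzipping depends on whether the unzip utility takes the filename from the central-header entry or from the equivalent data in the local-header entry.

It looks like the Ubuntu "Extract" tool uses the local-header fields, while 7zip uses the central-header fields.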
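As an illustration of the central-header strategy, here is a rough sketch of an extractor that takes names and sizes from the central directory and uses the local header only to find where the data starts. The name extract_by_central_names is hypothetical, and the sketch assumes unencrypted entries that are either stored or deflated. As far as I know, CPython's own zipfile module compares the two names when a member is opened and raises BadZipFile if they differ, which is why the data is decompressed by hand here.

```python
import os
import struct
import zipfile
import zlib

# Fixed 30-byte part of a local file header (see APPNOTE.TXT).
LOCAL_HEADER = struct.Struct("<4s5H3L2H")


def extract_by_central_names(zip_path, out_dir):
    """Extract all file entries using central-directory names; return paths."""
    written = []
    out_root = os.path.abspath(out_dir)
    with zipfile.ZipFile(zip_path) as zf, open(zip_path, "rb") as raw:
        for info in zf.infolist():
            if info.is_dir():
                continue  # directory entry, nothing to decompress
            raw.seek(info.header_offset)
            fields = LOCAL_HEADER.unpack(raw.read(LOCAL_HEADER.size))
            name_len, extra_len = fields[9], fields[10]
            # Skip the local filename and extra field without trusting them.
            raw.seek(name_len + extra_len, os.SEEK_CUR)
            payload = raw.read(info.compress_size)
            if info.compress_type == zipfile.ZIP_DEFLATED:
                data = zlib.decompress(payload, -15)  # raw deflate stream
            elif info.compress_type == zipfile.ZIP_STORED:
                data = payload
            else:
                raise NotImplementedError("unsupported compression method")
            if zlib.crc32(data) != info.CRC:
                raise zipfile.BadZipFile("CRC mismatch for %r" % info.filename)
            # Refuse names that would escape the output directory.
            target = os.path.abspath(os.path.join(out_root, info.filename))
            if os.path.commonpath([out_root, target]) != out_root:
                raise ValueError("unsafe path in archive: %r" % info.filename)
            os.makedirs(os.path.dirname(target), exist_ok=True)
            with open(target, "wb") as fh:
                fh.write(data)
            written.append(target)
    return written
```

A tool that trusted the local-header names instead would create the release/geoBoundaries-2_0_0/NGA/ADM1/ and tmp/ paths seen in the unzip warnings, which matches the extra folders described in the question.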

For reference, the specification for zip files is APPNOTE.TXT
