Python file system reader performance

import sys, os
from pz import padZero  # prepends 0's to string until desired length

output = open('./out.txt', 'w')
input = open('./in.txt', 'r')
rootPath = r'\\server\share'  # UNC path to storage

for ifid in input:
    ifid = padZero(str(ifid)[:-1], 8)  # extracts/formats fileName
    dir = padZero(str(ifid)[:-3], 5)  # extracts/formats the directory containing the file
    fPath = rootPath + '\\' + dir + '\\' + ifid + '.tif'
    try:
        size = os.path.getsize(fPath)  # don't actually need size, better approach?
    except OSError:
        output.write(ifid + '\n')

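3 answers

User

#1 · edited 2024-09-29 23:32:50

You are going to be I/O bound, especially over a network, so any change you make to the script will yield only a very small speedup. Still, off the top of my head: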

import os

input, output = open("in.txt"), open("out.txt", "w")

root = r'\\server\share'

for fid in input:
    fid  = fid.strip().rjust(8, "0")
    dir  = fid[:-3]      # no need to re-pad
    path = os.path.join(root, dir, fid + ".tif")
    if not os.path.isfile(path):
        output.write(fid + "\n")

I don't really expect this to be any faster, but it is arguably easier to read.

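Other approaches may be faster. For example, if you expect to touch most of the files, you could pull one complete recursive directory listing from the server, convert it to a Python set(), and check membership in that set rather than hitting the server with many small requests. I'll leave the code as an exercise...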
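The answer leaves that code as an exercise, so here is one possible sketch rather than the answerer's own solution. It assumes that file names are unique across the share (so a name-only set is enough), that names can be compared case-insensitively, and that blank lines in the ID list should be skipped. The helper names `list_tif_names` and `find_missing` are made up for illustration.

```python
import os


def list_tif_names(root):
    """Walk root once and return the set of .tif file names found (lower-cased)."""
    found = set()
    for _dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(".tif"):
                found.add(name.lower())
    return found


def find_missing(ids, root):
    """Return the zero-padded IDs whose .tif file was not seen under root."""
    present = list_tif_names(root)  # one recursive listing up front
    missing = []
    for raw in ids:
        raw = raw.strip()
        if not raw:
            continue  # assumption: blank lines in the ID list are ignored
        fid = raw.rjust(8, "0")
        if fid + ".tif" not in present:  # set lookup, no server round trip
            missing.append(fid)
    return missing
```

To use it on the files from the question, one would pass the open `in.txt` file object and the UNC root `r'\\server\share'` to `find_missing` and write each returned ID to `out.txt`. Whether a single full walk beats many small requests depends on how much of the share the ID list actually covers.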

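User

#2 · edited 2024-09-29 23:32:50

It looks to me like padZero and the string concatenation are taking a good share of the time.

What you want is for it to spend nearly all of its time reading the directory, and very little on anything else.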

Do you have to do this in Python? I have done similar things in C and C++. Java should do fine as well.
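Before rewriting in another language, it may be worth measuring where the time actually goes. The sketch below is only an illustration: `build_path` and `time_phases` are hypothetical helpers, and it assumes the 8-digit padding and directory layout used in the question. It times the string work and the file-system checks separately.

```python
import os
import time


def build_path(root, raw_id):
    """Pad the ID to 8 digits and build root/<dir>/<id>.tif."""
    fid = raw_id.strip().rjust(8, "0")
    return os.path.join(root, fid[:-3], fid + ".tif")


def time_phases(ids, root):
    """Return (seconds building paths, seconds spent in os.path.isfile)."""
    start = time.perf_counter()
    paths = [build_path(root, raw) for raw in ids]  # pure string work
    built = time.perf_counter()
    for path in paths:
        os.path.isfile(path)  # one file-system request per path
    checked = time.perf_counter()
    return built - start, checked - built
```

If the second number dominates on the real share, the script is I/O bound and a change of language is unlikely to help much.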

User

#3 · edited 2024-09-29 23:32:50

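import collections
import glob
import os

root = r'\\server\share'
dirs = collections.defaultdict(set)

# Group the expected file names by the directory that should hold them.
for file_id in open("in.txt"):
    name = file_id.strip().rjust(8, "0")
    dirs[name[:-3]].add(name + ".tif")

# One listing per directory instead of one request per file.
for dir, files in dirs.items():
    listing = glob.glob(os.path.join(root, dir, "*.tif"))
    present = set(os.path.basename(p) for p in listing)
    for missing_file in sorted(files - present):
        print(missing_file)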

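Explanation

First read the input file into a dictionary mapping each directory to its set of file names. Then, for each directory, list all the TIFF files in that directory on the server and subtract (as a set) those names from the set of file names you should have. Print whatever is left.

Edit: fixed some silly things. It was too late at night when I wrote this!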
