Parsing a text file that contains file listings

Posted 2024-09-27 00:13:29


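I have txt files that contain information about directories on a server and the files in them. From each txt file I only need two of the directories, together with their file names, collected into an array for later comparison with local files.

I wanted to read the file line by line, but I got stuck. In particular, if "SDU__DACS" appears in a file name inside some other directory that I am not interested in, that entry gets recorded under the previous directory name.

I tried: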

import glob

pathSDU = []
pathSCI = []
filesDict = {}
# glob.glob() already returns the full path of each match
for file in glob.glob('/foo/bar/catalog/*.txt'):
    with open(file, 'r') as openFile:
        print('opening file ' + file)
        for line in openFile:
            if '/ACS/SDU_:' in line:
                pathSDU = line
            else:
                if 'SDU__DACS' in line:
                    if 'manifest' not in line:
                        filesDict.update({line: pathSDU})
            if '/ACS/ScienceDataFile:' in line:
                pathSCI = line
            else:
                if 'SCI__DACS' in line:
                    if 'manifest' not in line:
                        filesDict.update({line: pathSCI})

Example of the txt file contents:

/data/foo/bar/ACS/SDU_:
68421952
17660866 2021-09-06 09:56 SDU__DACS_69DC_0241DB01_2021-246T08-13-26__00001.EXM
17660866 2021-09-06 09:41 SDU__DACS_69DB_0241DB01_2021-246T08-12-37__00001.EXM
17660866 2021-09-06 09:24 SDU__DACS_69DA_0241DB01_2021-246T08-11-46__00001.EXM
17660866 2021-09-06 08:27 SDU__DACS_69D9_0241DB01_2021-246T08-10-56__00001.EXM

/data/foo/bar/TGO/ACS/ScienceDataFile:
69881252
 14759936 2021-09-05 21:51 SCI__DACS__0241DA01_2021-246T04-26-15__00001.EXM
       53 2021-09-05 21:51 SCI__DACS__0241DA01_2021-246T04-26-15__00001.EXM.manifest
318758912 2021-09-05 14:42 SCI__DACS__0241D801_2021-246T00-30-32__00001.EXM

1 answer
User
#1 · Posted 2024-09-27 00:13:29

Try the following:

filesDict = {}
path = None
files = []

with open('data.txt') as openFile:
    for line in openFile:
        line = line.rstrip()

        # Empty line: the current folder is finished.
        # Store its data, if any, and start a new collection.
        if not line.strip():
            if files and path is not None:
                filesDict[path] = files
            path = None
            files = []
            continue

        # Paths end with a colon, so no per-path variables are needed.
        if line.endswith(':'):
            # Save the previous results, if any.
            if files and path is not None:
                filesDict[path] = files

            path = line
            files = []
            next(openFile, None)  # skip the line that holds only a number
            continue

        # Only collect entries while a path is defined.
        if path and 'manifest' not in line:
            files.append(line)

# The last folder has been read from the file but not yet stored.
if path:
    filesDict[path] = files

Sample run

for p, l in filesDict.items():
  print(p)
  for f in l:
    print('\t', f)

Output

/data/foo/bar/ACS/SDU_:
     17660866 2021-09-06 09:56 SDU__DACS_69DC_0241DB01_2021-246T08-13-26__00001.EXM
     17660866 2021-09-06 09:41 SDU__DACS_69DB_0241DB01_2021-246T08-12-37__00001.EXM
     17660866 2021-09-06 09:24 SDU__DACS_69DA_0241DB01_2021-246T08-11-46__00001.EXM
     17660866 2021-09-06 08:27 SDU__DACS_69D9_0241DB01_2021-246T08-10-56__00001.EXM
/data/foo/bar/TGO/ACS/ScienceDataFile:
      14759936 2021-09-05 21:51 SCI__DACS__0241DA01_2021-246T04-26-15__00001.EXM
     318758912 2021-09-05 14:42 SCI__DACS__0241D801_2021-246T00-30-32__00001.EXM
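
The loop above stores every directory found in the listing, while the question asks for only two of them and for a comparison with local files. One possible variation is sketched below. It is not part of the original answer: the function names `parse_listing` and `missing_locally` and the `WANTED` tuple are hypothetical, the header suffixes are taken from the sample listing, and file names are assumed to contain no spaces. Because the current directory is reset to None on every header that is not wanted, an "SDU__DACS" entry in some other directory is no longer attributed to the previous one.

```python
import os

# Hypothetical: suffixes of the two directory headers of interest,
# copied from the sample listing in the question.
WANTED = ('/ACS/SDU_:', '/ACS/ScienceDataFile:')


def parse_listing(lines, wanted=WANTED):
    """Map each file name to its directory, for wanted directories only."""
    name_to_dir = {}
    current = None  # directory being read, or None while skipping one
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.endswith(':'):
            # New directory header: track it only if it is a wanted one.
            current = line[:-1] if line.endswith(wanted) else None
            continue
        if current is None:
            continue
        parts = line.split()
        # Entry lines look like "<size> <date> <time> <name>"; the bare
        # total under each header has a single field and is skipped here.
        if len(parts) < 4 or parts[-1].endswith('.manifest'):
            continue
        name_to_dir[parts[-1]] = current
    return name_to_dir


def missing_locally(name_to_dir, local_dir):
    """Return the listed file names that are not present in local_dir."""
    local = set(os.listdir(local_dir))
    return sorted(name for name in name_to_dir if name not in local)
```

Since a file object yields its lines when iterated, an open file can be passed directly, for example `parse_listing(openFile)` inside a `with open('data.txt') as openFile:` block.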
