Excluding certain lines by keyword when parsing a txt file

Posted 2024-09-27 00:19:40

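I am trying to parse txt files that contain listings of directories and files. I am interested in the '/ACS/SDU_:' and '/ACS/ScienceDataFile:' directories. How can I exclude dirs such as /data/foo/bar/ATB6/Science/TGO/ACS: and /data/foo/bar/ATB7B/Science/TGO/ACS:? I tried to exclude them with an if 'ATB6' not in line: statement.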

import glob
import os

filesDict = dict()
for file in glob.glob('/foo/bar/catalog/*.txt'):
    with open(os.path.join('/foo/bar/catalog', file), 'r') as openFile:
        path = None
        files = []
        for line in openFile:
            line = line.rstrip()
            if not line.strip():
                # blank line: store the finished block, then reset
                if files and (path is not None):
                    filesDict[path] = files
                path = None
                files = []
                continue
            if line.endswith('/ACS/SDU_:') or line.endswith('/ACS/ScienceDataFile:'):
                # save previous results, if any
                if files and (path is not None):
                    filesDict[path] = files
                path = line[5:-2]
                files = []
                next(openFile, None)  # skip the size-total line after the header
                continue

            if path:
                if 'manifest' not in line:
                    files.append(line)

    # last folder read from file but not yet stored
    if path:
        filesDict[path] = files

Example txt file contents:

/data/foo/bar/Science/TGO/NOMAD/ScienceDataFile:
123992
 3766886 2016-02-17 10:44 SCI__DNMD__03000082_2016-048T09-07-27__00001.EXM
 5245980 2016-02-17 10:00 SCI__DNMD__03000081_2016-048T08-48-13__00001.EXM
 3766570 2016-02-17 09:26 SCI__DNMD__03000080_2016-048T08-20-01__00001.EXM

/data/foo/bar/Science/TGO/CASSIS/SDU_:
208744
26934224 2016-02-17 13:11 SDU__DCAS_0003_01200002_2016-047T15-18-48__00001.EXM
35322818 2016-02-17 13:11 SDU__DCAS_0002_01200002_2016-047T15-03-48__00001.EXM

/data/foo/bar/Science/ACS/SDU_:
68421952
17660866 2021-09-06 09:56 SDU__DACS_69DC_0241DB01_2021-246T08-13-26__00001.EXM
17660866 2021-09-06 09:41 SDU__DACS_69DB_0241DB01_2021-246T08-12-37__00001.EXM
17660866 2021-09-06 09:24 SDU__DACS_69DA_0241DB01_2021-246T08-11-46__00001.EXM
17660866 2021-09-06 08:27 SDU__DACS_69D9_0241DB01_2021-246T08-10-56__00001.EXM

/data/foo/bar/Science/TGO/ACS/ScienceDataFile:
69881252
 14759936 2021-09-05 21:51 SCI__DACS__0241DA01_2021-246T04-26-15__00001.EXM
       53 2021-09-05 21:51 SCI__DACS__0241DA01_2021-246T04-26-15__00001.EXM.manifest
318758912 2021-09-05 14:42 SCI__DACS__0241D801_2021-246T00-30-32__00001.EXM

/data/foo/bar/ATB6/Science/TGO/ACS/ScienceDataFile:
0

/data/foo/bar/ATB7B/Science/TGO/ACS/SDU_:
4
4
 116 2017-07-12 11:59 ScienceDataFile/
4096 2017-07-12 11:56 SDU_/

1 answer
User
#1 · Posted 2024-09-27 00:19:40

exclude such dirs as /data/foo/bar/ATB6/Science/TGO/ACS: and /data/foo/bar/ATB7B/Science/TGO/ACS:?

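Your if line.endswith check is only evaluated at the moment you see that header line. So, while the file is being parsed, the condition is evaluated at the wrong time (before you have seen the files that belong to the path you are interested in).

Instead, you need to check path against the suffixes you are interested in, and do this each time you "save" that path in the dict: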

if any(path.endswith(x) for x in ['/ACS/SDU_:', '/ATB6/Science/TGO/ACS/ScienceDataFile:']):
    filesDict[path] = files
files = []
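
A suffix test alone cannot drop the ATB directories, because /data/foo/bar/ATB6/Science/TGO/ACS/ScienceDataFile: also ends with '/ACS/ScienceDataFile:'. One possible way to combine both conditions is sketched below. The names is_wanted, WANTED_SUFFIXES and EXCLUDED_PARTS are hypothetical, and the excluded names are an assumption taken from the sample listing in the question.

```python
WANTED_SUFFIXES = ('/ACS/SDU_:', '/ACS/ScienceDataFile:')
EXCLUDED_PARTS = {'ATB6', 'ATB7B'}  # assumption: names taken from the sample listing


def is_wanted(path):
    """Return True for ACS directories that are not under an excluded tree."""
    # str.endswith accepts a tuple of suffixes
    if not path.endswith(WANTED_SUFFIXES):
        return False
    # Compare whole path components, so 'ATB6' does not match e.g. 'ATB60'
    return EXCLUDED_PARTS.isdisjoint(path.split('/'))
```

This assumes path still holds the full header line, including the trailing colon. If path is sliced first, as with line[5:-2] in the question, the suffixes would need to be adjusted to match.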

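Change the original header check back to endswith(':'), since that correctly identifies all paths, not only the ones you are interested in. The ['/ACS/SDU_:', '/ACS/ScienceDataFile:'] list can be extracted into its own variable for reuse.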
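
Putting these pieces together, a minimal sketch of the whole loop might look like the following. It is not the asker's exact code: parse_listing, keep, WANTED and EXCLUDED are hypothetical names, the function works on any iterable of lines (an open file also works), and it assumes that every directory header ends with ':' and is followed by one size-total line.

```python
WANTED = ('/ACS/SDU_:', '/ACS/ScienceDataFile:')  # suffixes kept in one reusable variable
EXCLUDED = ('ATB6', 'ATB7B')                      # assumption: keywords from the sample listing


def keep(path):
    """Accept wanted ACS directories unless the path contains an excluded keyword."""
    return path.endswith(WANTED) and not any(k in path for k in EXCLUDED)


def parse_listing(lines, keep):
    """Group file entries by directory header, storing only headers accepted by keep()."""
    result = {}
    path = None
    files = []
    skip_total = False

    def save():
        # The filter runs here, once a block is complete, not when the header is first seen
        if path is not None and files and keep(path):
            result[path] = files

    for raw in lines:
        line = raw.rstrip()
        if not line:                # blank line ends the current block
            save()
            path, files = None, []
            continue
        if line.endswith(':'):      # any directory header, wanted or not
            save()
            path, files = line, []
            skip_total = True
            continue
        if skip_total:              # first line after a header is the size total
            skip_total = False
            continue
        if path is not None and 'manifest' not in line:
            files.append(line)
    save()                          # last block, if the input does not end with a blank line
    return result
```

It could be called as parse_listing(openFile, keep) inside the with block from the question, merging the returned dict into filesDict. Passing the filter in as a function keeps the parsing loop unchanged if the list of wanted or excluded directories changes later.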
