Extracting specific lines from multiple files in the same directory with Python

ITEM: TIMESTEP
10000
ITEM: NUMBER OF ATOMS
1000
ITEM: BOX BOUNDS pp pp pp
0.0000000000000000e+00 9.4000000000000004e+00
0.0000000000000000e+00 9.4000000000000004e+00
0.0000000000000000e+00 9.4000000000000004e+00
ITEM: ATOMS id x y z
673 1.03559 0.495714 0.575399
346 2.74458 1.30048 0.0566235
991 0.570383 0.589025 1.44128
793 0.654365 1.33452 1.91347
969 0.217201 0.6852 0.287291
. . . .

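import glob

import natsort
import numpy as np

coord = []
filenames = natsort.natsorted(glob.glob('*.dat'))  # natural order: 2.dat before 10.dat
for f in filenames:
    with open(f, 'r') as fh:
        for row in fh:
            # compare the whole first column so that IDs such as 6730 do not match
            if row.split()[:1] == ['673']:
                coord.append(row.strip())
# fmt was undefined in the original; '%s' writes each collected row as text
np.savetxt("xyz.txt", coord, fmt='%s', delimiter=' ')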

2 Answers

User

Answer 1 · edited 2024-10-01 09:22:32

Without more background, I can't see any way to find the right line without reading the lines that contain the atom ID.

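You would end up doing something like scanning every file line by line and checking the atom ID at the start of each row.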
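A minimal sketch of such a line-by-line scan, assuming each file holds a single snapshot in the format shown in the question and that the atom ID is the first column. The function name `find_atom_rows` and the `*.dat` default are illustrative choices, not part of the original answer.

```python
import glob
import os


def find_atom_rows(directory, atom_id, pattern="*.dat"):
    """Return {filename: [x, y, z]} for the row whose first column equals atom_id."""
    wanted = str(atom_id)
    found = {}
    for path in sorted(glob.glob(os.path.join(directory, pattern))):
        in_atoms = False
        with open(path) as handle:
            for line in handle:
                if line.startswith("ITEM:"):
                    # Only rows after the "ITEM: ATOMS" header are atom records.
                    in_atoms = line.startswith("ITEM: ATOMS")
                    continue
                fields = line.split()
                if in_atoms and fields and fields[0] == wanted:
                    found[os.path.basename(path)] = [float(v) for v in fields[1:4]]
                    break  # assumes one row per atom per file, so stop reading
    return found
```

Note that `sorted` orders filenames lexicographically; if the files are numbered without zero padding, a natural sort such as the `natsort.natsorted` call from the question would be the better fit.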

Otherwise, you could save and read back an ID <-> line-number mapping for each file.
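One way such a mapping could look, as a rough sketch: build the index once per file, store it as JSON, and later jump to the stored line number. All function names here are hypothetical, and the single-snapshot file layout from the question is assumed.

```python
import json
from itertools import islice


def build_index(path):
    """Map each atom ID to the 0-based line number of its row in `path`."""
    index = {}
    in_atoms = False
    with open(path) as handle:
        for lineno, line in enumerate(handle):
            if line.startswith("ITEM:"):
                in_atoms = line.startswith("ITEM: ATOMS")
                continue
            fields = line.split()
            if in_atoms and fields and fields[0].isdigit():
                index[int(fields[0])] = lineno
    return index


def save_index(index, index_path):
    with open(index_path, "w") as handle:
        json.dump(index, handle)


def load_index(index_path):
    with open(index_path) as handle:
        # JSON object keys are always strings, so convert them back to int.
        return {int(k): v for k, v in json.load(handle).items()}


def read_row(path, lineno):
    """Return the fields of the row at `lineno` without parsing other lines."""
    with open(path) as handle:
        return next(islice(handle, lineno, lineno + 1)).split()
```

`read_row` still streams past the earlier lines, it just skips parsing them. Storing byte offsets and using `seek` would avoid that as well, at the cost of a slightly more fragile index.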

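However, I think you should find a way to save the positions in an ordered manner. Perhaps you could also add to your question what prevents you from saving the positions sorted by atom ID.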
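To illustrate the idea of ordered storage, here is a small sketch that reorders the atom rows of one snapshot by ID. It assumes a single snapshot per file and an integer ID in the first column; `sort_atom_rows` is an illustrative name.

```python
def sort_atom_rows(lines):
    """Return the dump lines with the rows after "ITEM: ATOMS" sorted by atom ID."""
    for start, line in enumerate(lines):
        if line.startswith("ITEM: ATOMS"):
            break
    else:
        raise ValueError("no 'ITEM: ATOMS' header found")
    header = lines[: start + 1]
    # Keep only rows that begin with an integer ID (drops blanks and filler lines).
    rows = [ln for ln in lines[start + 1:] if ln.split() and ln.split()[0].isdigit()]
    rows.sort(key=lambda ln: int(ln.split()[0]))
    return header + rows
```

If the IDs run contiguously from 1 to N, atom `k` then sits exactly `k` lines below the `ITEM: ATOMS` header in every file, so no search is needed.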

I'd suggest using the hdf5 library to store large datasets together with their metadata.

User

Answer 2 · edited 2024-10-01 09:22:32

You can use a regular expression to pull the data out of all the files and then process it as needed. Something like this might work.

I've assumed that there's nothing after the coordinate values in the file. You will have to run this script from the directory all the files are in.

This gives you a dictionary with the atom ID as the key and a list of all its coordinates as the value.
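A minimal sketch of a regex-based script that builds such a dictionary, under the same assumption that nothing follows the three coordinate values on a row. The pattern, the function name `collect_coordinates`, and the `*.dat` default are illustrative and may differ from what the answer originally used.

```python
import glob
import os
import re
from collections import defaultdict

# One atom row: an integer ID followed by exactly three fields and nothing else.
ROW = re.compile(r"^(\d+)[ \t]+(\S+)[ \t]+(\S+)[ \t]+(\S+)[ \t]*$", re.MULTILINE)


def collect_coordinates(directory, pattern="*.dat"):
    """Return {atom_id: [(x, y, z), ...]} with one tuple per file, in filename order."""
    coords = defaultdict(list)
    for path in sorted(glob.glob(os.path.join(directory, pattern))):
        with open(path) as handle:
            text = handle.read()
        # Search only below the ATOMS header so other sections are never matched.
        _, _, atoms = text.partition("ITEM: ATOMS")
        for atom_id, x, y, z in ROW.findall(atoms):
            coords[int(atom_id)].append((float(x), float(y), float(z)))
    return dict(coords)
```

With this layout, `collect_coordinates(".")[673]` would be the list of positions of atom 673 across all files, ordered by filename.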

After looking at the question again, I think I misunderstood the input. The line ITEM: ATOMS id x y z is static across all the files, so I changed the code a bit.

import os, re

# Match from the "ITEM: ATOMS id x y z" header line through the end of the file.
regex = r"^ITEM: ATOMS id x y z.*"

output = {}  # dictionary to store all coordinates: filename -> list of atom rows

for file in os.listdir():
    if os.path.isfile(file):
        with open(file, 'r') as f:
            data = f.read()
        matches = re.findall(regex, data, re.MULTILINE | re.DOTALL)
        if not matches:  # skip files without an ATOMS section (e.g. this script)
            continue
        temp = matches[0].split('\n')
        # drop the header line and blank lines; store against filename as key
        output[file] = [row for row in temp[1:] if row.strip()]
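As a possible follow-up, the `output` dictionary built by the code above (filename mapped to a list of row strings) can be filtered for a single atom. This is only a sketch; `coordinates_of` is a hypothetical helper and assumes each row reads `id x y z`.

```python
def coordinates_of(output, atom_id):
    """Return {filename: (x, y, z)} for one atom from a filename -> rows mapping."""
    wanted = str(atom_id)
    result = {}
    for filename, rows in output.items():
        for row in rows:
            fields = row.split()
            if len(fields) >= 4 and fields[0] == wanted:
                result[filename] = tuple(float(v) for v in fields[1:4])
                break  # assumes the atom appears at most once per file
    return result
```

For example, `coordinates_of(output, 673)` would collect the position of atom 673 from every file in which it appears.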
