Extracting data from alternate rows using Python

Posted 2024-09-30 07:33:09


I want to extract the numbers corresponding to O2H from the following file format (the delimiter used here is whitespace):

# Timestep     No_Moles     No_Specs     SH2    S2H4    S4H6    S2H2    H2  S2H3    OSH2    Mo1250O3736S57H111  OSH S3H6    OH2 S3H4    O2S SH  OS2H3   
144500       3802         15     3639    113     1   10  18  2   7   1   3   2   1   2   1   1   1  
# Timestep     No_Moles     No_Specs     SH2    S2H4    S2H2    H2  S2H3    OSH2    Mo1250O3733S61H115  OS2H2   OSH S3H6    OS  O2S2H2  OH2 S3H4    SH  
149000       3801         15     3634    114     11  18  2   7   1   1   2   2   1   1   4   2   1  
# Timestep     No_Moles     No_Specs     SH2    OS2H3   S3H Mo1250O3375S605H1526    OS  S2H4    O3S3H3  OSH2    OSH S2H2    H2  OH2 OS2H2   S2H O2S3H3  SH  O4S4H4  OH  O2S2H   O6S5H3  O6S5H5  O3S4H4  O2S3H2  O3S4H3  OS3H3   O3S2H2  O4S3H4  O3S3H   O6S4H5  OS4H3   O3S2H   O5S4H4  OS2H    O2SH2   S2H3    O4S3H3  O3S3H4  O   O5S3H4  O5S3H3  OS3H4   O2S4H4  O4S4H3  O2SH    O2S2H2  O5S4H5  O3S3H2  S3H6    
589000       3269         48     2900    11  1   1   47  11  1   81  74  26  25  21  17  1   3   5   2   3   3   1   1   2   2   1   2   1   1   1   1   1   1   1   3   3   1   1   1   1   1   1   1   1   1   1   1   1   1   1  
# Timestep     No_Moles     No_Specs     SH2    Mo1250O3034S578H1742    OH2 OSH2    O3S3H5  OS2H2   OS  OSH O2S3H2  OH  O3S2H2  O6S6H4  SH  O2S2H2  S2H2    OS2H    H2  OS2H3   O5S4H2  O7S6H5  S3H2    O2SH2   OSH3    O7S6H4  O2S2H3  O6S5H3  O2SH    O4S4H   O3S2H3  S2  O2S2H   S5H3    O7S4H4  O3S3H   OS3H    OS4H    O5S3H3  S3H O17S12H9    O3S3H2  O7S5H4  O4SH3   O3S2H   O7S8H4  O3S3H3  O11S9H6 OS3H2   S4H2    O10S8H6 O4S3H2  O5S5H4  O6S8H4  OS2 OS3H6   S3H3    
959500       3254         55     2597    1   83  119     1   46  59  172     4   3   4   1   27  7   38  6   23  3   1   2   3   5   3   1   2   1   2   1   1   6   3   1   1   2   1   1   1   1   1   3   1   1   2   1   1   1   1   1   1   2   1   1   1   1   1  

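That is, every alternate row holds the data that corresponds to the header row just above it.

I would like the output to look like this: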

1
4
21
83

How it works:

1 (14th number on 2nd row which corresponds to 14th word of 1st row i.e. O2H)
4 (16th number on 4th row which corresponds to 16th word of 3rd row i.e. O2H)
21 (15th number on 6th row which corresponds to 15th word of 5th row i.e. O2H)
83 (6th number on 8th row which corresponds to 6th word of 7th row i.e. O2H)

I tried to extract it with a regular expression but could not get it to work. Can anyone help me extract the data?


3 answers

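You can easily parse it into a dataframe and select the column you need to get the values.

Assuming your data resembles the sample you provided, you could try the following: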

import pandas as pd

with open("data.txt") as f:
    lines = [line.strip() for line in f]

# Use the longest line (the widest header in the sample) as the column names for every row
header = max(lines, key=len).replace("#", "").split()
# Data rows are every second line, starting from the second
df = pd.DataFrame([line.split() for line in lines[1::2]], columns=header)
print(df["OH2"])
df.to_csv("parsed_data.csv", index=False)

Output:

0     1
1    11
2     1
3    83
Name: OH2, dtype: object

Dumping it to a .csv file produces:

[image not included]
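
The snippet above applies the single longest header to every data row, so `df["OH2"]` selects by position in that one header. In the sample the species columns change from block to block, which is why it prints 11 and 1 where the question expects 4 and 21. A minimal sketch of an alternative, using only the standard library, pairs each `#` header line with the data line right after it. The names `parse_blocks` and `write_csv` are hypothetical, the sample values are a made-up shortened version of the question's layout, and the sketch assumes header and data lines strictly alternate.

```python
import csv
import io


def parse_blocks(lines):
    """Pair each '#' header line with the data line that follows it.

    Returns one dict per block, mapping column name -> value (as a string).
    Assumes header and data lines strictly alternate; blank lines are ignored.
    """
    rows = [ln for ln in lines if ln.strip()]
    records = []
    for header, data in zip(rows[0::2], rows[1::2]):
        names = header.lstrip().lstrip("#").split()  # drop the leading '#'
        records.append(dict(zip(names, data.split())))
    return records


def write_csv(records, out):
    """Write records to `out`, using the union of all column names as the header."""
    columns = []
    for rec in records:
        for name in rec:
            if name not in columns:
                columns.append(name)
    # restval fills the cells of species that a given block does not list
    writer = csv.DictWriter(out, fieldnames=columns, restval="")
    writer.writeheader()
    writer.writerows(records)


# Hypothetical shortened sample in the same layout as the question's file
sample = """\
# Timestep No_Moles No_Specs SH2 OSH OH2 SH
144500 3802 4 3639 3 1 1
# Timestep No_Moles No_Specs SH2 OH2 S3H4
149000 3801 3 3634 4 2
"""

records = parse_blocks(sample.splitlines())
oh2 = [rec.get("OH2") for rec in records]  # None where a block has no OH2 column
print(oh2)  # ['1', '4']

buf = io.StringIO()
write_csv(records, buf)
print(buf.getvalue())
```

Keeping one dict per block means a species missing from a block simply has no key, rather than silently picking up a neighbouring column's value.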

Thanks for the help, everyone. I found a solution:

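index_of_OH2 = None  # position of 'OH2' in the most recent header line
with open('input.txt', 'r') as fin, open('output.txt', 'w') as fout:
    for i, line in enumerate(fin):  # iterate over each line
        words = line.split()  # split the line into a list of words
        if i % 2 == 0:  # header lines (1st, 3rd, ...)
            try:
                index_of_OH2 = words.index('OH2')
            except ValueError:
                index_of_OH2 = None  # no OH2 in this header: skip its data line
        elif index_of_OH2 is not None:  # data lines (2nd, 4th, ...)
            # the header starts with '#', so the data columns are shifted by one
            number_of_OH2 = words[index_of_OH2 - 1]
            print(number_of_OH2)
            fout.write(number_of_OH2 + '\n')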

Output:

1
4
21
83

The try/except was added so that if OH2 is not found in a line, the script carries on without raising an error.

I think you want OH2 rather than O2H, which looks like a typo. Assuming so:

(1) Iterate over each line.

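(2) Consider only the header lines, i.e. every other line starting from the first. With a 0-based line counter, skip the others: (if (line_counter % 2) != 0: continue)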

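(3) Split the header line on whitespace and, using a counter variable, work out the index of OH2 in it. For the first line, say it is 14.

(4) Go to the next line (index + 1), split it on whitespace as well, and read the element at the index found in step (3).

Since you did not post any code, I assume your problem is more about finding a way to do this than about the coding itself, so I have written out the algorithm for you.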
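
The steps above can be sketched as follows. This is a minimal sketch, assuming every header line starts with `#` and is immediately followed by its data line; the function name `extract_species` and the sample values are hypothetical. Headers are detected by the leading `#` instead of by line parity, a swapped-in choice so that a stray blank line does not shift the pairing.

```python
def extract_species(lines, species="OH2"):
    """Return the value under `species` for every header/data pair that has it."""
    values = []
    column = None  # position of `species` among the current header's columns
    for line in lines:
        words = line.split()
        if not words:
            continue  # ignore blank lines
        if words[0].startswith("#"):
            # Header line: drop the '#' marker, then look up the species
            names = line.lstrip().lstrip("#").split()
            column = names.index(species) if species in names else None
        elif column is not None and column < len(words):
            # Data line: read the same position as in the header just above
            values.append(int(words[column]))
            column = None  # each header applies to one data line only
    return values


# Hypothetical shortened sample in the same layout as the question's file
sample = [
    "# Timestep No_Moles No_Specs SH2 OSH OH2",
    "144500 3802 3 3639 3 1",
    "# Timestep No_Moles No_Specs SH2 S2H4",
    "149000 3801 2 3634 114",
    "# Timestep No_Moles No_Specs OH2 SH2",
    "589000 3269 2 21 2900",
]
print(extract_species(sample))  # [1, 21]
```

Because the function accepts any iterable of lines, an open file object can be passed in directly, for example `with open("input.txt") as f: print(extract_species(f))`.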
