How to write the correct regex to find and replace lines in a file

Published 2024-09-30 01:31:54

I have two files that both look like this; they differ only in the data after K on each line that starts with PBUSH:

$ Elements and Element Properties for region : RFAST_BUSH_CID_1.a.r1.r1.    
$ .r1    
PBUSH    9       K      435008.         522649. 6.8198+6        8.1938+6    
                 RCV    1.      1.      1.      1.    
$ Pset: "RFAST_BUSH_CID_1.a.r1.r1..r1" will be imported as: "pbush.9"    
CBUSH    1207216 9       1014816 100670                          1
        .5     
$ Elements and Element Properties for region : RFAST_BUSH_CID_1.b.r1.r1.    
$ .r1    
PBUSH    10      K      319265.         148977. 988690.         461348.    
                 RCV    1.      1.      1.      1.     
$ Pset: "RFAST_BUSH_CID_1.b.r1.r1..r1" will be imported as: "pbush.10"    
CBUSH    1207615 10      1016116 800007                          1 
        .5    
$ Elements and Element Properties for region : RFAST_BUSH_CID_12.r1.r1.r    
PBUSH    11      K      311773.         341027. 2.4204+6        2.6475+6    
                 RCV    1.      1.      1.      1.    
$ Pset: "RFAST_BUSH_CID_12.r1.r1.r" will be imported as: "pbush.11"    
CBUSH    1208216 11      1017412 100781                          0
        .5    
$ Elements and Element Properties for region : pbush.6284.r1.r1.r1.r1.r1    
PBUSH    6284    K      496800.         496799. 9.6155+6        9.6154+6    
                 RCV    1.      1.      1.      1.     
$ Pset: "pbush.6284.r1.r1.r1.r1.r1" will be imported as: "pbush.6284"    
CBUSH    1206132 6284    1012231 101532                          1
        .5    
$ Elements and Element Properties for region : pbush.6286.r1.r1.r1.r1.r1    
PBUSH    6286    K      496800.         496799. 9.6155+6        9.6154+6    
                 RCV    1.      1.      1.      1.   

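Suppose this is my source file. I need to find every line in it that starts with PBUSH (this part never changes), followed by whitespace and then a number (which varies, as shown), and use that "PBUSH number" key to check whether the same entry exists in the target file. If it does, the data after "K" must be copied from the source file into the exact matching line of the target file. This has to be done for every line of the source and target files, through to the end. So far I have the following code: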

import re
path1 = "C:\Users\sony\Desktop\PBUSH1.BDF"
path2 = "C:\Users\sony\Desktop\PBUSH2.BDF"

with open(path1) as f1, open(path2) as f2:
    dat1 = f1.read()
    dat2 = f2.read()

    matches = re.findall('^PBUSH\s+[0-9]\s+[0-9 ]+', dat1, flags=re.MULTILINE)
    for match in matches:
        dat2 = re.sub('^{}\s+[0-9]\s+'.format(match.split(' ')[0]), match, dat2, flags=re.MULTILINE)

with open(path2, 'w') as f:
    f.write(dat2)

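I am having trouble getting the output I am looking for. The patterns I pass to findall and sub seem to be wrong. Do I need to account for the decimals? At the moment nothing changes in the target file. I keep tweaking the patterns to see which one works here.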


1 answer
User
Answer #1 · posted 2024-09-30 01:31:54

In general, one would use a library such as pyNastran to parse and write BDF files.


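In this specific case, however, your approach is not unreasonable: the principle is sound, but your regular expressions are wrong. Also note that you need to use raw strings for the paths or escape each \; unescaped backslashes are deprecated and can lead to hard-to-find bugs.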
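
As a small illustration of why the patterns in the question leave the target file unchanged, the sketch below runs the original findall pattern against two lines copied from the sample data (with trailing spaces trimmed). The variable names `sample`, `original`, and `fixed` are made up for this demonstration. Because `[0-9]` matches exactly one digit, only single-digit PBUSH numbers are found, and the match stops before "K", so the data that should be copied is never captured.

```python
import re

# Two PBUSH lines from the sample data: one single-digit id, one two-digit id.
sample = (
    "PBUSH    9       K      435008.         522649. 6.8198+6        8.1938+6\n"
    "PBUSH    10      K      319265.         148977. 988690.         461348.\n"
)

# Pattern from the question: [0-9] allows one digit only, and [0-9 ]+ cannot
# get past the letter K, so the match ends before the data.
original = re.findall(r'^PBUSH\s+[0-9]\s+[0-9 ]+', sample, flags=re.MULTILINE)

# [0-9]+ accepts ids of any length, and .*$ takes the rest of the line.
fixed = [m.group(0) for m in
         re.finditer(r'^PBUSH\s+[0-9]+\s+.*$', sample, flags=re.MULTILINE)]

print(original)
print(fixed)
```

The sub call in the question then replaces that same "PBUSH 9" prefix with itself, which is why nothing appears to change.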

import re

# must use raw strings for paths, otherwise we need to
# escape \ characters
input1 = r"C:\Users\sony\Desktop\PBUSH1.BDF"
input2 = r"C:\Users\sony\Desktop\PBUSH2.BDF"

output = r"C:\Users\sony\Desktop\OUTPUT.BDF"

with open(input1) as f1, open(input2) as f2:
    dat1 = f1.read()
    dat2 = f2.read()

# use finditer instead of findall so that we will get 
# a match object for each match.
#
# For each matching line we also have one subgroup, containing the
# "PBUSH   NNN     " part, whereas the whole regex matches until
# the next end of line
matches = re.finditer(r'^(PBUSH\s+[0-9]+\s+).*$', dat1, flags=re.MULTILINE)

for match in matches:
    # for each match we construct a regex that looks like
    # "^PBUSH   123      .*$", then replace all matches thereof
    # with the contents of the whole line; re.escape keeps the
    # captured prefix literal inside the new pattern
    pattern = r'^{}.*$'.format(re.escape(match.group(1)))
    dat2 = re.sub(pattern, match.group(0), dat2, flags=re.MULTILINE)

with open(output, 'w') as outf:
    outf.write(dat2)
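
The code above builds each pattern from the exact spacing of the source line, so it only matches when both files use identical column spacing. A possible variation, sketched here under the assumption that the "K" field always sits on the PBUSH line itself, keys the lookup on the PBUSH number and swaps only the text after "K" in a single pass. The names `PBUSH_RE` and `copy_pbush_k` are hypothetical helpers for this sketch and are not part of the original answer or of any library.

```python
import re

# Groups: (1) the "PBUSH <id> K" prefix with its own spacing,
#         (2) the property id, (3) everything after K on that line.
PBUSH_RE = re.compile(r'^(PBUSH[ \t]+(\d+)[ \t]+K)(.*)$', flags=re.MULTILINE)


def copy_pbush_k(source_text, target_text):
    """Return target_text with each PBUSH line's K data taken from source_text."""
    # Map property id -> text after "K" on the matching source line.
    k_data = {m.group(2): m.group(3) for m in PBUSH_RE.finditer(source_text)}

    def replace(m):
        # Keep the target's own prefix; swap the data only when the id
        # also exists in the source, otherwise leave the line untouched.
        new_data = k_data.get(m.group(2))
        return m.group(0) if new_data is None else m.group(1) + new_data

    return PBUSH_RE.sub(replace, target_text)


source = ("PBUSH    9       K      1.      2.\n"
          "PBUSH    10      K      3.      4.\n")
target = ("$ comment\n"
          "PBUSH    9       K      9.      9.\n"
          "                 RCV    1.\n"
          "PBUSH    11      K      7.      7.\n")
result = copy_pbush_k(source, target)
print(result)
```

In this made-up example only the PBUSH 9 line is rewritten; PBUSH 11 has no counterpart in the source and is left alone, as are comment and continuation lines. Using a function as the replacement also means backslashes in the copied text are never interpreted by re.sub.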
