Isolating blocks of text with a specific regular expression in Python

Posted 2024-10-04 01:32:39


Suppose I have a file full of text like the following:

module combfn1789(clk, i0, i1, i2, i3, o);
  input clk, i0, i1, i2, i3; 
  output o;
  wire clk, i0, i1, i2, i3;
  wire o;
  wire UNCONNECTED788, n_0, n_1, n_2, n_3, n_4;
  Q_FDP0I0 o_reg(.CK (clk), .D (n_4), .Q (o), .QN (UNCONNECTED788));
  Q_OAI33 g186(.A0 (i2), .A1 (n_1), .A2 (i0), .B0 (n_0), .B1 (n_3), .B2
       (n_2), .Z (n_4));
  Q_INV g187(.A (i3), .Z (n_3));
  Q_INV g188(.A (i0), .Z (n_2));
  Q_INV g189(.A (i1), .Z (n_1));
  Q_INV g190(.A (i2), .Z (n_0));
endmodule;

module combfn1(clk, i0, i1, i2, i3, o);
  input clk, i0, i1, i2, i3;
  output o;
  wire clk, i0, i1, i2, i3;
  wire o;
  wire UNCONNECTED0, n_0, n_1;
  Q_FDP0I0 o_reg(.CK (clk), .D (n_1), .Q (o), .QN (UNCONNECTED0));
  Q_NR04 g59__4296(.A0 (i2), .A1 (i1), .A2 (n_0), .A3 (i3), .Z (n_1));
  Q_INV g60(.A (i0), .Z (n_0));
endmodule

I am only interested in a subset of the text, so I am trying to write a Python program that isolates the following:

combfn1789
Q_FDP0I0 o_reg(.CK (clk), .D (n_4), .Q (o), .QN (UNCONNECTED788));
Q_OAI33 g186(.A0 (i2), .A1 (n_1), .A2 (i0), .B0 (n_0), .B1 (n_3), .B2
      (n_2), .Z (n_4));
Q_INV g187(.A (i3), .Z (n_3));
Q_INV g188(.A (i0), .Z (n_2));
Q_INV g189(.A (i1), .Z (n_1));
Q_INV g190(.A (i2), .Z (n_0));

combfn1
Q_NR04 g59__4296(.A0 (i2), .A1 (i1), .A2 (n_0), .A3 (i3), .Z (n_1));
Q_INV g60(.A (i0), .Z (n_0));

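My initial idea was to use re.search to isolate the lines that start with Q_. Unfortunately, that does not work for isolating the module names (combfn...). I don't know how to write a regular expression that isolates both the lines starting with Q_ and the module names.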


1 answer
User
#1 · Posted 2024-10-04 01:32:39

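This expression, or a modified version of it, may return the desired output or something close to it:

module\s+\K([^)(]+)|(Q_[\s\S]*?;)

Note that \K is a PCRE feature that Python's built-in re module does not support, so the Python scripts in this answer drop it and read the capture groups instead.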
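To make the pieces of the expression easier to follow, here is a minimal sketch that spells it out with re.VERBOSE and checks how the standard re module reacts to \K. The names PCRE_PATTERN, PATTERN, k_supported, and sample are illustrative only, and the shortened sample input is an assumption made for brevity.

```python
import re

# PCRE-style pattern from the answer; \K is not an escape that `re` understands.
PCRE_PATTERN = r"module\s+\K([^)(]+)|(Q_[\s\S]*?;)"

try:
    re.compile(PCRE_PATTERN)
    k_supported = True
except re.error:
    k_supported = False

# The same expression without \K, annotated piece by piece.
PATTERN = re.compile(r"""
    module\s+        # the keyword "module" and the whitespace after it
    ([^)(]+)         # group 1: the module name, up to the first parenthesis
    |                # ... or ...
    (Q_[\s\S]*?;)    # group 2: an instance starting with Q_, lazily up to the next ";"
""", re.VERBOSE)

# Shortened, hypothetical input with one instance split across two lines.
sample = "module combfn1(clk, o);\n  Q_INV g60(.A (i0),\n     .Z (n_0));\nendmodule"

for m in PATTERN.finditer(sample):
    # m.lastindex is the number of the group that took part in this match.
    print(m.lastindex, repr(m.group(m.lastindex)))
```

Because [\s\S] matches newlines as well, an instance that is wrapped over several lines is captured as one match, with the line break and indentation left in place.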

re.finditer

import re

regex = r"module\s+([^)(]+)|(Q_[\s\S]*?;)"

test_str = ("module combfn1789(clk, i0, i1, i2, i3, o);\n"
    "  input clk, i0, i1, i2, i3; \n"
    "  output o;\n"
    "  wire clk, i0, i1, i2, i3;\n"
    "  wire o;\n"
    "  wire UNCONNECTED788, n_0, n_1, n_2, n_3, n_4;\n"
    "  Q_FDP0I0 o_reg(.CK (clk), .D (n_4), .Q (o), .QN (UNCONNECTED788));\n"
    "  Q_OAI33 g186(.A0 (i2), .A1 (n_1), .A2 (i0), .B0 (n_0), .B1 (n_3), .B2\n"
    "       (n_2), .Z (n_4));\n"
    "  Q_INV g187(.A (i3), .Z (n_3));\n"
    "  Q_INV g188(.A (i0), .Z (n_2));\n"
    "  Q_INV g189(.A (i1), .Z (n_1));\n"
    "  Q_INV g190(.A (i2), .Z (n_0));\n"
    "endmodule;\n\n"
    "module combfn1(clk, i0, i1, i2, i3, o);\n"
    "  input clk, i0, i1, i2, i3;\n"
    "  output o;\n"
    "  wire clk, i0, i1, i2, i3;\n"
    "  wire o;\n"
    "  wire UNCONNECTED0, n_0, n_1;\n"
    "  Q_FDP0I0 o_reg(.CK (clk), .D (n_1), .Q (o), .QN (UNCONNECTED0));\n"
    "  Q_NR04 g59__4296(.A0 (i2), .A1 (i1), .A2 (n_0), .A3 (i3), .Z (n_1));\n"
    "  Q_INV g60(.A (i0), .Z (n_0));\n"
    "endmodule")

# finditer yields one match object per hit; this pattern needs no flags
matches = re.finditer(regex, test_str)

for match_num, match in enumerate(matches, start=1):
    print("Match {} was found at {}-{}: {}".format(
        match_num, match.start(), match.end(), match.group()))

    # Only one of the two alternatives takes part in each match,
    # so skip the capture group that did not participate (it is None).
    for group_num, group in enumerate(match.groups(), start=1):
        if group is None:
            continue
        print("Group {} found at {}-{}: {}".format(
            group_num, match.start(group_num), match.end(group_num), group))

re.findall

import re

regex = r"module\s+([^)(]+)|(Q_[\s\S]*?;)"

test_str = ("module combfn1789(clk, i0, i1, i2, i3, o);\n"
    "  input clk, i0, i1, i2, i3; \n"
    "  output o;\n"
    "  wire clk, i0, i1, i2, i3;\n"
    "  wire o;\n"
    "  wire UNCONNECTED788, n_0, n_1, n_2, n_3, n_4;\n"
    "  Q_FDP0I0 o_reg(.CK (clk), .D (n_4), .Q (o), .QN (UNCONNECTED788));\n"
    "  Q_OAI33 g186(.A0 (i2), .A1 (n_1), .A2 (i0), .B0 (n_0), .B1 (n_3), .B2\n"
    "       (n_2), .Z (n_4));\n"
    "  Q_INV g187(.A (i3), .Z (n_3));\n"
    "  Q_INV g188(.A (i0), .Z (n_2));\n"
    "  Q_INV g189(.A (i1), .Z (n_1));\n"
    "  Q_INV g190(.A (i2), .Z (n_0));\n"
    "endmodule;\n\n"
    "module combfn1(clk, i0, i1, i2, i3, o);\n"
    "  input clk, i0, i1, i2, i3;\n"
    "  output o;\n"
    "  wire clk, i0, i1, i2, i3;\n"
    "  wire o;\n"
    "  wire UNCONNECTED0, n_0, n_1;\n"
    "  Q_FDP0I0 o_reg(.CK (clk), .D (n_1), .Q (o), .QN (UNCONNECTED0));\n"
    "  Q_NR04 g59__4296(.A0 (i2), .A1 (i1), .A2 (n_0), .A3 (i3), .Z (n_1));\n"
    "  Q_INV g60(.A (i0), .Z (n_0));\n"
    "endmodule")

print(re.findall(regex, test_str))
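re.findall returns one tuple per match, with an empty string for the group that did not take part, so the raw list still has to be reshaped into the layout the question asks for. The following is a rough sketch of one way to do that; group_by_module is a hypothetical helper, not part of the original answer, and the shortened sample input is an assumption.

```python
import re

REGEX = re.compile(r"module\s+([^)(]+)|(Q_[\s\S]*?;)")

def group_by_module(text):
    """Return {module_name: [instance, ...]} in order of appearance."""
    modules = {}
    current = None
    for name, instance in REGEX.findall(text):
        if name:
            # A module header starts a new, empty list of instances.
            current = name.strip()
            modules[current] = []
        elif current is not None:
            # A Q_ instance belongs to the most recently seen module.
            modules[current].append(instance)
    return modules

# Shortened, hypothetical input taken from the second module in the question.
sample = (
    "module combfn1(clk, i0, i1, i2, i3, o);\n"
    "  wire UNCONNECTED0, n_0, n_1;\n"
    "  Q_NR04 g59__4296(.A0 (i2), .A1 (i1), .A2 (n_0), .A3 (i3), .Z (n_1));\n"
    "  Q_INV g60(.A (i0), .Z (n_0));\n"
    "endmodule"
)

for name, instances in group_by_module(sample).items():
    print(name)
    for instance in instances:
        print(instance)
    print()
```

One caveat: the pattern looks for the text "module" anywhere, so an "endmodule" that is followed directly by whitespace and another module header could be picked up as a header itself. Anchoring the keyword with \b (that is, \bmodule) is one possible guard if the real files contain that layout.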

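Demo

The expression is explained in the top-right panel of this demo, if you wish to explore, simplify, or modify it further, and in this link you can watch how it matches against some sample inputs step by step.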
