Extracting part of a text with a regular expression

Published 2024-09-30 20:30:57


I want to use a regular expression to extract the lines between '*Node\n' and '*Element, type=S4R\n' from the following text:

text ="""**
*Part, name=Part-2
*Node
      1,         0.25,          0.5,         0.75
      2,         0.25,           0.,         0.75
   1416,  0.200000003,           0., 0.0500000007
*Element, type=S4R
 1,   1,  21, 357,  46
 2,  21,  22, 358, 357
*Nset, nset=_PickedSet24, internal, generate
    1,  1416,     1
**"""

I tried re.findall(r"\*Node\s([\s\S]+)\*\w", text) and re.findall(r"(?<=\*Node\s)([\s\S]+)(?=\*)", text), but I could not filter out the trailing part of the text. I get this output:

['      1,         0.25,          0.5,         0.75\n      2,         0.25,           0.,         0.75\n   1416,  0.200000003,           0., 0.0500000007\n*Element, type=S4R\n 1,   1,  21, 357,  46\n 2,  21,  22, 358, 357\n*Nset, nset=_PickedSet24, internal, generate\n    1,  1416,     1\n*']

However, if I try re.findall(r"(?<=name\s)([\s\S]+)(?=\selon)", text1) and re.findall(r"name\s([\s\S]+)\selon", text1) on the text below, I get the desired ['isn,t']:

text1 = """my name isn,t\nelon *nestla"""

Edit: The full text is as follows. There are several such blocks to extract, and I can end a block at *Element.

text = """** PARTS
**
*Part, name=Part-2
*Node
      1,         0.25,          0.5,         0.75
      2,         0.25,           0.,         0.75
   1416,  0.200000003,           0., 0.0500000007
*Element, type=S4R
 1,   1,  21, 357,  46
 2,  21,  22, 358, 357
*Nset, nset=_PickedSet24, internal, generate
    1,  1416,     1
*End Part
**  
*Part, name=plate#Part-1
*Node
      1, -0.449999988, -0.477499992,           0.
      2, -0.400000006, -0.477499992,           0.
    121, 0.0500000007, 0.0225000009,           0.
*Nset, nset=_PickedSet2, internal, generate
   1,  121,    1
*End Part
**  
**""" 

2 answers
import re

text ="""**
*Part, name=Part-2
*Node
      1,         0.25,          0.5,         0.75
      2,         0.25,           0.,         0.75
   1416,  0.200000003,           0., 0.0500000007
*Element, type=S4R
 1,   1,  21, 357,  46
 2,  21,  22, 358, 357
*Nset, nset=_PickedSet24, internal, generate
    1,  1416,     1
**"""

print( re.search(r'^\*Node(.*?)^\*Element, type=S4R', text, flags=re.S|re.M).group(1) )

Prints:

      1,         0.25,          0.5,         0.75
      2,         0.25,           0.,         0.75
   1416,  0.200000003,           0., 0.0500000007
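The patterns in the question fail because `[\s\S]+` is greedy: it consumes everything to the end of the text and then backs off only as far as the last `*`. The lazy `(.*?)` above stops at the first `*Element, type=S4R` line instead. A minimal sketch of the difference, using a shortened sample string that is made up for illustration:

```python
import re

# Shortened, made-up sample with the same layout as the text in the question.
text = "*Node\n 1, 0.25\n 2, 0.5\n*Element, type=S4R\n 1, 1, 21\n*Nset, nset=A\n 1, 2\n**"

# Greedy: [\s\S]+ first consumes everything, then backtracks only as far as
# the LAST position that is followed by "*".
greedy = re.findall(r"(?<=\*Node\s)([\s\S]+)(?=\*)", text)

# Lazy: (.*?) grows one character at a time and stops at the FIRST line
# that starts with "*Element, type=S4R".
lazy = re.search(r"^\*Node(.*?)^\*Element, type=S4R", text, flags=re.S | re.M).group(1)

print(repr(greedy[0]))  # runs through *Element and *Nset, ends with "\n*"
print(repr(lazy))       # only the two node lines (plus the newline after *Node)
```

The same reasoning explains why the `text1` example in the question works: `\selon` occurs only once there, so the greedy and lazy versions end at the same place.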

You can make the pattern a bit more specific by matching a newline after *Node and then matching \*Element, type=S4R:

\*Node\r?\n([\s\S]+?)\r?\n\*Element, type=S4R
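As a quick check, the pattern can be run with `re.search`. The sample string below is an assumption for illustration; it uses Windows line endings to show what the optional `\r?` is for:

```python
import re

pattern = r"\*Node\r?\n([\s\S]+?)\r?\n\*Element, type=S4R"

# Made-up sample with \r\n line endings; the pattern also works with plain \n.
sample = (
    "*Part, name=Part-2\r\n"
    "*Node\r\n"
    " 1, 0.25\r\n"
    " 2, 0.5\r\n"
    "*Element, type=S4R\r\n"
    " 1, 1, 21\r\n"
)

match = re.search(pattern, sample)
# Group 1 holds the lines between the two keywords, without the final line break.
block = match.group(1) if match else ""
print(block.splitlines())
```

Because the line break before `*Element` is matched outside the group, the captured block has no trailing newline.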


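To avoid unnecessary backtracking, you can also start the match at *Node and use a negative lookahead to match every line that does not start with *Element: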

^\*Node\r?\n((?:(?!\*Element).*\r?\n)*)\*Element, type=S4R


import re

regex = r"^\*Node\r?\n((?:(?!\*Element).*\r?\n)*)\*Element, type=S4R"
text = ("**\n"
    "*Part, name=Part-2\n"
    "*Node\n"
    "      1,         0.25,          0.5,         0.75\n"
    "      2,         0.25,           0.,         0.75\n"
    "   1416,  0.200000003,           0., 0.0500000007\n"
    "*Element, type=S4R\n"
    " 1,   1,  21, 357,  46\n"
    " 2,  21,  22, 358, 357\n"
    "*Nset, nset=_PickedSet24, internal, generate\n"
    "    1,  1416,     1\n"
    "**")

matches = re.search(regex, text, re.MULTILINE)
if matches:
    print(matches.group(1))

Output

      1,         0.25,          0.5,         0.75
      2,         0.25,           0.,         0.75
   1416,  0.200000003,           0., 0.0500000007

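If you want to find all matches, you can also use re.findall and end the match with a * followed by a word character \w, then use .* to match the rest of that line: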

import re
 
regex = r"^\*Node\r?\n((?:(?!\*\w).*\r?\n)*)\*\w.*"
text = """** PARTS
**
*Part, name=Part-2
*Node
      1,         0.25,          0.5,         0.75
      2,         0.25,           0.,         0.75
   1416,  0.200000003,           0., 0.0500000007
*Element, type=S4R
 1,   1,  21, 357,  46
 2,  21,  22, 358, 357
*Nset, nset=_PickedSet24, internal, generate
    1,  1416,     1
*End Part
**  
*Part, name=plate#Part-1
*Node
      1, -0.449999988, -0.477499992,           0.
      2, -0.400000006, -0.477499992,           0.
    121, 0.0500000007, 0.0225000009,           0.
*Nset, nset=_PickedSet2, internal, generate
   1,  121,    1
*End Part
**  
**""" 
 
print(re.findall(regex, text, re.MULTILINE))

Output

['      1,         0.25,          0.5,         0.75\n      2,         0.25,           0.,         0.75\n   1416,  0.200000003,           0., 0.0500000007\n', '      1, -0.449999988, -0.477499992,           0.\n      2, -0.400000006, -0.477499992,           0.\n    121, 0.0500000007, 0.0225000009,           0.\n']
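Each item returned by re.findall is still one multi-line string. If numbers are needed rather than text, the blocks can be split afterwards. This is only a sketch: `parse_block` is a hypothetical helper, and it assumes every line in a block is a comma-separated node id followed by coordinates, as in the question.

```python
import re

regex = r"^\*Node\r?\n((?:(?!\*\w).*\r?\n)*)\*\w.*"

# Shortened, made-up sample with two *Node blocks.
sample = (
    "*Node\n"
    "      1,  0.25,  0.5, 0.75\n"
    "      2,  0.25,   0., 0.75\n"
    "*Element, type=S4R\n"
    " 1, 1, 21, 357, 46\n"
    "*Node\n"
    "      1, -0.45, -0.4775, 0.\n"
    "*Nset, nset=_PickedSet2, internal, generate\n"
)

def parse_block(block):
    """Hypothetical helper: turn one captured block into (node_id, x, y, z) tuples."""
    rows = []
    for line in block.splitlines():
        if not line.strip():
            continue  # skip blank lines
        node_id, *coords = (field.strip() for field in line.split(","))
        rows.append((int(node_id), *map(float, coords)))
    return rows

parts = [parse_block(b) for b in re.findall(regex, sample, re.MULTILINE)]
print(parts)
```

Note that `(?!\*\w)` does not exclude a `**` comment line, so if one appeared inside a node block it would be captured and `parse_block` would raise a ValueError on it.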
