Regex: match only up to the first occurrence of a closing word

Posted 2024-05-20 18:43:10


Here is the string I want to parse:

a = '''   //TS_START
    /*TG_HEADER_START
        title="XYX"
        ident=""
    */
    /*
    <TC_HEADER_START>
        title=" Halted after Tester Connect" 
        ident="TC1" 
        variants="A C" 
        name="TC">
        TestcaseDescription= This >
        TestcaseRequirements=36978
        StakeholderRequirements=1236                
        TestcaseParameters:
        TS_Implemented=Yes;
        TS_Automation=Automated;
        TS_Techniques= Testing;
        TS_Priority=1;
        TS_Tested_By=qz9ghv;
        TS_Review_done=Yes;
        TS_Regression=No
        TestcaseTestType=Test  
    </TC_HEADER_END>
    <TC_HEADER_START>
        title=" Halted after Tester Connect" 
        ident="TC1" 
        variants="A C" 
        name="TC">
        TestcaseDescription= This >
        TestcaseRequirements=36978
        StakeholderRequirements=1236                
        TestcaseParameters:
        TS_Implemented=Yes;
        TS_Automation=Automated;
        TS_Techniques= Testing;
        TS_Priority=1;
        TS_Tested_By=qz9ghv;
        TS_Review_done=Yes;
        TS_Regression=No
        TestcaseTestType=Test  
    </TC_HEADER_END>
    */
    testcase TC_GEEA2_VGM_DOIP_01(char strDescription[], char strReq[], char strParams[])
    {
     }
    /*TG_HEADER_END*/




    zd.a.S,D.,AS'
    A/S,D/.A.SD./
    //<TS_END>'''

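I would like to parse this string and get a list of the substrings that start at <TC_HEADER_START> and end at </TC_HEADER_END>. I tried the regular expression below, but it returns everything from the first <TC_HEADER_START> to the last </TC_HEADER_END> as a single match, instead of stopping at the first closing tag.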

aa=re.findall(r'<TC_HEADER_START>([\s\S]*)</TC_HEADER_END>',a)

Expected output

aa=['<TC_HEADER_START>
        title=" Halted after Tester Connect" 
        ident="TC1" 
        variants="A C" 
        name="TC">
        TestcaseDescription= This >
        TestcaseRequirements=36978
        StakeholderRequirements=1236                
        TestcaseParameters:
        TS_Implemented=Yes;
        TS_Automation=Automated;
        TS_Techniques= Testing;
        TS_Priority=1;
        TS_Tested_By=qz9ghv;
        TS_Review_done=Yes;
        TS_Regression=No
        TestcaseTestType=Test  
    </TC_HEADER_END>','<TC_HEADER_START>
        title=" Halted after Tester Connect" 
        ident="TC1" 
        variants="A C" 
        name="TC">
        TestcaseDescription= This >
        TestcaseRequirements=36978
        StakeholderRequirements=1236                
        TestcaseParameters:
        TS_Implemented=Yes;
        TS_Automation=Automated;
        TS_Techniques= Testing;
        TS_Priority=1;
        TS_Tested_By=qz9ghv;
        TS_Review_done=Yes;
        TS_Regression=No
        TestcaseTestType=Test  
    </TC_HEADER_END>']

2 answers

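See the documentation for re.M and re.S: https://docs.python.org/3/library/re.html?highlight=re.S#re.MULTILINE. With re.S, `.` also matches newlines, so a lazy `.*?` can span several lines and still stop at the first closing tag.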

import re

aa=re.findall(r'<TC_HEADER_START>(.*?)</TC_HEADER_END>',a,re.S)
print(len(aa))
print(aa[0])

Output:

2

    title=" Halted after Tester Connect" 
    ident="TC1" 
    variants="A C" 
    name="TC">
    TestcaseDescription= This >
    TestcaseRequirements=36978
    StakeholderRequirements=1236                
    TestcaseParameters:
    TS_Implemented=Yes;
    TS_Automation=Automated;
    TS_Techniques= Testing;
    TS_Priority=1;
    TS_Tested_By=qz9ghv;
    TS_Review_done=Yes;
    TS_Regression=No
    TestcaseTestType=Test  
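
To illustrate why the re.S flag matters here, the following is a minimal sketch on a shortened, made-up sample (the `sample` string is an assumption for illustration, not the original data). It compares the same lazy pattern with no flag, with re.M, and with re.S.

```python
import re

# Hypothetical, shortened sample containing a single header block.
sample = '<TC_HEADER_START>\n  ident="TC1"\n</TC_HEADER_END>\n'

pattern = r'<TC_HEADER_START>(.*?)</TC_HEADER_END>'

# Without re.S, "." does not match a newline, so the pattern cannot
# get from the opening tag on one line to the closing tag on another.
no_flag = re.findall(pattern, sample)

# re.M only changes how ^ and $ behave; it does not affect ".".
multiline = re.findall(pattern, sample, re.M)

# re.S (DOTALL) lets "." match newlines as well.
dotall = re.findall(pattern, sample, re.S)

print(no_flag)    # []
print(multiline)  # []
print(dotall)     # ['\n  ident="TC1"\n']
```

In other words, re.S is the flag doing the work in the answer above; re.M would only be needed if the pattern used ^ or $ to anchor at line boundaries.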

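Your regular expression is almost correct: you want a lazy quantifier (*?) instead of a greedy one (*). The greedy form consumes as much as possible and only stops at the last </TC_HEADER_END>, while the lazy form stops at the first one.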

Try this:

<TC_HEADER_START>([\s\S]*?)</TC_HEADER_END>
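
As a rough check of the difference, here is a small sketch that runs the greedy and the lazy pattern on a shortened, made-up sample with two header blocks (the `sample` string is an assumption for illustration only).

```python
import re

# Hypothetical, shortened sample with two header blocks.
sample = (
    "/*\n"
    "<TC_HEADER_START>\n"
    '    ident="TC1"\n'
    "</TC_HEADER_END>\n"
    "<TC_HEADER_START>\n"
    '    ident="TC2"\n'
    "</TC_HEADER_END>\n"
    "*/\n"
)

# Greedy: [\s\S]* runs to the LAST closing tag, giving one big match.
greedy = re.findall(r'<TC_HEADER_START>([\s\S]*)</TC_HEADER_END>', sample)

# Lazy: [\s\S]*? stops at the FIRST closing tag after each opening tag.
lazy = re.findall(r'<TC_HEADER_START>([\s\S]*?)</TC_HEADER_END>', sample)

print(len(greedy))  # 1
print(len(lazy))    # 2
```

Because `[\s\S]` already matches any character including newlines, this variant does not need the re.S flag.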

Or try it on regex101.

Edit:

If you want to include the enclosing tags, wrap them in capturing groups as well:

(<TC_HEADER_START>)([\s\S]*?)(</TC_HEADER_END>)
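
One thing to keep in mind with this pattern: when a pattern has several capturing groups, re.findall returns a list of tuples (one element per group) rather than a list of whole strings. If the goal is the list of complete blocks shown in the question, one possible approach is to take the full match from re.finditer instead. The sketch below uses a shortened, made-up `sample` string as an assumption.

```python
import re

# Hypothetical, shortened sample with two header blocks.
sample = (
    "<TC_HEADER_START>\n"
    '    ident="TC1"\n'
    "</TC_HEADER_END>\n"
    "<TC_HEADER_START>\n"
    '    ident="TC2"\n'
    "</TC_HEADER_END>\n"
)

pattern = r'(<TC_HEADER_START>)([\s\S]*?)(</TC_HEADER_END>)'

# With three groups, findall yields 3-tuples: (open tag, body, close tag).
tuples = re.findall(pattern, sample)

# group(0) is the entire match, tags included, as a single string.
blocks = [m.group(0) for m in re.finditer(pattern, sample)]

print(tuples[0])
print(blocks[0])
```

Dropping the groups entirely, as in `<TC_HEADER_START>[\s\S]*?</TC_HEADER_END>`, should also make re.findall return the whole blocks directly.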

updated regex101
