Python中两个字符串的正则匹配

2024-05-08 19:54:37 发布

您现在位置:Python中文网/ 问答频道 /正文

我试图在包含如下内容的文件中匹配两个[Term][Term][Typedef]之间的所有匹配:

remark: Includes Ontology(OntologyID(OntologyIRI(<http://purl.obolibrary.org/obo/go/never_in_taxon.owl>))) [Axioms: 18 Logical Axioms: 0]
ontology: go

[Term]
id: GO:0000001
name: mitochondrion inheritance
namespace: biological_process
def: "The distribution of mitochondria, including the mitochondrial genome, into daughter cells after mitosis or meiosis, mediated by interactions between mitochondria and the cytoskeleton." [GOC:mcc, PMID:10873824, PMID:11389764]
synonym: "mitochondrial inheritance" EXACT []
is_a: GO:0048308 ! organelle inheritance
is_a: GO:0048311 ! mitochondrion distribution

[Term]
id: GO:0000002
name: mitochondrial genome maintenance
namespace: biological_process
def: "The maintenance of the structure and integrity of the mitochondrial genome; includes replication and segregation of the mitochondrial chromosome." [GOC:ai, GOC:vw]
is_a: GO:0007005 ! mitochondrion organization

[Term]
id: GO:0000011
name: vacuole inheritance
namespace: biological_process
def: "The distribution of vacuoles into daughter cells after mitosis or meiosis, mediated by interactions between vacuoles and the cytoskeleton." [GOC:mcc, PMID:10873824, PMID:14616069]
is_a: GO:0007033 ! vacuole organization
is_a: GO:0048308 ! organelle inheritance

[Typedef]
id: positively_regulates
name: positively regulates
namespace: external
xref: RO:0002213
holds_over_chain: negatively_regulates negatively_regulates
is_a: regulates ! regulates
transitive_over: part_of ! part of

[Typedef]
id: regulates
name: regulates
namespace: external
xref: RO:0002211
is_transitive: true
transitive_over: part_of ! part of

使用:(?=\[Term\]\s)[\s\S]*(?=\s\s\[Term\]\s)我只匹配第一个[Term]和倒数第二个。在


Tags: andofthenameidgoisinheritance
2条回答

你可以用

r'(?m)^\[Term].*(?:\r?\n(?!\[(?:Typedef|Term)]).*)*'

参见regex demo

详细信息

  • (?m)-多行修饰符
  • ^-行首
  • \[Term]-a[Term]子串
  • .*-当前行的其余部分
  • (?:\r?\n(?!\[(?:Typedef|Term)]).*)*-0次或多次出现:
    • \r?\n(?!\[(?:Typedef|Term)])-不跟[Typedef][Term]子字符串的换行符(CRLF或LF)
    • .*-当前行的其余部分

Python code

^{pr2}$

输出:

[Term]
id: GO:0000001
name: mitochondrion inheritance
namespace: biological_process
def: "The distribution of mitochondria, including the mitochondrial genome, into daughter cells after mitosis or meiosis, mediated by interactions between mitochondria and the cytoskeleton." [GOC:mcc, PMID:10873824, PMID:11389764]
synonym: "mitochondrial inheritance" EXACT []
is_a: GO:0048308 ! organelle inheritance
is_a: GO:0048311 ! mitochondrion distribution

        Next match        -
[Term]
id: GO:0000002
name: mitochondrial genome maintenance
namespace: biological_process
def: "The maintenance of the structure and integrity of the mitochondrial genome; includes replication and segregation of the mitochondrial chromosome." [GOC:ai, GOC:vw]
is_a: GO:0007005 ! mitochondrion organization

        Next match        -
[Term]
id: GO:0000011
name: vacuole inheritance
namespace: biological_process
def: "The distribution of vacuoles into daughter cells after mitosis or meiosis, mediated by interactions between vacuoles and the cytoskeleton." [GOC:mcc, PMID:10873824, PMID:14616069]
is_a: GO:0007033 ! vacuole organization
is_a: GO:0048308 ! organelle inheritance

        Next match        -
Number of mathes: 3

要在两者之间进行匹配,可以尝试以下操作:

import re
s = "[Term] id: GO:0000001 name: mitochondrion inheritance namespace: biological_process def: "The distribution of mitochondria, including the mitochondrial genome, into daughter cells after mitosis or meiosis, mediated by interactions between mitochondria and the cytoskeleton." [GOC:mcc, PMID:10873824, PMID:11389764] synonym: "mitochondrial inheritance" EXACT [] is_a: GO:0048308 ! organelle inheritance is_a: GO:0048311 ! mitochondrion distribution" #etc
the_data = re.findall("\[Term\](.*?)\n\s\[Term\]|\[Term\](.*?)\n\s\[Typedef\]", s)

最终输出:

^{pr2}$

相关问题 更多 >

    热门问题