Python, regular expressions: strings inside square brackets []

Posted 2024-09-27 09:22:58


I have lines formatted like this ("bla" stands for content that doesn't matter):

> blabla|blabla|bla|blabla| blabla [Geobacter sp. M21]
> blabla|blabla|bla|blabla| blabla [Acetobacter pasteurianus IFO 3283-07]
> blabla|blabla|bla|blabla| blabla [Gardnerella vaginalis ATCC 14019]
> blabla|blabla|bla|blabla| blabla [Granulibacter bethesdensis CGDNIH1]

I'm trying to extract everything inside the square brackets [], so that I get:

Geobacter sp. M21
Acetobacter pasteurianus IFO 3283-07
Gardnerella vaginalis ATCC 14019
Granulibacter bethesdensis CGDNIH1

Here is my code. Of course it doesn't work: the brackets sometimes contain 3 and sometimes 4 "alphanumeric words", plus characters such as "." or "-":

import re
#code...
pattern = r'[ \w+ \w+ \w+ ]'
for i in lines_:
    m = re.search ( pattern, str(i) )
    print m.group()

So, is it possible to get this information with a regular expression?


3 answers

You can pass lines_ to re.findall and use a regex pattern like this:

\[([^\]]+)\]

Here is a breakdown of what it matches:

\[      # [
(       # The start of a capture group
[^\]]+  # One or more characters that are not ]
)       # The close of the capture group
\]      # ]

Here is a demo:

>>> from re import findall
>>> lines_ = """
... > blabla|blabla|bla|blabla| blabla [Geobacter sp. M21]
... > blabla|blabla|bla|blabla| blabla [Acetobacter pasteurianus IFO 3283-07]
... > blabla|blabla|bla|blabla| blabla [Gardnerella vaginalis ATCC 14019]
... > blabla|blabla|bla|blabla| blabla [Granulibacter bethesdensis CGDNIH1]
... """
>>> findall(r"\[([^\]]+)\]", lines_)
['Geobacter sp. M21', 'Acetobacter pasteurianus IFO 3283-07', 'Gardnerella vaginalis ATCC 14019', 'Granulibacter bethesdensis CGDNIH1']
>>>
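
If the input is a list of lines rather than one big string, as in the question's loop, the same pattern can be applied line by line. The following is only a sketch: the helper name extract_bracketed and the sample list are made up for illustration, and it assumes each line holds at most one bracketed field of interest. It also checks for None, since re.search returns None when a line has no brackets.

```python
import re

# Compile once; the capture group holds everything between '[' and the next ']'.
BRACKETED = re.compile(r"\[([^\]]+)\]")

def extract_bracketed(lines):
    """Return the first bracketed field of each line, skipping lines without one."""
    results = []
    for line in lines:
        match = BRACKETED.search(line)
        if match:  # re.search returns None when nothing matches
            results.append(match.group(1))
    return results

# Hypothetical sample input, modelled on the lines in the question.
lines_ = [
    "> blabla|blabla|bla|blabla| blabla [Geobacter sp. M21]",
    "> blabla|blabla|bla|blabla| blabla [Acetobacter pasteurianus IFO 3283-07]",
    "> a line without brackets",
]
print(extract_bracketed(lines_))
```

The line without brackets is skipped instead of raising AttributeError on m.group().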

You don't need a regular expression here:

>>> s = '''> blabla|blabla|bla|blabla| blabla [Geobacter sp. M21]
... > blabla|blabla|bla|blabla| blabla [Acetobacter pasteurianus IFO 3283-07]
... > blabla|blabla|bla|blabla| blabla [Gardnerella vaginalis ATCC 14019]
... > blabla|blabla|bla|blabla| blabla [Granulibacter bethesdensis CGDNIH1]'''
>>> for x in s.splitlines():
...     print(x.rsplit('[', 1)[-1].rstrip(']'))
...     
Geobacter sp. M21
Acetobacter pasteurianus IFO 3283-07
Gardnerella vaginalis ATCC 14019
Granulibacter bethesdensis CGDNIH1
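
If the bracketed text can itself contain brackets, splitting on the last '[' cuts the field short. One possible regex-free variant, sketched here with a hypothetical helper name bracket_contents, takes everything between the first '[' and the last ']'. It assumes each line has a single outer bracketed field.

```python
def bracket_contents(line):
    """Return the text between the first '[' and the last ']', or None if absent."""
    start = line.find('[')   # index of the first opening bracket, -1 if missing
    end = line.rfind(']')    # index of the last closing bracket, -1 if missing
    if start == -1 or end < start:
        return None
    return line[start + 1:end]

print(bracket_contents("> blabla|blabla| blabla [Geobacter sp. M21]"))
print(bracket_contents("> bla|bla [Salmonella enterica subsp. 4,[5],12:i:- str. 08-1736]"))
```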

In the end, this is what I did:

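import re

for i in list_:
    # Raw string so the backslashes reach the regex engine unchanged
    dop = re.search(r"\[(.+)\]$", str(i))
    if dop:
        # group(1) is the text inside the brackets; group(0) would include the brackets
        species = dop.group(1)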

Explanation:

\[      # [
(       # start of a capture group
.+      # One or more of any character (greedy), because some entries have brackets inside the []
        # like > bla|bla [Salmonella enterica subsp. 4,[5],12:i:- str. 08-1736]
)       # The close of the capture group
\]      # ]
$       # Anchors the match at the end of the line
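
To see why the greedy pattern was chosen, here is a small comparison on the nested example from the comment above. It is only an illustration; the variable names are made up. Note the trade-off: on a line with two separate bracketed fields, the greedy pattern would span from the first '[' to the final ']'.

```python
import re

line = "> bla|bla [Salmonella enterica subsp. 4,[5],12:i:- str. 08-1736]"

# The negated class stops at the first ']', so the inner "[5]" truncates the result.
stops_early = re.search(r"\[([^\]]+)\]", line).group(1)

# Greedy ".+" anchored with "$" runs to the last ']' at the end of the line.
full_field = re.search(r"\[(.+)\]$", line).group(1)

print(stops_early)  # Salmonella enterica subsp. 4,[5
print(full_field)   # Salmonella enterica subsp. 4,[5],12:i:- str. 08-1736
```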

Thanks, everyone, for the help.
