Splitting a string on different conditions with a regular expression in Python

Posted 2024-10-01 22:26:03


I want to split a string using a regular expression.

For example:

when [python] or [html ] demo  "css html"   -[javascript] score:5

From this string I want the following lists:

contains = ['when', 'demo']
word_press = ["css html"]
tags = ['python', 'or', 'html', '-', 'javascript']
options = [{score:5}]
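  • Every word enclosed in "[]" (square brackets) goes in the tags list.
  • Words between "" (double quotes) go in the word_press list.
  • Words containing a ":" go in the options list.
  • Anything that matches none of the criteria above goes in the contains list.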

I tried this:

((?:or\s|-)?\[.*?\])|(".*?")|([a-z]+:\d*)|(\S+)

It works well in the live demo, but when I use it in Python:

>>> import re
>>> s = '''[python] or [html] how to "how to" user:2525
... [demo] how to createscore:5
... when [python] or [html] demo  "css html"   -[javascript] score:5'''
>>> re.findall(r'''((?:or\s|-)?\[.*?\])|(".*?")|([a-z]+:\d*)|(\S+)''', s)
[('[python]', '', '', ''),
 ('or [html]', '', '', ''),
 ('', '', '', 'how'),
 ('', '', '', 'to'),
 ('', '"how to"', '', ''),
 ('', '', 'user:2525', ''),
 ('[demo]', '', '', ''),
 ('', '', '', 'how'),
 ('', '', '', 'to'),
 ('', '', 'createscore:5', ''),
 ('', '', '', 'when'),
 ('[python]', '', '', ''),
 ('or [html]', '', '', ''),
 ('', '', '', 'demo'),
 ('', '"css html"', '', ''),
 ('-[javascript]', '', '', ''),
 ('', '', 'score:5', '')]

It returns a list of tuples. Is there a way to get the groups like this instead?

group1 = ['[python]', 'or [html]', '[demo]', '[python]', 'or [html]', '-[javascript]']
...

1 answer
Answer 1 · Posted 2024-10-01 22:26:03
>>> import re
>>> s = '''[python] or [html] how to "how to" user:2525
... [demo] how to createscore:5
... when [python] or [html] demo  "css html"   -[javascript] score:5'''

Here is one possible regular expression, with inline comments, that captures the required information:

>>> pattern = r'''
    (?P<tag>                 # define group one - tags
    (?:or\s|-)?              # - acceptable words/chars for preceding tags
    \[.*?\])                 # - tag definition - words in square brackets
    |(?P<word_press>".*?")   # group two - words in quotes
    |(?P<options>[a-z]+:\d*) # group three - options with colons
    |(?P<other>\S+)          # group four - anything left over
'''

Note that using this with findall gives a list of tuples:

>>> re.findall(pattern, s, re.VERBOSE)
[('[python]', '', '', ''),
 ('or [html]', '', '', ''),
 ('', '', '', 'how'), 
 ('', '', '', 'to'),
 ('', '"how to"', '', ''),
 ('', '', 'user:2525', ''), 
 ('[demo]', '', '', ''),
 ('', '', '', 'how'),
 ('', '', '', 'to'), 
 ('', '', 'createscore:5', ''),
 ('', '', '', 'when'),
 ('[python]', '', '', ''), 
 ('or [html]', '', '', ''), 
 ('', '', '', 'demo'), 
 ('', '"css html"', '', ''), 
 ('-[javascript]', '', '', ''), 
 ('', '', 'score:5', '')]
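
The tuples appear because `re.findall` returns one tuple per match whenever the pattern contains more than one capturing group, with an empty string for each group that did not take part in that match. A minimal sketch of the same behaviour, using a deliberately simplified two-group pattern (not the pattern from the question):

```python
import re

# Two alternatives, each wrapped in its own capturing group.
pattern = r'(\d+)|([a-z]+)'

# Each match fills exactly one group; the other is reported as ''.
matches = re.findall(pattern, 'abc 123 de')
print(matches)  # [('', 'abc'), ('123', ''), ('', 'de')]
```

This is why each tuple in the output above has exactly one non-empty entry: only one branch of the alternation can match at a time.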

Here is a functional-programming approach that rearranges the result by group:

>>> from functools import partial
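>>> # map() and filter() are lazy in Python 3, so materialise them with list() and tuple()
>>> list(map(tuple, map(partial(filter, None), zip(*re.findall(pattern, s, re.VERBOSE)))))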
[('[python]', 'or [html]', '[demo]', '[python]', 'or [html]', '-[javascript]'), 
 ('"how to"', '"css html"'), 
 ('user:2525', 'createscore:5', 'score:5'), 
 ('how', 'to', 'how', 'to', 'when', 'demo')]
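
As an alternative that may be easier to read, the matches can be bucketed by group name. This is only a sketch, not part of the original answer: it assumes the same named groups as the pattern above, uses the shorter example string from the question, and the `groups` dictionary is an illustrative name. It relies on `re.finditer` and the match object's `lastgroup` attribute, which holds the name of the group that matched.

```python
import re
from collections import defaultdict

# Same four alternatives as above, compiled once in verbose mode.
pattern = re.compile(r'''
    (?P<tag>(?:or\s|-)?\[.*?\])   # optional prefix, then text in square brackets
    |(?P<word_press>".*?")        # phrase in double quotes
    |(?P<options>[a-z]+:\d*)      # option of the form name:number
    |(?P<other>\S+)               # anything left over
''', re.VERBOSE)

s = 'when [python] or [html ] demo  "css html"   -[javascript] score:5'

# Illustrative container: maps each group name to the list of its matches.
groups = defaultdict(list)
for m in pattern.finditer(s):
    # lastgroup names the alternative that matched this token.
    groups[m.lastgroup].append(m.group())

print(groups['tag'])         # ['[python]', 'or [html ]', '-[javascript]']
print(groups['word_press'])  # ['"css html"']
print(groups['options'])     # ['score:5']
print(groups['other'])       # ['when', 'demo']
```

Keying by name avoids relying on the position of each group in the tuple, so the pattern can be reordered or extended without changing the collection code.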
