Regex that excludes matches enclosed in quotes and lines starting with %

Posted 2024-06-24 13:13:56


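I want to create a regular expression that does the following:

  • Matches certain whole words exactly, e.g. addpaths, addpath, test
  • Excludes lines that start with a % sign
  • Excludes matches that are enclosed in quotes (' or ")

So I came up with the following regex (with the g and m flags):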

^[^%]*?(?<=[^\'\"])\b(addpaths|addpath|test)\b(?=[^\'\"]).*?$?

This gives me the following results (see regex101):

function addpaths()                         --> match, correct
  % function addpaths to add paths to path  --> no match, correct
  fprintf('running addpaths')               --> no match, correct
  fprintf('addpaths running')               --> no match, correct
  fprintf('running addpaths.')              --> match, wrong
  fprintf('running addpaths function')      --> match, wrong

  % fprintf('running addpaths')             --> no match, correct
  % fprintf('addpaths running')             --> no match, correct
  % fprintf('running addpaths function')    --> no match, correct

  % test what happens to 'test'     --> no match, correct
  run('test')                       --> no match, correct
  'this is a test.'                 --> match, wrong
  test                              --> match, correct

So the regex works when the matched word sits directly next to a ', but it fails when another word, a space, or a . sits between the word and the quote. Why?

import re

text = '''function addpaths()
  % function addpaths to add paths to path
  fprintf('running addpaths')
  fprintf('addpaths running')
  fprintf('running addpaths function')

  % fprintf('running addpaths')
  % fprintf('addpaths running')
  % fprintf('running addpaths function')

  % test what happens to 'test'
  run('test')
  'this is a test.'
  test
'''

pattern = '^[^%]*?(?<=[^\'\"])\\b(addpaths|addpath|test)\\b(?=[^\'\"]).*?$'
regex = re.compile(pattern, re.M)

matches = regex.findall(text)
for m in matches:
    print(m)

1 answer
User
#1 · Posted 2024-06-24 13:13:56

Try this:

import re


text = '''function addpaths()
  % function addpaths to add paths to path
  fprintf('running addpaths')
  fprintf('addpaths running')
  fprintf('running addpaths function')

  % fprintf('running addpaths')
  % fprintf('addpaths running')
  % fprintf('running addpaths function')

  % test what happens to 'test'
  run('test')
  'this is a test.'
  test'''

pattern = r"""^(?!\s*%)[^'\"]+?\b(addpaths|addpath|test)\b(?!.*?['\"]).*?$"""
regex = re.compile(pattern, re.M)

# Test each line on its own and report whether the pattern matches it.
for line in text.split('\n'):
    print(line.ljust(50, ' '), 'OK' if regex.match(line) else 'NO MATCH')

Output:

function addpaths()                                OK
  % function addpaths to add paths to path         NO MATCH
  fprintf('running addpaths')                      NO MATCH
  fprintf('addpaths running')                      NO MATCH
  fprintf('running addpaths function')             NO MATCH
                                                   NO MATCH
  % fprintf('running addpaths')                    NO MATCH
  % fprintf('addpaths running')                    NO MATCH
  % fprintf('running addpaths function')           NO MATCH
                                                   NO MATCH
  % test what happens to 'test'                    NO MATCH
  run('test')                                      NO MATCH
  'this is a test.'                                NO MATCH
  test                                             OK

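I used the negative lookahead (?!.*?['\"]) because in 'this is a test.' the word test is followed by a ., not by a quote. In your regex, the part (addpaths|addpath|test)\b(?=[^\'\"]) only excludes a word that is immediately followed by a quote. That is why run('test') is correctly rejected by your pattern, while 'this is a test.' still matches.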
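
The difference between the two lookaheads can be checked in isolation. The following is a minimal sketch, not part of either original pattern: the names next_char_only, rest_of_line, samples and results are made up for illustration, and the line-start and % handling is left out so that only the quote check is compared.

```python
import re

# Question's check: only the single character right after the word is inspected.
next_char_only = re.compile(r"""\b(addpaths|addpath|test)\b(?=[^'"])""")
# Answer's check: no quote may appear anywhere after the word on the same line.
rest_of_line = re.compile(r"""\b(addpaths|addpath|test)\b(?!.*?['"])""")

# Hypothetical sample lines taken from the question's test text.
samples = ["run('test')", "'this is a test.'", "function addpaths()"]

results = {}
for line in samples:
    # Tuple: (matches with question's lookahead, matches with answer's lookahead)
    results[line] = (
        bool(next_char_only.search(line)),
        bool(rest_of_line.search(line)),
    )
    print(line.ljust(22), results[line])
```

One trade-off worth keeping in mind: because the answer's lookahead rejects any quote later on the line, a call such as addpath('dir') would also be rejected, even though the word addpath itself is not inside quotes. Whether that matters depends on the lines you need to match.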
