Searching for text in another string using a pattern that is part regex and part literal text

Posted 2024-10-02 00:20:37


I have two files:

efile = c:\myexternal.txt    
cfile = c:\mycurrent.txt

Contents of myexternal.txt:

Paris
London
Amsterdam
New York

Contents of mycurrent.txt (this could be any text):

Paris is a city in France
A city in the UK is London
In the USA there is no city named Manchester
Amsterdam is in the Netherlands

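What I want to do is search the current file for every line of the external file (as raw text), but wrapped in regex boundaries.

For example:
I want to find every city from the external file in the current file, but not cities that are preceded by "is". Every city name must also be followed by whitespace or sit at the end of the line: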

boundO = r"(?<!is\s)"
boundC = r"(?=\s|$)"
#boundO + line in externalfile + boundC
#(regex rawtext regex)

#put every line of external file (c:\myexternal.txt) in list:
externalfile=[]
with open(efile, 'r+', encoding="utf8") as file:
  for line in file:
      if line.strip():                 #if line != empty
          line=line.rstrip("\n")       #remove linebreaks
          line=boundO + line + boundC  #add regex boundaries
          externalfile.append(line)

results = []
#check every line in c:\mycurrent.txt
with open(cfile, 'r+', encoding="utf8") as file:
  for line in file:
      if any(ext in line for ext in externalfile):
          results.append(line)

This does not work:
the boundaries are not treated as regular expressions.

What am I doing wrong?


3 answers

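A string only acts as a regular expression once it goes through the re module: either compile it with re.compile, or pass it to a function such as re.search, which compiles it for you.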

ext in line 

only tests whether the string ext occurs literally in line.

You should use something like this instead:

import re
regc=re.compile(ext)
regc.search(line)
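
To make the difference concrete, here is a minimal sketch that compares the two checks on one line of the sample data. The variable names literal_hit and regex_hit are illustrative and not part of the original code.

```python
import re

# Pattern built the same way as in the question:
# lookbehind + literal city name + lookahead.
pattern = r"(?<!is\s)" + "Paris" + r"(?=\s|$)"
line = "Paris is a city in France"

# `in` looks for the pattern's characters literally, so it finds nothing:
# the text "(?<!is\s)Paris(?=\s|$)" does not occur in the line.
literal_hit = pattern in line

# Compiling the pattern makes the lookarounds act as regex assertions.
regc = re.compile(pattern)
regex_hit = regc.search(line) is not None

print(literal_hit, regex_hit)
```

If the same pattern is applied to many lines, compiling it once and reusing regc avoids going through the re module's pattern cache on every call.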

You need re.search. Use it like this:

with open("check.pl", 'r+') as file:
    for line in file:
        if any(re.search(ext, line) for ext in externalfile): # < -here
            print(line)
            results.append(line)

Output:

Paris is a city in France

Amsterdam is in the Netherlands

Edit

I am not sure, but take a look at this:

boundO = r"(?<!is\s)\b"
boundC = r"(?=\s|$)"
#boundO + line in externalfile + boundC
#(regex rawtext regex)

#put every line of external file (c:\myexternal.txt) in list:
externalfile=[]
with open("check", 'r+') as file:
  for line in file:
      if line.strip():                 #if line != empty
          line=line.rstrip("\n")       #remove linebreaks
          #line=boundO + line + boundC  #add regex boundaries
          externalfile.append(line)

results = []
print(externalfile)
#check every line in c:\mycurrent.txt
with open("check.pl", 'r+') as file:
    for line in file:
        if any(re.search(boundO + ext + boundC, line) for ext in externalfile):
            print(line)
            results.append(line)
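
For anyone who wants to try this without creating the files, the following is a self-contained sketch of the same approach. It assumes the sample data from the question is held in two in-memory lists (cities and current_lines are illustrative names) instead of being read from disk.

```python
import re

bound_open = r"(?<!is\s)\b"   # not preceded by "is ", and at a word boundary
bound_close = r"(?=\s|$)"     # followed by whitespace or end of line

# Assumption: these lists stand in for myexternal.txt and mycurrent.txt.
cities = ["Paris", "London", "Amsterdam", "New York"]
current_lines = [
    "Paris is a city in France",
    "A city in the UK is London",
    "In the USA there is no city named Manchester",
    "Amsterdam is in the Netherlands",
]

results = []
for line in current_lines:
    # Build the full pattern per city and let re.search evaluate it as a regex.
    if any(re.search(bound_open + city + bound_close, line) for city in cities):
        results.append(line)

for line in results:
    print(line)
```

The "London" line is skipped because the name is preceded by "is ", and the "Manchester" line is skipped because that city is not in the list, which agrees with the output shown above.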

You must use re.search instead of the in operator:

if any(re.search(ext, line) for ext in externalfile):

Also, to keep the text read from the file from being interpreted as a regex, use re.escape:

line= boundO + re.escape(line) + boundC  #add regex boundaries
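
The effect of re.escape only shows up when a name contains regex metacharacters, which the sample cities do not. The sketch below uses a hypothetical entry, "St. Louis", and made-up test lines to illustrate it. It also joins all escaped names into one compiled alternation, which is a design choice of this sketch rather than something taken from the question.

```python
import re

BOUND_OPEN = r"(?<!is\s)"
BOUND_CLOSE = r"(?=\s|$)"

def build_pattern(names):
    """Escape each name so it matches literally, then combine them into one regex."""
    alternation = "|".join(re.escape(name) for name in names)
    return re.compile(BOUND_OPEN + "(?:" + alternation + ")" + BOUND_CLOSE)

# Hypothetical data: "St. Louis" contains ".", which is a regex metacharacter.
names = ["St. Louis", "New York"]
lines = [
    "St. Louis lies on the Mississippi",
    "Stu Louis is a person, not a city",
    "My favourite city is New York",
    "New York never sleeps",
]

pattern = build_pattern(names)
results = [line for line in lines if pattern.search(line)]

# Without escaping, "." matches any character, so "Stu Louis" is a false hit.
unescaped_hit = re.search("St. Louis", lines[1]) is not None

print(results)
print(unescaped_hit)
```

Only the first and last lines are kept: the second line fails because the escaped dot must match a literal ".", and the third fails the "is " lookbehind.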
