<p>I have two files:</p>
<pre><code>efile = c:\myexternal.txt
cfile = c:\mycurrent.txt
</code></pre>
<p>The contents of myexternal.txt:</p>
<pre><code>Paris
London
Amsterdam
New York
</code></pre>
<p>The contents of mycurrent.txt (but it can be any text):</p>
<pre><code>Paris is a city in France
A city in the UK is London
In the USA there is no city named Manchester
Amsterdam is in the Netherlands
</code></pre>
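<p>What I want to do is search the current file for every line of the external file (raw text), but wrapped in regex boundaries.</p>
<p>For example:<br/>
I want to find every city from the external file in the current file, except cities that are preceded by "is". Every city name must also be followed by a space or sit at the end of the line:</p>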
<pre><code>boundO = "(?<!is\s)"
boundC = "(?=\s|$)"
# boundO + line in externalfile + boundC
# (regex rawtext regex)

# put every line of the external file (c:\myexternal.txt) in a list:
externalfile = []
with open(efile, 'r+', encoding="utf8") as file:
    for line in file:
        if line.strip():  # if the line is not empty
            line = line.rstrip("\n")  # remove line breaks
            line = boundO + line + boundC  # add regex boundaries
            externalfile.append(line)

results = []
# check every line in c:\mycurrent.txt
with open(cfile, 'r+', encoding="utf8") as file:
    for line in file:
        if any(ext in line for ext in externalfile):
            results.append(line)
</code></pre>
<p>This does not work:<br/>
the boundaries are not treated as regular expressions.</p>
<p>What am I doing wrong?</p>
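<p>A regular expression has to be compiled (or passed to one of the <code>re</code> functions) before it is treated as a pattern.</p>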
<pre><code>ext in line
</code></pre>
<p>only tests whether the string <code>ext</code> occurs literally in <code>line</code>.</p>
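<p>To make the difference concrete, here is a minimal sketch using one city and one line from the question. The variable names <code>literal_hit</code> and <code>regex_hit</code> are only illustrative:</p>

```python
import re

pattern = r"(?<!is\s)Paris(?=\s|$)"
line = "Paris is a city in France"

# Substring test: looks for the pattern text itself, lookarounds included,
# as literal characters inside the line.
literal_hit = pattern in line

# Regex test: the lookarounds are interpreted by the re module.
regex_hit = re.search(pattern, line) is not None

print(literal_hit, regex_hit)  # prints: False True
```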
<p>You should use something like this instead:</p>
<pre><code>import re
regc=re.compile(ext)
regc.search(line)
</code></pre>
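<p>Putting this together, the loop from the question could be rewritten roughly as follows. This is a sketch rather than a definitive version: it keeps the sample data in memory instead of reading <code>efile</code> and <code>cfile</code>, the helper names <code>build_patterns</code> and <code>matching_lines</code> are made up for illustration, and the <code>re.escape</code> call is an added assumption that the city names should be matched as raw text.</p>

```python
import re

BOUND_OPEN = r"(?<!is\s)"   # not preceded by "is" plus one whitespace character
BOUND_CLOSE = r"(?=\s|$)"   # followed by whitespace or the end of the line


def build_patterns(names):
    """Compile one pattern per non-empty name (hypothetical helper)."""
    return [
        # re.escape keeps any regex metacharacters in a name literal.
        re.compile(BOUND_OPEN + re.escape(name.strip()) + BOUND_CLOSE)
        for name in names
        if name.strip()
    ]


def matching_lines(lines, patterns):
    """Return the lines in which at least one pattern matches (hypothetical helper)."""
    return [
        line.rstrip("\n")
        for line in lines
        if any(p.search(line) for p in patterns)
    ]


# Sample data from the question; with real files, iterate over open(...) instead.
external = ["Paris\n", "London\n", "Amsterdam\n", "New York\n"]
current = [
    "Paris is a city in France\n",
    "A city in the UK is London\n",
    "In the USA there is no city named Manchester\n",
    "Amsterdam is in the Netherlands\n",
]

results = matching_lines(current, build_patterns(external))
print(results)
# ['Paris is a city in France', 'Amsterdam is in the Netherlands']
```

<p>The London line is skipped because the city name is preceded by "is ", which is exactly what the negative lookbehind excludes. Compiling each pattern once, before the loop over the current file, avoids recompiling it for every line.</p>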