<p>You could try <code>tokenize</code> instead of <code>regex</code>. As @OlvinRoght said, parsing code with a regex is probably a bad idea in this case. Following the approach shown <a href="https://stackoverflow.com/questions/34511673/extracting-comments-from-python-source-code/34512388#34512388">here</a>, you can detect comments like this:</p>
<pre><code>import tokenize

# Raw string so the backslash in the Windows path is not treated as an escape
with open(r'yourpath\comment.py', 'r') as fileObj:
    for toktype, tok, start, end, line in tokenize.generate_tokens(fileObj.readline):
        # we can also use token.tok_name[toktype] instead of 'COMMENT'
        # from the token module
        if toktype == tokenize.COMMENT:
            print('COMMENT' + " " + tok)
</code></pre>
<p>Output:</p>
<pre><code>COMMENT # -*- coding: utf-8 -*-
COMMENT # this is comment line.
COMMENT # comment in line
COMMENT # comment. there's a # in code.
COMMENT # strange sign ' # ' in comment.
</code></pre>
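<p>The same loop can be wrapped in a small function that reads from a string via <code>io.StringIO</code>, which makes it easy to check that a <code>#</code> inside a string literal is not reported as a comment (the main reason a regex struggles here). This is a minimal sketch; <code>extract_comments</code> is a hypothetical helper name, and it assumes the input is valid Python source passed as a <code>str</code>.</p>
<pre><code>import io
import tokenize


def extract_comments(source):
    """Return the text of every COMMENT token in the Python source string."""
    readline = io.StringIO(source).readline
    return [tok.string
            for tok in tokenize.generate_tokens(readline)
            if tok.type == tokenize.COMMENT]


sample = (
    "# header comment\n"
    "msg = 'not # a comment'  # real comment\n"
)
print(extract_comments(sample))
# ['# header comment', '# real comment']
</code></pre>
<p>The <code>#</code> inside <code>'not # a comment'</code> belongs to a STRING token, so only the two real comments are returned.</p>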
<p>Then, to get the result you expected, i.e. the Python file without comments, you can try the following:</p>
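<pre><code>nocomments = []
# The loop above already read the file to the end, so open it again
with open(r'yourpath\comment.py', 'r') as fileObj:
    for toktype, tok, start, end, line in tokenize.generate_tokens(fileObj.readline):
        # keep every token except comments
        if toktype != tokenize.COMMENT:
            nocomments.append(tok)
print(' '.join(nocomments))
</code></pre>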
<p>Output (the tokens are re-joined with single spaces, so the original spacing is not preserved):</p>
<pre><code> age = 18
msg1 = "I'm #1."
msg2 = 'you are #2. ' + 'He is #3'
print ( 'Waiting your answer' )
</code></pre>
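<p>If you want the comment-free code to keep its original layout (indentation, spacing and line breaks) instead of tokens joined by spaces, one option is to use each comment token's start position to cut the comment off its physical line. This is a sketch of my own rather than part of the linked answer; <code>strip_comments</code> is a hypothetical helper name, and it assumes valid Python source passed as a <code>str</code>.</p>
<pre><code>import io
import tokenize


def strip_comments(source):
    """Return the source string with every comment removed, layout kept."""
    # Split on "\n" exactly as the tokenizer's readline does, so token
    # row numbers line up with list indices.
    lines = io.StringIO(source).readlines()

    readline = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(readline):
        if tok.type == tokenize.COMMENT:
            row, col = tok.start          # row is 1-based, col is 0-based
            line = lines[row - 1]
            body = line.rstrip("\r\n")
            ending = line[len(body):]     # keep the original line terminator
            # A comment always runs to the end of its physical line, so cut
            # there and drop the whitespace that separated it from the code.
            lines[row - 1] = body[:col].rstrip() + ending
    return "".join(lines)


sample = (
    "# -*- coding: utf-8 -*-\n"
    "age = 18  # comment in line\n"
    "msg2 = 'you are #2. ' + 'He is #3'\n"
)
print(strip_comments(sample), end="")
</code></pre>
<p>Comment-only lines become empty lines, so the line numbers of the remaining code stay the same, and the <code>#</code> characters inside string literals are left alone because they are part of STRING tokens.</p>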