<p>Rob is right. The solution above has a problem.</p>
<p>Also, the regex-based solution that Rob gave above
looks like a good choice to me.</p>
<p>Here is a variation on it:</p>
<pre><code>import re


def save_lines(infile, outfile):
    # Non-greedy match so that each '{{ ... }}' pair is captured separately;
    # re.DOTALL lets a match span several lines.
    bracket_pattern = re.compile(r'{{(.*?)}}', re.DOTALL)
    content = infile.read()
    for mo in bracket_pattern.finditer(content):
        outchars = mo.group(1)
        outfile.write('matched: "{}" at position {}\n'.format(
            outchars, mo.start()))
</code></pre>
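<p>To try the regex variation without a file on disk, something like the following sketch should work. The sample text is made up for illustration, and <code>io.StringIO</code> objects stand in for the real input and output files; the function is repeated so the snippet runs on its own.</p>

```python
import io
import re


def save_lines(infile, outfile):
    # Same logic as the variation above, repeated so this snippet runs alone.
    bracket_pattern = re.compile(r'{{(.*?)}}', re.DOTALL)
    content = infile.read()
    for mo in bracket_pattern.finditer(content):
        outfile.write('matched: "{}" at position {}\n'.format(
            mo.group(1), mo.start()))


# Hypothetical sample input; the second match spans two lines.
sample = io.StringIO('abc {{first}} def\n{{sec\nond}} ghi\n')
result = io.StringIO()
save_lines(sample, result)
output = result.getvalue()
print(output)
```

<p>The reported position is the offset of the opening <code>{{</code> in the whole input, counted from zero.</p>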
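<p>However, depending on your needs, you may also want to consider
the following: (1) A regex-based approach offers
little flexibility for syntax error checking. (2) Regular
expressions do not support recursive grammars. That is, if
the syntax you need to parse (and we are discussing a parsing
problem here) contains, or is later extended to contain, nested syntactic
elements, regular expressions will not help.</p>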
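<p>Both limitations are easy to see with the same pattern. This is only a small illustration with made-up inputs: nested brackets are cut at the first <code>}}</code>, and an unclosed <code>{{</code> produces no match and no error.</p>

```python
import re

bracket_pattern = re.compile(r'{{(.*?)}}', re.DOTALL)

# Hypothetical input with one pair of brackets nested inside another.
nested = '{{outer {{inner}} tail}}'
matches = bracket_pattern.findall(nested)
print(matches)   # the match stops at the first '}}', inside the outer pair

# Hypothetical input where the closing '}}' is missing.
unclosed = bracket_pattern.findall('text {{never closed')
print(unclosed)  # empty list: the syntax error goes unreported
```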
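<p>Here is another solution, based on a finite state machine (FSM). It
would likely give you more flexibility for error reporting. But it is
longer and more complex. That complexity has costs: (1)
development time (the regex solution above took me
10 to 15 minutes; this FSM solution took me several hours); and (2)
debugging (there is a lot of logic here, mostly if statements, so
there are many ways for it to go wrong).</p>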
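<p>Because it is based on a finite state machine, it also cannot be extended
(at least not without difficulty) to handle nested
(recursive) syntactic constructs. For that, you may want to look at a parser
generator. See this list:
<a href="https://wiki.python.org/moin/LanguageParsing" rel="nofollow">https://wiki.python.org/moin/LanguageParsing</a></p>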
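<p>To give a feel for what handling nesting involves, here is a minimal hand-written recursive-descent sketch. It is not a parser generator and not taken from any of the tools on that list; the function name <code>parse_items</code> and the nested-list result format are my own assumptions for illustration.</p>

```python
def parse_items(text, pos=0, depth=0):
    """Return (items, next_pos); items mixes text chunks and nested lists."""
    items = []
    chunk = []
    while pos < len(text):
        if text.startswith('{{', pos):
            # Flush pending text, then recurse for the nested group.
            if chunk:
                items.append(''.join(chunk))
                chunk = []
            nested, pos = parse_items(text, pos + 2, depth + 1)
            items.append(nested)
        elif text.startswith('}}', pos):
            if depth == 0:
                raise ValueError('unexpected "}}" at position %d' % pos)
            if chunk:
                items.append(''.join(chunk))
            return items, pos + 2
        else:
            chunk.append(text[pos])
            pos += 1
    if depth > 0:
        raise ValueError('missing "}}" before end of input')
    if chunk:
        items.append(''.join(chunk))
    return items, pos


tree, _ = parse_items('a {{b {{c}} d}} e')
print(tree)
```

<p>Each bracketed group becomes a nested list, and unbalanced brackets raise a <code>ValueError</code>, which is the kind of error reporting a regex cannot easily give you.</p>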
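<p>On the positive side, because the code below is FSM-based,
you can draw a state transition diagram that shows
what the code is supposed to accept in any given situation (for example, having just
seen a left curly bracket, being inside the brackets, having seen
a right curly bracket, etc.). On paper, I drew that diagram as a
directed graph (circles for states, arrows between circles for
transitions). I don't think I can manage the ASCII art, so
here is what a textual representation of the state transition diagram
might look like:</p>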
^{pr2}$
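<p>As a rough cross-check, the transitions can also be read off the code below and written as a lookup table. This is my own reading of that code, not the diagram itself; the table layout, the string state names, and the helper <code>trace_states</code> are assumptions made for illustration.</p>

```python
# Transition table read off the FSM code; 'other' means any other character.
TRANSITIONS = {
    'outside_brackets':   {'{': 'seen_left_bracket',  'other': 'outside_brackets'},
    'seen_left_bracket':  {'{': 'inside_brackets',    'other': 'outside_brackets'},
    'inside_brackets':    {'}': 'seen_right_bracket', 'other': 'inside_brackets'},
    'seen_right_bracket': {'}': 'outside_brackets',   'other': 'inside_brackets'},
}


def trace_states(text):
    """Return the list of states visited while scanning text."""
    state = 'outside_brackets'
    visited = [state]
    for char in text:
        row = TRANSITIONS[state]
        state = row.get(char, row['other'])
        visited.append(state)
    visited.append('end')  # end of input leads to the end state
    return visited


for name in trace_states('a{{b}}'):
    print(name)
```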
<p>The code follows:</p>
<pre><code>#!/usr/bin/env python
"""
Synopsis:
    Search for and write out text content occurring between '{{' and '}}'.
Usage:
    python capture.py <infilename>
Args:
    1. Input file name
Options:
    None
Example:
    python capture.py some_file.txt
"""

import sys

# States of the finite state machine.
(
    ST_start,
    ST_seen_left_bracket,
    ST_inside_brackets,
    ST_seen_right_bracket,
    ST_outside_brackets,
    ST_end,
) = range(1, 7)

Left_bracket = '{'
Right_bracket = '}'


class ReaderWriter(object):
    """Read the input one character at a time and write captured text."""

    def __init__(self, infile, outfile):
        self.infile = infile
        self.outfile = outfile
        self.line = ''
        self.pos = 0
        self.inchar = None
        self.prevchar = None
        self.char_count = 0

    def get_char(self):
        """Return the next input character, or None at end of input."""
        if self.pos >= len(self.line):
            self.line = self.infile.readline()
            if not self.line:
                return None
            self.pos = 0
        self.prevchar = self.inchar
        inchar = self.line[self.pos]
        self.inchar = inchar
        self.pos += 1
        self.char_count += 1
        return inchar

    def write(self, outchars):
        #self.outfile.write('found: "{}"\n'.format(outchars))
        self.outfile.write(outchars)

    def write_prev_char(self):
        #self.outfile.write('found: "{}"\n'.format(self.prevchar))
        self.outfile.write(self.prevchar)


def save_lines(infile, outfile):
    state = ST_start
    while True:
        if state == ST_start:
            reader_writer = ReaderWriter(infile, outfile)
            inchar = reader_writer.get_char()
            # An empty input goes straight to the end state.
            state = ST_outside_brackets if inchar is not None else ST_end
        elif state == ST_outside_brackets:
            if inchar == Left_bracket:
                inchar = reader_writer.get_char()
                state = ST_seen_left_bracket if inchar is not None else ST_end
            else:
                inchar = reader_writer.get_char()
                state = ST_outside_brackets if inchar is not None else ST_end
        elif state == ST_seen_left_bracket:
            if inchar == Left_bracket:
                # Second '{' in a row: a capture starts here.
                reader_writer.write('found (pos {:d}): "'.format(
                    reader_writer.char_count))
                inchar = reader_writer.get_char()
                state = ST_inside_brackets if inchar is not None else ST_end
            else:
                inchar = reader_writer.get_char()
                state = ST_outside_brackets if inchar is not None else ST_end
        elif state == ST_inside_brackets:
            if inchar == Right_bracket:
                inchar = reader_writer.get_char()
                state = ST_seen_right_bracket if inchar is not None else ST_end
            else:
                reader_writer.write(inchar)
                inchar = reader_writer.get_char()
                state = ST_inside_brackets if inchar is not None else ST_end
        elif state == ST_seen_right_bracket:
            if inchar == Right_bracket:
                # Second '}' in a row: the capture is complete.
                reader_writer.write('"\n')
                inchar = reader_writer.get_char()
                state = ST_outside_brackets if inchar is not None else ST_end
            else:
                # A lone '}' is part of the captured text; emit it as well.
                reader_writer.write_prev_char()
                reader_writer.write(inchar)
                inchar = reader_writer.get_char()
                state = ST_inside_brackets if inchar is not None else ST_end
        elif state == ST_end:
            return
        else:
            # Should be unreachable; fail loudly rather than loop forever.
            raise ValueError('unknown state: {}'.format(state))


def main():
    args = sys.argv[1:]
    if len(args) != 1:
        sys.exit(__doc__)
    if args[0] == '-h' or args[0] == '--help':
        print(__doc__)
        sys.exit()
    infilename = args[0]
    with open(infilename, 'r') as infile:
        save_lines(infile, sys.stdout)


if __name__ == '__main__':
    #import ipdb
    #ipdb.set_trace()
    main()
</code></pre>