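<p>A bit late, but googling <code>pyparsing reentrancy</code> turns up this topic, so here is my answer.<br/>
I solved the problem of parser instance reuse/reentrancy by attaching the context to the string being parsed.
Subclass <code>str</code>, put the context in an attribute of the new str class,
pass an instance of it to <code>pyparsing</code>, and retrieve the context in the parse action.</p>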
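<p>Before the full listing, here is a minimal sketch of the core trick using only the standard library, with no <code>pyparsing</code> involved. It illustrates why <code>expandtabs</code> has to be overridden: built-in <code>str</code> methods return a plain <code>str</code>, which would silently drop the context. The sample text and the <code>{"owner": "demo"}</code> context are made up for illustration.</p>

```python
class SpecStr(str):
    """A str that carries an arbitrary context object along with the text."""
    context = None  # set by spec_string() below

    def expandtabs(self, tabs=8):
        # str.expandtabs returns a plain str, which would lose the context,
        # so re-wrap the result and copy the context over.
        ret = type(self)(super(SpecStr, self).expandtabs(tabs))
        ret.context = self.context
        return ret


def spec_string(s, context):
    # Attach the context after construction instead of overriding str.__new__.
    ret = SpecStr(s)
    ret.context = context
    return ret


tagged = spec_string("key\t= value", {"owner": "demo"})  # hypothetical context
expanded = tagged.expandtabs()      # goes through the override, keeps the context
plain = str.expandtabs(tagged)      # bypasses the override, yields a plain str
```

<p>Any other <code>str</code> method that produces a new string (slicing, <code>strip</code>, and so on) would lose the context in the same way; the approach relies on the parse actions receiving the original instance.</p>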
<p>Python 2.7:</p>
<pre><code>from pyparsing import LineStart, LineEnd, Word, alphas, Optional, Regex, Keyword, OneOrMore

# subclass str; note that unicode is not handled
class SpecStr(str):
    context = None  # will be set in spec_string() below

    # override as pyparsing calls str.expandtabs by default
    def expandtabs(self, tabs=8):
        ret = type(self)(super(SpecStr, self).expandtabs(tabs))
        ret.context = self.context
        return ret

# set context here rather than in the constructor
# to avoid messing with str.__new__ and super()
def spec_string(s, context):
    ret = SpecStr(s)
    ret.context = context
    return ret

class Actor(object):
    def __init__(self):
        self.namespace = {}

    def pair_parsed(self, instring, loc, tok):
        self.namespace[tok.key] = tok.value

    def include_parsed(self, instring, loc, tok):
        # doc = open(tok.filename.strip()).read()  # would use this line in real life
        doc = included_doc  # included_doc is defined below
        parse(doc, self)  # <<<<< recursion

def make_parser(actor_type):
    def make_action(fun):  # expects fun to be an unbound method of Actor
        def action(instring, loc, tok):
            if isinstance(instring, SpecStr):
                return fun(instring.context, instring, loc, tok)
            return None  # None as a result of a parse action means
                         # the tokens have not been changed
        return action

    # Sample grammar: a sequence of lines,
    # each line is either 'key=value' pair or '#include filename'
    Ident = Word(alphas)
    RestOfLine = Regex('.*')
    Pair = (Ident('key') + '=' +
            RestOfLine('value')).setParseAction(make_action(actor_type.pair_parsed))
    Include = (Keyword('#include') +
               RestOfLine('filename')).setParseAction(make_action(actor_type.include_parsed))
    Line = (LineStart() + Optional(Pair | Include) + LineEnd())
    Document = OneOrMore(Line)
    return Document

Parser = make_parser(Actor)

def parse(instring, actor=None):
    if actor is not None:
        instring = spec_string(instring, actor)
    return Parser.parseString(instring)

included_doc = 'parrot=dead'
main_doc = """\
#include included_doc
ham = None
spam = ham"""

# parsing without context is ok
print 'parsed data:', parse(main_doc)

actor = Actor()
parse(main_doc, actor)
print 'resulting namespace:', actor.namespace
</code></pre>
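<p>Running this prints the parsed tokens for the context-free call, followed by the namespace collected by <code>actor</code>, which holds <code>ham</code> and <code>spam</code> from the main document plus <code>parrot</code> from the included one.</p>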
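<p>The dispatch done by <code>make_action</code> can be exercised on its own. The sketch below calls the wrapped action by hand instead of going through <code>pyparsing</code>; the plain tuples stand in for the parse results and the helper <code>tagged</code> is hypothetical. It shows that a single shared action routes each call to whichever actor travels with the input string, which is what makes one parser instance usable for several parses at once.</p>

```python
class SpecStr(str):
    context = None  # the object that parse actions should be routed to


class Actor(object):
    def __init__(self):
        self.namespace = {}

    def pair_parsed(self, instring, loc, tok):
        key, value = tok  # stand-in for tok.key / tok.value in the real grammar
        self.namespace[key] = value


def make_action(fun):
    # fun is looked up on the class, so the actor is supplied explicitly per call.
    def action(instring, loc, tok):
        if isinstance(instring, SpecStr):
            return fun(instring.context, instring, loc, tok)
        return None  # plain str input: no context, nothing to do
    return action


def tagged(text, actor):
    # Hypothetical helper: wrap text and attach the actor as its context.
    ret = SpecStr(text)
    ret.context = actor
    return ret


action = make_action(Actor.pair_parsed)  # one action shared by all parses

first, second = Actor(), Actor()
action(tagged("a=1", first), 0, ("a", "1"))
action(tagged("b=2", second), 0, ("b", "2"))
ignored = action("c=3", 0, ("c", "3"))  # no SpecStr, so no actor is touched
```

<p>Because no state lives in the action or the grammar, nested calls such as the <code>#include</code> recursion reuse the same objects without interfering with each other.</p>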
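<p>This approach makes <code>Parser</code> itself fully reusable and reentrant.
The <code>pyparsing</code> internals are generally reentrant as well, as long as you do not touch <code>pyparsing</code>'s static fields.
The only drawback is that <code>pyparsing</code> resets its packrat cache on every call to <code>parseString</code>, but this can be worked around by
overriding <code>SpecStr.__hash__</code> (so that it hashes by identity like <code>object</code>, rather than by string content) plus some monkeypatching. On my dataset this is not a problem at all: the performance hit is negligible, and the reset even helps memory usage.</p>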
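<p>The hashing half of that workaround might look like the sketch below. It covers only the <code>__hash__</code> override; the monkeypatching of <code>pyparsing</code> mentioned above is not shown, since it depends on the library's internals and version.</p>

```python
class SpecStr(str):
    context = None
    # Hash by identity, as object does, instead of by string content, so that
    # two inputs with equal text but different contexts get different hashes.
    __hash__ = object.__hash__


a = SpecStr("ham = None")
b = SpecStr("ham = None")

same_text = (a == b)                       # equality still compares the text
identity_hash = (hash(a) == object.__hash__(a))
```

<p>Note the trade-off: the two instances still compare equal while hashing differently, which breaks the usual hash/equality contract, so such strings are best not used as ordinary dictionary keys elsewhere.</p>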