<p>Here is an example using <a href="http://pyparsing.wikispaces.com" rel="nofollow">pyparsing</a>:</p>
<pre><code>import re

import pyparsing as pp

txt = '''normalcontent1-(header1:content1(note1, note2),content2(note3),content3)-(header2:content)-normalcontent2-(header3)
normalcontent1-(header:content)-normalcontent2-normalcontent3-(header2:content2)'''


def DashSplit(txt):
    '''Replicate str.split('-'), but do not split on dashes inside nested '( )' expressions.'''
    dash_locs = []
    dash = pp.Suppress('-')
    # record the location of each dash that lies outside a nested expression
    dash.setParseAction(lambda s, loc, toks: dash_locs.append(loc))
    ident = pp.Word(pp.alphas + "_", pp.alphanums + "_")  # Python/C-style identifier
    exp = pp.nestedExpr()  # matches (and skips over) everything inside nested '( )'
    atom = ident | exp
    expr = pp.OneOrMore(atom) + pp.ZeroOrMore(dash + atom)
    try:
        # parseAll=True: fail instead of silently ignoring unparsed trailing text
        expr.parseString(txt, parseAll=True)
    except pp.ParseException as e:
        print('nope', e)
        return [txt]
    # slice the original string between consecutive top-level dashes
    starts = [0] + [loc + 1 for loc in dash_locs]
    ends = dash_locs + [len(txt)]
    return [txt[st:end] for st, end in zip(starts, ends)]


def headerGetter(txt):
    '''Reduce "(header:content...)" to "(header)"; return other pieces unchanged.'''
    m = re.match(r'\((\w+)', txt)
    if m:
        return '(' + m.group(1) + ')'
    return txt


for line in txt.splitlines():
    print('-'.join(headerGetter(e) for e in DashSplit(line)))
</code></pre>
<p>Prints:</p>
<pre><code>normalcontent1-(header1)-(header2)-normalcontent2-(header3)
normalcontent1-(header)-normalcontent2-normalcontent3-(header2)
</code></pre>
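<p>For comparison, here is a dependency-free sketch that uses a different technique. Instead of a grammar, it scans the string once, counts parenthesis depth, and splits on <code>-</code> only at depth 0. The helper names <code>depth_split</code> and <code>header_only</code> are hypothetical. The sketch assumes parentheses are balanced. Unlike the pyparsing version, it does not validate the rest of the syntax or report a parse error.</p>
<pre><code>import re


def depth_split(text, sep="-", open_ch="(", close_ch=")"):
    """Split text on sep, but only where sep is outside every (...) group.

    Assumes balanced parentheses; quoted strings get no special treatment.
    """
    parts = []
    depth = 0   # current nesting level of parentheses
    start = 0   # index where the current piece begins
    for i, ch in enumerate(text):
        if ch == open_ch:
            depth += 1
        elif ch == close_ch:
            depth -= 1
        elif ch == sep and depth == 0:
            # top-level separator: close off the current piece
            parts.append(text[start:i])
            start = i + 1
    parts.append(text[start:])  # the piece after the last separator
    return parts


def header_only(part):
    """Reduce '(name:...)' or '(name)' to '(name)'; leave other parts unchanged."""
    m = re.match(r"\((\w+)", part)
    return f"({m.group(1)})" if m else part


lines = [
    "normalcontent1-(header1:content1(note1, note2),content2(note3),content3)"
    "-(header2:content)-normalcontent2-(header3)",
    "normalcontent1-(header:content)-normalcontent2-normalcontent3-(header2:content2)",
]

for line in lines:
    print("-".join(header_only(p) for p in depth_split(line)))
</code></pre>
<p>On the two sample lines this prints the same two lines as the pyparsing version. The manual scanner is shorter and needs no library. The trade-off is that malformed input, such as an unbalanced parenthesis, is silently mis-split instead of raising <code>ParseException</code>.</p>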
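<p>If the grammar is defined correctly, a parser is a more robust solution than a regular expression. For example, <code>nestedExpr</code> copes with arbitrarily deep nested parentheses, which a Python <code>re</code> pattern cannot match in general.</p>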
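<p>To see why the regex route is fragile here, consider a hypothetical naive pattern that treats each group as running from <code>(</code> to the first <code>)</code>. On the first sample line, the match stops at the inner <code>)</code> of <code>(note1, note2)</code>, so the tail of the first group leaks into the result:</p>
<pre><code>import re

line = ("normalcontent1-(header1:content1(note1, note2),content2(note3),content3)"
        "-(header2:content)-normalcontent2-(header3)")

# Naive assumption: a "(header:...)" group ends at the FIRST ')' after it.
naive = re.sub(r"\((\w+):[^)]*\)", r"(\1)", line)
print(naive)
# normalcontent1-(header1),content2(note3),content3)-(header2)-normalcontent2-(header3)
</code></pre>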