Implementation of a Markdown-like language parser

newline := "\r\n"/"\n"/"\r"
indent := ("\r\n"/"\n"/"\r"), [ \t]
number := [0-9]+
whitespace := [ \t]+
symbol_mark := [*_>#`%]
symbol_mark_noa := [_>#`%]
symbol_mark_nou := [*>#`%]
symbol_mark_nop := [*_>#`]
punctuation := [\(\)\,\.\!\?]
noaccent_code := -(newline / '`')+
accent_code := -(newline / '``')+
symbol := -(whitespace / newline)
text := -newline+
safe_text := -(newline / whitespace / [*_>#`] / '%%' / punctuation)+/whitespace
link := 'http' / 'ftp', 's'?, '://', (-[ \t\r\n<>`^'"*\,\.\!\?]/([,\.\?],?-[ \t\r\n<>`^'"*]))+
strikedout := -[ \t\r\n*_>#`^]+
ctrlw := '^W'+
ctrlh := '^H'+
strikeout := (strikedout, (whitespace, strikedout)*, ctrlw) / (strikedout, ctrlh)
strong := ('**', (inline_nostrong/symbol), (inline_safe_nostrong/symbol_mark_noa)* , '**') / ('__' , (inline_nostrong/symbol), (inline_safe_nostrong/symbol_mark_nou)*, '__')
emphasis := ('*',?-'*', (inline_noast/symbol), (inline_safe_noast/symbol_mark_noa)*, '*') / ('_',?-'_', (inline_nound/symbol), (inline_safe_nound/symbol_mark_nou)*, '_')
inline_code := ('`' , noaccent_code , '`') / ('``' , accent_code , '``')
inline_spoiler := ('%%', (inline_nospoiler/symbol), (inline_safe_nop/symbol_mark_nop)*, '%%')
inline := (inline_code / inline_spoiler / strikeout / strong / emphasis / link)
inline_nostrong := (?-('**'/'__'),(inline_code / reference / signature / inline_spoiler / strikeout / emphasis / link))
inline_nospoiler := (?-'%%',(inline_code / emphasis / strikeout / emphasis / link))
inline_noast := (?-'*',(inline_code / inline_spoiler / strikeout / strong / link))
inline_nound := (?-'_',(inline_code / inline_spoiler / strikeout / strong / link))
inline_safe := (inline_code / inline_spoiler / strikeout / strong / emphasis / link / safe_text / punctuation)+
inline_safe_nostrong := (?-('**'/'__'),(inline_code / inline_spoiler / strikeout / emphasis / link / safe_text / punctuation))+
inline_safe_noast := (?-'*',(inline_code / inline_spoiler / strikeout / strong / link / safe_text / punctuation))+
inline_safe_nound := (?-'_',(inline_code / inline_spoiler / strikeout / strong / link / safe_text / punctuation))+
inline_safe_nop := (?-'%%',(inline_code / emphasis / strikeout / strong / link / safe_text / punctuation))+
inline_full := (inline_code / inline_spoiler / strikeout / strong / emphasis / link / safe_text / punctuation / symbol_mark / text)+
line := newline, ?-[ \t], inline_full?
sub_cite := whitespace?, ?-reference, '>'
cite := newline, whitespace?, '>', sub_cite*, inline_full?
code := newline, [ \t], [ \t], [ \t], [ \t], text
block_cite := cite+
block_code := code+
all := (block_cite / block_code / line / code)+
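To see what the `link` rule above accepts, here is a rough translation into a Python `re` pattern. It assumes SimpleParse-style semantics for the notation (`-[...]` consumes one character not in the set; a leading `?` is a lookahead that consumes nothing) and reads the prefix as `http` or `ftp` followed by an optional `s`, which appears to be the intent. The effect is that `,`, `.` and `?` stay inside a link only when another link character follows, so trailing punctuation (including at the end of the input) is left out. `LINK_RE` is a hypothetical name.

```python
import re

# Rough translation of the `link` rule, assuming SimpleParse-style semantics:
# `-[...]` = one character NOT in the set, `?` = non-consuming lookahead.
_STOP = r" \t\r\n<>`^'\"*"  # characters that always end a link

LINK_RE = re.compile(
    r"(?:http|ftp)s?://"                 # assumed reading of 'http' / 'ftp', 's'?
    r"(?:"
    r"[^" + _STOP + r",.!?]"             # an ordinary link character
    r"|[,.?](?=[^" + _STOP + r"])"       # , . ? only if another link char follows
    r")+"
)

print(LINK_RE.findall("see http://example.com/a.b, then"))
print(LINK_RE.findall("ftps://host/x."))
```

The positive lookahead for "a character not in the set" mirrors `?-[...]` more closely than a negative lookahead would, because it also fails at the end of the input.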

1 answer

User

#1 · Posted 2024-09-28 05:22:34

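If one thing contains another, you usually handle them as separate tokens and then nest them in the grammar. Lepl (http://www.acooke.org/lepl) and PyParsing (probably the most popular pure-Python parser) both let you nest things recursively.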
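To make the "separate tokens" step concrete without depending on Lepl or PyParsing, here is a minimal standard-library sketch. The token names and patterns are hypothetical and cover only a few of the question's markers. Python's `re` alternation takes the first alternative that matches rather than the longest, so the two-character markers are listed first to get the same "prefer `**` over `*`" behaviour that Lepl's longest-match tokens give. The output is a flat token stream; the nesting is left to the grammar built on top of it.

```python
import re

# Hypothetical token table for a few inline markers from the question's grammar.
# `re` alternation is first-match, so '**' / '__' must come before '*' / '_'.
TOKEN_SPEC = [
    ("STRONG", r"\*\*|__"),
    ("EMPH", r"\*|_"),
    ("SPOILER", r"%%"),
    ("CODE", r"`[^`\n]+`"),
    ("TEXT", r"[^*_%`]+|[%`]"),  # plain text, or a lone % or ` that opens nothing
]
MASTER_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(src):
    """Return (kind, lexeme) pairs; every character of src lands in some token."""
    return [(m.lastgroup, m.group()) for m in MASTER_RE.finditer(src)]


print(tokenize("**a** *b*"))
```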

So in Lepl you could write something like this:

from lepl import Token, Delayed  # Lepl: http://www.acooke.org/lepl

# these are tokens (defined as regexps)
stg_marker = Token(r'\*\*')
emp_marker = Token(r'\*') # tokens are longest match, so strong is preferred if possible
spo_marker = Token(r'%%')
....
# grammar rules combine tokens
contents = Delayed() # this will be defined later and lets us recurse
strong = stg_marker + contents + stg_marker
emphasis = emp_marker + contents + emp_marker
spoiler = spo_marker + contents + spo_marker
other_stuff = .....
contents += strong | emphasis | spoiler | other_stuff # this defines contents recursively

You can then see, I hope, how `contents` will match nested uses of strong, emphasis, and so on.
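The same recursive idea can be sketched without any parser library. The hand-written recursive-descent parser below is plain Python, not Lepl, and `parse` / `parse_contents` are hypothetical names. It handles `**strong**`, `*emphasis*` and `%%spoiler%%`: each marked span calls back into `parse_contents` for its inside, much as `contents` refers to itself through `Delayed`. Unlike the one-item `contents` rule, it parses a sequence of items, and an opener with no matching closer is kept as literal text.

```python
# Markers handled by this sketch; '**' is checked before '*' so strong wins.
MARKERS = [("**", "strong"), ("%%", "spoiler"), ("*", "emphasis")]


def parse_contents(src, pos=0, closer=None):
    """Parse items until `closer` (or end of input); return (nodes, new_pos).

    Plain text becomes a str; a marked span becomes (kind, children).
    """
    nodes, text = [], []

    def flush():
        if text:
            nodes.append("".join(text))
            text.clear()

    while pos < len(src):
        if closer is not None and src.startswith(closer, pos):
            break  # let the caller consume its own closing marker
        for marker, kind in MARKERS:
            if src.startswith(marker, pos):
                # Recurse: the inside of a span is itself `contents`.
                children, end = parse_contents(src, pos + len(marker), marker)
                if src.startswith(marker, end):
                    flush()
                    nodes.append((kind, children))
                    pos = end + len(marker)
                else:
                    # No closing marker anywhere: treat the opener as text.
                    text.append(marker)
                    pos += len(marker)
                break
        else:
            text.append(src[pos])
            pos += 1
    flush()
    return nodes, pos


def parse(src):
    nodes, _ = parse_contents(src)
    return nodes


print(parse("**bold *it* %%sp%%** end"))
```

Because the closer check is so simple, a `**` inside `*...*` ends the emphasis, so `*a **b** c*` comes out as three flat emphasis spans rather than a nested one; the question's grammar sidesteps the same clash by excluding `*`-led items in its `inline_noast` and `inline_safe_noast` rules.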

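There is a lot more work to do for your final solution, and efficiency may be a problem with any pure-Python parser (there are parsers implemented in C that can be called from Python. These will be faster but may be harder to use; I can't recommend any of them because I haven't used them).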
