PyParsing ignores newlines?

Posted 2024-10-03 06:30:44


I want to parse a git log file that looks like this:

d2436fa AuthorName 2015-05-15 Commit Message
4    3    README.md

The output I expect looks like this:

^{pr2}$

My grammar is:

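from pyparsing import (Combine, LineEnd, LineStart, OneOrMore, Regex, Word,
                       alphanums, alphas, alphas8bit, nums, printables)

hsh = Word(alphanums, exact=7)
author = OneOrMore(Word(alphas + alphas8bit + '.'))
date = Regex(r'\d{4}-\d{2}-\d{2}')  # raw string so the \d escapes reach the regex intact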
message = OneOrMore(Word(printables + alphas8bit))
count = Word(nums)
file = Word(printables)
blankline = LineStart() + LineEnd()

commit = hsh + Combine(author, joinString=' ', adjacent=False) + \
         date + Combine(message, joinString=' ', adjacent=False) + LineEnd()
changes = count + count + file + LineEnd()
check = commit ^ changes ^ blankline

The output I actually get is:

['d2436fa', 'AuthorName', '2015-05-15', 'Commit Message 4 3 README.md']

Why is the newline ignored? I thought that was what LineEnd() was for. When I split the input on '\n' first, everything works fine :/


1 answer
User
#1 · Posted 2024-10-03 06:30:44

pyparsing has a (controversial?) rule about whitespace in grammars:

During the matching process, whitespace between tokens is skipped by default (although this can be changed)
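A minimal sketch of that rule in action, assuming a reasonably recent pyparsing install. The input string is made up for illustration, and the first call only restores the library's default whitespace set so the snippet behaves the same wherever it is run:

```python
from pyparsing import OneOrMore, ParserElement, Word, alphas

# Assumption: restore pyparsing's default whitespace set (space, newline,
# tab, carriage return) in case it was changed earlier in the same process.
ParserElement.setDefaultWhitespaceChars(" \n\t\r")

# Each Word skips leading whitespace before matching, newlines included,
# so the repetition runs straight across the line break.
words = OneOrMore(Word(alphas))
tokens = words.parseString("Commit Message\nREADME").asList()
print(tokens)
```

The newline is consumed as ordinary inter-token whitespace, which is why the message in the question swallows the following line before LineEnd() ever gets a chance to match.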

As the quoted rule says, this can be changed. You can set which characters pyparsing treats as whitespace like this:

from pyparsing import ParserElement

i_consider_whitespaces_to_be_only = ' '
ParserElement.setDefaultWhitespaceChars(i_consider_whitespaces_to_be_only)

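(This tells it to skip only spaces, not newlines; you can of course add other characters, such as tabs. Make the call before defining the grammar, since it only applies to elements created afterwards.)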
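Here is a sketch of the question's grammar with that setting applied, not a definitive solution. Several details are assumptions added for illustration: the tab in the whitespace set, the suppressed `eol` helper, the `Group` wrapper, the name `path` in place of `file`, and the `log_parser` and `sample` names. The `blankline` rule is left out to keep the sketch small.

```python
from pyparsing import (Combine, Group, LineEnd, OneOrMore, ParserElement,
                       Regex, Word, alphanums, alphas, alphas8bit, nums,
                       printables)

# Call this before building any grammar element: only elements created
# afterwards use the new whitespace set. The tab is an assumption here.
ParserElement.setDefaultWhitespaceChars(" \t")

eol = LineEnd().suppress()  # hypothetical helper: match "\n" but drop it from the results
hsh = Word(alphanums, exact=7)
author = OneOrMore(Word(alphas + alphas8bit + "."))
date = Regex(r"\d{4}-\d{2}-\d{2}")
message = OneOrMore(Word(printables + alphas8bit))
count = Word(nums)
path = Word(printables)  # called `file` in the question

commit = (hsh + Combine(author, joinString=" ", adjacent=False)
          + date + Combine(message, joinString=" ", adjacent=False) + eol)
changes = count + count + path + eol

# Group each matched line so commit lines and change lines stay separate.
log_parser = OneOrMore(Group(commit ^ changes))

sample = ("d2436fa AuthorName 2015-05-15 Commit Message\n"
          "4    3    README.md\n")
parsed = log_parser.parseString(sample, parseAll=True).asList()
print(parsed)
```

Because the newline is no longer skipped as whitespace, `message` stops at the end of its own line and the line-end match is what consumes the line break. Note that the setting is global, so it also applies to any other grammar defined later in the same program.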
