pyparsing whitespace matching problem

from pyparsing import *

ParserElement.setDefaultWhitespaceChars('\r\n\t ')
source = "Library\tSSHClient with name\tnode"
EACH_LINE = Optional(Word(" ")).leaveWhitespace().suppress() + \
    CaselessKeyword("library").suppress() + \
    OneOrMore((Word(alphas)) + White(max=1).setResultName('myValue')) + \
    SkipTo(LineEnd())
res = EACH_LINE.parseString(source)
print(res.myValue)

from pyparsing import *

source = "Library\tsshclient\t\t\twith name s1"
# The whitespace here seems to be restricted to ' ' already,
# so why does the result still match '\t'?
value = Combine(OneOrMore(Word(printables) | White(' ', max=1) + ~White()))
linedefn = OneOrMore(value)
res = linedefn.parseString(source)
print(res)

1 Answer

User

#1 · Posted 2024-10-01 09:23:31

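I always cringe when whitespace creeps into parsed tokens, but given your constraint that only single spaces are allowed, this should be workable. I used the following expression to define values that can contain embedded single spaces: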

# each value consists of printable words separated by at most a 
# single space (a space that is not followed by another space)
value = Combine(OneOrMore(Word(printables) | White(' ',max=1) + ~White()))

With that done, a line is just one or more of these values:

# each line has one or more of these values
linedefn = OneOrMore(value)

Following your example, including the call to str.replace to replace tabs with pairs of spaces, the code looks like this:

data = "Library\tSSHClient    with name\tnode"

# replace tabs with 2 spaces
data = data.replace('\t', '  ')

print(linedefn.parseString(data))

This gives:

['Library', 'SSHClient', 'with name', 'node']

To get the start and end locations of any value in the original string, wrap the expression in the new pyparsing helper method locatedExpr:

# use new locatedExpr to get the value, start, and end location 
# for each value
linedefn = OneOrMore(locatedExpr(value))('values')

If we parse and dump the results:

print(linedefn.parseString(data).dump())

We get:

- values: 
  [0]:
    [0, 'Library', 7]
    - locn_end: 7
    - locn_start: 0
    - value: Library
  [1]:
    [9, 'SSHClient', 18]
    - locn_end: 18
    - locn_start: 9
    - value: SSHClient
  [2]:
    [22, 'with name', 31]
    - locn_end: 31
    - locn_start: 22
    - value: with name
  [3]:
    [33, 'node', 37]
    - locn_end: 37
    - locn_start: 33
    - value: node
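To read those fields in code rather than from dump(), something like the following should work. It is only a sketch: the grammar is simplified to single alphabetic words (my assumption, not the value expression above), and note that in pyparsing 3 locatedExpr is marked deprecated in favour of the Located class.

```python
from pyparsing import OneOrMore, Word, alphas, locatedExpr

# Hypothetical simplified grammar: each value is one alphabetic word.
linedefn = OneOrMore(locatedExpr(Word(alphas)))('values')

result = linedefn.parseString("Library  SSHClient")

# 'values' is also the name of a ParseResults method, so the named
# results are read with item access instead of attribute access.
spans = [(v.value, v.locn_start, v.locn_end) for v in result['values']]
print(spans)
```

Each entry behaves like the groups in the dump: the fields value, locn_start and locn_end are available as attributes on the group.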

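LineStart and LineEnd are pyparsing expression classes whose instances should match at the start and end of a line. LineStart has always been difficult to work with, but LineEnd is fairly predictable. In your case, if you are reading and parsing one line at a time, you don't need them; just define the content you expect in the line. If you want to make sure the parser has processed the entire string (and has not stopped short of the end because of a non-matching character), add + LineEnd() or {} to the end of your parser, or add the argument parseAll=True to your call to parseString().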
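As a small illustration of the parseAll=True point, here is a sketch with a made-up grammar (one or more alphabetic words) and a made-up input; neither comes from the question:

```python
from pyparsing import OneOrMore, ParseException, Word, alphas

# Hypothetical one-line grammar: one or more alphabetic words.
words = OneOrMore(Word(alphas))

text = "Library SSHClient 123"

# By default, parsing stops quietly at the first text that does not match.
partial = words.parseString(text).asList()

# With parseAll=True the unparsed "123" raises a ParseException instead.
try:
    words.parseString(text, parseAll=True)
    consumed_everything = True
except ParseException:
    consumed_everything = False

print(partial)
print(consumed_everything)
```

The default call returns only the leading words, so a silent partial match is easy to miss unless the end of the input is checked explicitly.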

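EDIT:

It is easy to forget that pyparsing calls str.expandtabs by default; you have to disable this by calling parseWithTabs. Doing that, and explicitly disallowing tabs between the words of a value, resolves your problem and keeps the values at the correct character counts. See the changes below: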

from pyparsing import *
TAB = White('\t')

# each value consists of printable words separated by at most a 
# single space (a space that is not followed by another space)
value = Combine(OneOrMore(~TAB + (Word(printables) | White(' ',max=1) + ~White())))

# each line has one or more of these values
linedefn = OneOrMore(value)
# do not expand tabs before parsing
linedefn.parseWithTabs()


data = "Library\tSSHClient    with name\tnode"

# replace tabs with 2 spaces
#data = data.replace('\t', '  ')

print(linedefn.parseString(data))


linedefn = OneOrMore(locatedExpr(value))('values')
# do not expand tabs before parsing
linedefn.parseWithTabs()
print(linedefn.parseString(data).dump())
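To see why the tab in the question appeared to match a single space, it may help to look at the tab expansion in isolation. The sketch below uses a hypothetical two-word grammar with a literal tab between the words (my own, not the grammar above), and make_grammar is just a helper name for the demonstration:

```python
from pyparsing import ParseException, White, Word, alphas

line = "Library\tSSHClient"

# str.expandtabs pads to the next multiple of 8 columns, so the tab after
# the 7-character word "Library" turns into exactly one space.
expanded = line.expandtabs()

def make_grammar():
    # Hypothetical grammar: two words separated by a literal tab.
    return Word(alphas) + White('\t') + Word(alphas)

# Default behaviour: tabs are expanded before parsing, so White('\t')
# finds no tab left to match.
try:
    make_grammar().parseString(line)
    matched_by_default = True
except ParseException:
    matched_by_default = False

# With parseWithTabs() the input is left untouched and the tab matches.
kept = make_grammar().parseWithTabs().parseString(line).asList()

print(repr(expanded))
print(matched_by_default)
print(kept)
```

Because the expanded tab here happens to be a single space, an expression such as White(' ', max=1) can match it, which would explain the behaviour described in the question.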
