Splitting with pyparsing

Posted 2024-10-04 09:23:15

I want to do the following (but using pyparsing):

Package:numpy11 Package:scipy
will be split into
[["Package:", "numpy11"], ["Package:", "scipy"]]

My code so far is:

package_header = Literal("Package:")
single_package =  Word(printables + " ") + ~Literal("Package:")
full_parser  = OneOrMore( pp.Group( package_header + single_package ) )

The current output is:

([(['Package:', 'numpy11 Package:scipy'], {})], {})

I would like to get something like this:

([(['Package:', 'numpy11'], {})], [(['Package:', 'scipy'], {})], {})

Basically, the rest of the text should match pp.printables.

I know I could use Word, but what I want to match is

all printables but not the Literal

How can I do this? Thank you.


1 answer
User
#1 · Posted 2024-10-04 09:23:15

You don't need a negative lookahead. For example:

from pyparsing import *

package_header = Literal("Package:")
single_package =  Word(printables)
full_parser  = OneOrMore( Group( package_header + single_package ) )

print(full_parser.parseString("Package:numpy11 Package:scipy"))

This prints:

[['Package:', 'numpy11'], ['Package:', 'scipy']]
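
Why this works: a space is not part of printables, so Word(printables) stops at the first whitespace, and pyparsing skips whitespace between tokens by default. In the question, adding " " to the character set lets Word consume the rest of the line, and the ~Literal("Package:") lookahead is only checked after Word has already finished matching. A minimal sketch of the difference (the variable names here are only for illustration):

```python
from pyparsing import Word, printables

text = "numpy11 Package:scipy"

# With " " in the character set, Word keeps going across spaces
# and swallows the next "Package:" header as well.
greedy = Word(printables + " ")
greedy_result = greedy.parseString(text).asList()
print(greedy_result)  # ['numpy11 Package:scipy']

# Without the space, Word stops at the first whitespace character.
plain = Word(printables)
plain_result = plain.parseString(text).asList()
print(plain_result)  # ['numpy11']
```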

Update: to parse packages separated by |, you can use the delimitedList() function (package names can now contain spaces as well):

from pyparsing import *

package_header = Literal("Package:")
package_name = Regex(r'[^|]+')  # | is a printable, so create a regex that excludes it.
package = Group(package_header + package_name) 
full_parser = delimitedList(package, delim="|" )

print(full_parser.parseString("Package:numpy11 foo|Package:scipy"))

This prints:

[['Package:', 'numpy11 foo'], ['Package:', 'scipy']]
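
If you would rather keep spaces as the only separator and still allow spaces inside a name ("all printables but not the Literal"), a negative lookahead can work, provided it is checked before each word instead of after the whole name. This is only a sketch under that assumption: name_word and the join step are illustrative additions, and the rejoined name always uses a single space between words, whatever the original spacing was.

```python
from pyparsing import Group, Literal, OneOrMore, Word, printables

package_header = Literal("Package:")

# One whitespace-free word, accepted only if it does not start a new header.
name_word = ~package_header + Word(printables)

# Collect consecutive words and join them back into a single name.
package_name = OneOrMore(name_word).setParseAction(lambda tokens: " ".join(tokens))

package = Group(package_header + package_name)
full_parser = OneOrMore(package)

result = full_parser.parseString("Package:numpy11 foo Package:scipy").asList()
print(result)  # [['Package:', 'numpy11 foo'], ['Package:', 'scipy']]
```

A name that itself contains a word beginning with "Package:" would be split at that word, so the delimiter-based version is the safer choice when you control the input format.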
