<p>If you would prefer something more like a parser, here is a pyparsing take on your problem:</p>
<pre><code>from pyparsing import Suppress,QuotedString,Word,alphas,nums,alphanums,Keyword,Optional
import datetime
# define UTC timezone for sake of eval
if hasattr(datetime,"timezone"):
    UTC = datetime.timezone(datetime.timedelta(0),"UTC")
else:
    UTC = None
_ = Suppress
evaltokens = lambda s,l,t: eval(''.join(t))
timevalue = 'datetime.datetime' + QuotedString('(', endQuoteChar=')', unquoteResults=False)
timevalue.setParseAction(evaltokens)
strvalue = 'u' + QuotedString("'", unquoteResults=False)
strvalue.setParseAction(evaltokens)
nonevalue = Keyword("None").setParseAction(lambda s,l,t: [None])
intvalue = Word(nums).setParseAction(lambda s,l,t: int(t[0]))
COMMA = Optional(_(","))
valuedexpr = lambda expr: (Word(alphas) + "(" + "value" + "=" + expr + ")").setParseAction(lambda t: t[4])
lineexpr = (_("Aggregate(aggregate_dimension_value_list=[") +
            valuedexpr(timevalue)("timestamp") + COMMA +
            (nonevalue | valuedexpr(strvalue))("s1") + COMMA +
            (nonevalue | valuedexpr(strvalue))("s2") + COMMA +
            "]" + COMMA +
            "quantity=" + intvalue("qty"))
</code></pre>
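<p>The two <code>eval</code>-based parse actions above can be swapped for eval-free ones. The following is only a minimal sketch, not code from the original answer: <code>to_datetime</code> and <code>to_string</code> are hypothetical helper names, and it assumes the timestamps appear as <code>datetime.datetime(2013, 8, 28, 19, 30, ...)</code> with positional integer arguments. Any keyword argument such as <code>tzinfo</code> is ignored, so the result is a naive datetime.</p>

```python
import ast
import datetime
import re

from pyparsing import QuotedString

# Leading run of positional integers inside "(2013, 8, 28, 19, 30, ...)".
_POSITIONAL_INTS = re.compile(r"\(\s*(\d+(?:\s*,\s*\d+)*)")

def to_datetime(s, loc, tokens):
    """Build the datetime from its integer arguments instead of eval-ing the call."""
    # tokens[1] is the parenthesised argument text, kept verbatim by QuotedString.
    match = _POSITIONAL_INTS.match(tokens[1])
    if match is None:
        raise ValueError("no positional integer arguments in %r" % tokens[1])
    fields = [int(n) for n in match.group(1).split(",")]
    # Keyword arguments (assumed to be at most tzinfo) are ignored here.
    return datetime.datetime(*fields)

def to_string(s, loc, tokens):
    """Decode a u'...' literal; literal_eval accepts literals only, never calls."""
    return ast.literal_eval("".join(tokens))

timevalue = "datetime.datetime" + QuotedString("(", endQuoteChar=")", unquoteResults=False)
timevalue.setParseAction(to_datetime)
strvalue = "u" + QuotedString("'", unquoteResults=False)
strvalue.setParseAction(to_string)

print(timevalue.parseString("datetime.datetime(2013, 8, 28, 19, 30, tzinfo=UTC)")[0])
print(strvalue.parseString("u'TOTE'")[0])
```

<p>Since <code>ast.literal_eval</code> only accepts Python literals, a crafted input cannot run arbitrary code through <code>strvalue</code>; the datetime call has to be rebuilt by hand because it is not a literal.</p>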
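<p>Use <code>lineexpr.searchString</code> to extract the data from each Aggregate, printing <code>data.dump()</code> and <code>data.qty</code> for every match:</p>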
<p>This gives:</p>
<pre><code>[datetime.datetime(2013, 8, 28, 19, 30), None, u'VIRTUALLY_LABELED_CASE', ']', 'quantity=', 127]
- qty: 127
- s1: None
- s2: VIRTUALLY_LABELED_CASE
- timestamp: 2013-08-28 19:30:00
127
[datetime.datetime(2013, 8, 28, 19, 30), u'PPTransMergeNonCon', u'PRIME_BIN_RANDOM_STOW', ']', 'quantity=', 15]
- qty: 15
- s1: PPTransMergeNonCon
- s2: PRIME_BIN_RANDOM_STOW
- timestamp: 2013-08-28 19:30:00
15
[datetime.datetime(2013, 8, 27, 21, 0), u'PPTransFRA1', u'PRIME_BIN_RANDOM_STOW', ']', 'quantity=', 8]
- qty: 8
- s1: PPTransFRA1
- s2: PRIME_BIN_RANDOM_STOW
- timestamp: 2013-08-27 21:00:00
8
</code></pre>
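<p><code>dump()</code> shows all of the named results values that are available. Note how the quantity attribute is accessed directly as <code>data.qty</code>; this is set up for you by the results name "qty" defined in <code>"quantity=" + intvalue("qty")</code>. <code>timestamp</code>, <code>s1</code>, and <code>s2</code> can be accessed in the same way. (There is still a bit of <code>eval</code> in here; cleaning that up is left as an exercise for the reader.)</p>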
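<p>To see the results-name mechanism in isolation, here is a small self-contained sketch. The <code>qtyexpr</code> expression and the <code>sample</code> string are made up for illustration and are much simpler than the real input; only the <code>"quantity=" + intvalue("qty")</code> pattern is taken from the grammar above.</p>

```python
from pyparsing import Suppress, Word, nums

intvalue = Word(nums).setParseAction(lambda s, l, t: int(t[0]))

# Calling an expression with a string attaches that results name to its tokens.
qtyexpr = Suppress("quantity=") + intvalue("qty")

# Hypothetical, heavily simplified input.
sample = "Aggregate(quantity=127) Aggregate(quantity=15)"

for data in qtyexpr.searchString(sample):
    print(data.dump())  # the token list, followed by every named result
    print(data.qty)     # attribute access; data["qty"] returns the same value
```

<p>Because the parse action on <code>intvalue</code> has already converted the token, <code>data.qty</code> comes back as an <code>int</code> rather than a string.</p>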
<p>EDIT:</p>
<p>Here is a modified pyparsing parser that handles the original XML-like content. The changes are really quite small:</p>
<p>From the text you pasted (some of which was malformed), this gives:</p>
<pre><code>['Aggregate', 'aggregate_dimension_value_list', '=', datetime.datetime(2013, 8, 26, 20, 30), u'PPTransCGN1', u'PRIME_BIN_RANDOM_STOW', 'quantity=', 992]
- qty: 992
- s1: PPTransCGN1
- s2: PRIME_BIN_RANDOM_STOW
- timestamp: 2013-08-26 20:30:00
992
['Aggregate', 'aggregate_dimension_value_list', '=', datetime.datetime(2013, 8, 23, 19, 30), None, u'TOTE', 'quantity=', 87]
- qty: 87
- s1: None
- s2: TOTE
- timestamp: 2013-08-23 19:30:00
87
['Aggregate', 'aggregate_dimension_value_list', '=', datetime.datetime(2013, 8, 27, 17, 30), u'PPTransMUC3', u'TOTE', 'quantity=', 14]
- qty: 14
- s1: PPTransMUC3
- s2: TOTE
- timestamp: 2013-08-27 17:30:00
14
['Aggregate', 'aggregate_dimension_value_list', '=', datetime.datetime(2013, 8, 27, 20, 30), u'PPTransEUK5', u'PRIME_BIN_RANDOM_STOW', 'quantity=', 339]
- qty: 339
- s1: PPTransEUK5
- s2: PRIME_BIN_RANDOM_STOW
- timestamp: 2013-08-27 20:30:00
339
['Aggregate', 'aggregate_dimension_value_list', '=', datetime.datetime(2013, 8, 26, 20, 30), u'PPTransCGN1', u'TOTE', 'quantity=', 1731]
- qty: 1731
- s1: PPTransCGN1
- s2: TOTE
- timestamp: 2013-08-26 20:30:00
1731
['Aggregate', 'aggregate_dimension_value_list', '=', datetime.datetime(2013, 8, 26, 19, 30), u'PPTransEUK5', u'TOTE', 'quantity=', 28]
- qty: 28
- s1: PPTransEUK5
- s2: TOTE
- timestamp: 2013-08-26 19:30:00
28
['Aggregate', 'aggregate_dimension_value_list', '=', datetime.datetime(2013, 8, 28, 19, 30), u'PPTransORY1', u'PRIME_BIN_RANDOM_STOW', 'quantity=', 69]
- qty: 69
- s1: PPTransORY1
- s2: PRIME_BIN_RANDOM_STOW
- timestamp: 2013-08-28 19:30:00
69
['Aggregate', 'aggregate_dimension_value_list', '=', datetime.datetime(2013, 8, 26, 19, 30), u'PPTransMAD4', u'PRIME_BIN_RANDOM_STOW', 'quantity=', 47]
- qty: 47
- s1: PPTransMAD4
- s2: PRIME_BIN_RANDOM_STOW
- timestamp: 2013-08-26 19:30:00
47
['Aggregate', 'aggregate_dimension_value_list', '=', datetime.datetime(2013, 8, 26, 21, 0), None, None, 'quantity=', 78]
- qty: 78
- s1: None
- s2: None
- timestamp: 2013-08-26 21:00:00
78
</code></pre>