Bug in Python tokenize?

Posted 2024-09-30 05:26:10

Why does the following code fail?

if 1 \
and 0:
    pass

Even this minimal snippet fails a tokenize/untokenize round trip.

It raises:

AssertionError:
File "/mnt/home/anushri/untitled-1.py", line 13, in <module>
  print tok_untok(src)
File "/mnt/home/anushri/untitled-1.py", line 6, in tok_untok
  tokenize.untokenize(tokenize.generate_tokens(f.readline))
File "/usr/lib/python2.6/tokenize.py", line 262, in untokenize
  return ut.untokenize(iterable)
File "/usr/lib/python2.6/tokenize.py", line 198, in untokenize
  self.add_whitespace(start)
File "/usr/lib/python2.6/tokenize.py", line 187, in add_whitespace
  assert row <= self.prev_row

Is there a workaround that doesn't require modifying src before tokenizing? (The \ line continuation seems to be the culprit.)
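Dumping the token stream shows why the backslash matters. The sketch below is written for Python 3 (io.StringIO instead of cStringIO), and show_tokens is a hypothetical helper name. The line continuation produces no token of its own: `and` starts at row 2, column 0 directly after `1` ends at row 1, with no NEWLINE or NL token in between. In Python 2.6, untokenize's full-tuple mode expects each token to start on the row where the previous one ended unless a NEWLINE or NL token came in between, which is consistent with the failing `assert row <= self.prev_row`.

```python
import io
import token
import tokenize

# The snippet from the question, written as a Python string literal.
SRC = "if 1 \\\nand 0:\n    pass\n"


def show_tokens(src):
    """Return (type name, string, start, end) for each token in src."""
    readline = io.StringIO(src).readline
    return [
        (token.tok_name[tok.type], tok.string, tok.start, tok.end)
        for tok in tokenize.generate_tokens(readline)
    ]


if __name__ == "__main__":
    # Print one tuple per token; note that nothing represents the backslash.
    for entry in show_tokens(SRC):
        print(entry)
```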

Another failing case is source with no trailing newline: for example, src='if 1:pass' fails with the same error.

Workaround (this works, but it seems to take a different code path):

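import cStringIO
import tokenize

def tok_untok(src):
    f = cStringIO.StringIO(src)
    # Keep only (type, string) from each token; start/end positions are dropped
    tokens = [t[:2] for t in tokenize.generate_tokens(f.readline)]
    return tokenize.untokenize(tokens)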

That is, pass back only t[:2] instead of the whole token tuple, even though the Python docs say that any additional elements are ignored:

Converts tokens back into Python source code. The iterable must return sequences with at least two elements, the token type and the token string. Any additional sequence elements are ignored.
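A minimal Python 3 sketch of the same workaround, assuming io.StringIO in place of cStringIO (which Python 3 no longer ships). When untokenize receives 2-tuples it switches to a compatibility mode that never looks at start/end positions, so the row jump caused by the backslash never has to be reconciled; that is the "different code path" noticed above. same_tokens is a hypothetical helper that checks the property the tokenize documentation promises: the output re-tokenizes to the same (type, string) pairs, while spacing and the original line continuation may be lost.

```python
import io
import tokenize


def _pairs(src):
    """Tokenize src and keep only (type, string) for each token."""
    readline = io.StringIO(src).readline
    return [tok[:2] for tok in tokenize.generate_tokens(readline)]


def tok_untok(src):
    """Round-trip src through tokenize using only (type, string) pairs."""
    return tokenize.untokenize(_pairs(src))


def same_tokens(a, b):
    """True if a and b tokenize to the same (type, string) sequence."""
    return _pairs(a) == _pairs(b)


if __name__ == "__main__":
    for src in ("if 1 \\\nand 0:\n    pass\n", "if 1:pass"):
        out = tok_untok(src)
        print(repr(out), same_tokens(src, out))
```

The regenerated source is not byte-for-byte identical: the continuation is folded onto one line and extra spaces appear after names and numbers, but it tokenizes (and compiles) the same way.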

