I am trying to use Python's tokenize library to tokenize Python code. For the sample input:
def cal_cone_curved_surf_area(slant_height,radius):\n\tpi=3.14\n\treturn pi*radius*slant_height\n\n
I use the following code to get all the tokens (here p is the sample input string):
import io
import tokenize

text = tokenize.generate_tokens(io.StringIO(p).readline)
[tok for tok in text]
After running the snippet, I get the following output:
[TokenInfo(type=1 (NAME), string='def', start=(1, 0), end=(1, 3), line='def cal_cone_curved_surf_area(slant_height,radius):\n'),
TokenInfo(type=1 (NAME), string='cal_cone_curved_surf_area', start=(1, 4), end=(1, 29), line='def cal_cone_curved_surf_area(slant_height,radius):\n'),
TokenInfo(type=53 (OP), string='(', start=(1, 29), end=(1, 30), line='def cal_cone_curved_surf_area(slant_height,radius):\n'),
TokenInfo(type=1 (NAME), string='slant_height', start=(1, 30), end=(1, 42), line='def cal_cone_curved_surf_area(slant_height,radius):\n'),
TokenInfo(type=53 (OP), string=',', start=(1, 42), end=(1, 43), line='def cal_cone_curved_surf_area(slant_height,radius):\n'),
TokenInfo(type=1 (NAME), string='radius', start=(1, 43), end=(1, 49), line='def cal_cone_curved_surf_area(slant_height,radius):\n'),
TokenInfo(type=53 (OP), string=')', start=(1, 49), end=(1, 50), line='def cal_cone_curved_surf_area(slant_height,radius):\n'),
TokenInfo(type=53 (OP), string=':', start=(1, 50), end=(1, 51), line='def cal_cone_curved_surf_area(slant_height,radius):\n'),
TokenInfo(type=4 (NEWLINE), string='\n', start=(1, 51), end=(1, 52), line='def cal_cone_curved_surf_area(slant_height,radius):\n'),
TokenInfo(type=5 (INDENT), string='\t', start=(2, 0), end=(2, 1), line='\tpi=3.14\n'),
TokenInfo(type=1 (NAME), string='pi', start=(2, 1), end=(2, 3), line='\tpi=3.14\n'),
TokenInfo(type=53 (OP), string='=', start=(2, 3), end=(2, 4), line='\tpi=3.14\n'),
TokenInfo(type=2 (NUMBER), string='3.14', start=(2, 4), end=(2, 8), line='\tpi=3.14\n'),
TokenInfo(type=4 (NEWLINE), string='\n', start=(2, 8), end=(2, 9), line='\tpi=3.14\n'),
TokenInfo(type=1 (NAME), string='return', start=(3, 1), end=(3, 7), line='\treturn pi*radius*slant_height\n'),
TokenInfo(type=1 (NAME), string='pi', start=(3, 8), end=(3, 10), line='\treturn pi*radius*slant_height\n'),
TokenInfo(type=53 (OP), string='*', start=(3, 10), end=(3, 11), line='\treturn pi*radius*slant_height\n'),
TokenInfo(type=1 (NAME), string='radius', start=(3, 11), end=(3, 17), line='\treturn pi*radius*slant_height\n'),
TokenInfo(type=53 (OP), string='*', start=(3, 17), end=(3, 18), line='\treturn pi*radius*slant_height\n'),
TokenInfo(type=1 (NAME), string='slant_height', start=(3, 18), end=(3, 30), line='\treturn pi*radius*slant_height\n'),
TokenInfo(type=4 (NEWLINE), string='\n', start=(3, 30), end=(3, 31), line='\treturn pi*radius*slant_height\n'),
TokenInfo(type=56 (NL), string='\n', start=(4, 0), end=(4, 1), line='\n'),
TokenInfo(type=6 (DEDENT), string='', start=(5, 0), end=(5, 0), line=''),
TokenInfo(type=0 (ENDMARKER), string='', start=(5, 0), end=(5, 0), line='')]
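As you can see, I only get one INDENT token (line 10 of the output), and no second one after the second NEWLINE. How can I make sure I get all the correct INDENT tokens for the source code?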
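An INDENT token is generated when a block is entered, not once for every indented line. When the block is exited, generate_tokens() emits a DEDENT token. All tokens between an INDENT and the next INDENT or its matching DEDENT are at the same indentation level. In the sample, the pi=3.14 and return lines belong to the same block, so a single INDENT followed by a single DEDENT is the expected output.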
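If a per-line indentation level is what you actually need, one way to get it is to keep a running counter that goes up on each INDENT and down on each DEDENT. The following is a minimal sketch under that assumption; the helper name indent_levels is hypothetical and not part of the tokenize module.

```python
import io
import tokenize

# Token types that carry no code and should not define a line's level.
_SKIPPED = (tokenize.NEWLINE, tokenize.NL, tokenize.COMMENT, tokenize.ENDMARKER)


def indent_levels(source):
    """Map each line number that holds a code token to its block depth.

    indent_levels is an illustrative helper, not a tokenize API.
    """
    level = 0
    levels = {}
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.INDENT:
            level += 1          # a block was entered
        elif tok.type == tokenize.DEDENT:
            level -= 1          # a block was exited
        elif tok.type not in _SKIPPED:
            # Record the depth once, at the first code token seen on the line.
            levels.setdefault(tok.start[0], level)
    return levels


p = ("def cal_cone_curved_surf_area(slant_height,radius):\n"
     "\tpi=3.14\n"
     "\treturn pi*radius*slant_height\n\n")
print(indent_levels(p))  # {1: 0, 2: 1, 3: 1}
```

Lines 2 and 3 both report depth 1 even though only one INDENT token was emitted, which is the behaviour described above. Blank lines and comment-only lines are left out of the result because they produce no code tokens.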