Extracting all 'INDENT' tokens with Python's tokenize

Posted 2024-10-17 06:29:45


I'm trying to use the tokenize library to tokenize Python code. For the sample input:

def cal_cone_curved_surf_area(slant_height,radius):\n\tpi=3.14\n\treturn pi*radius*slant_height\n\n

I use the following code to get all the tokens (here p is the sample input string):

import io
import tokenize
text = tokenize.generate_tokens(io.StringIO(p).readline)
[tok for tok in text]
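If the goal is only to pull the INDENT tokens out of the stream, a minimal sketch is to filter on tok.type against the tokenize.INDENT constant (the variable names are illustrative). On this input it finds exactly one INDENT token, which is the behaviour the question is about:

```python
import io
import tokenize

p = "def cal_cone_curved_surf_area(slant_height,radius):\n\tpi=3.14\n\treturn pi*radius*slant_height\n\n"

# Tokenize the whole source, then keep only the INDENT tokens
tokens = list(tokenize.generate_tokens(io.StringIO(p).readline))
indents = [tok for tok in tokens if tok.type == tokenize.INDENT]

for tok in indents:
    # tok_name maps the numeric token type back to its name, e.g. 'INDENT'
    print(tokenize.tok_name[tok.type], repr(tok.string), tok.start)
```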

After running the snippet, I get the following output:

[TokenInfo(type=1 (NAME), string='def', start=(1, 0), end=(1, 3), line='def cal_cone_curved_surf_area(slant_height,radius):\n'),
 TokenInfo(type=1 (NAME), string='cal_cone_curved_surf_area', start=(1, 4), end=(1, 29), line='def cal_cone_curved_surf_area(slant_height,radius):\n'),
 TokenInfo(type=53 (OP), string='(', start=(1, 29), end=(1, 30), line='def cal_cone_curved_surf_area(slant_height,radius):\n'),
 TokenInfo(type=1 (NAME), string='slant_height', start=(1, 30), end=(1, 42), line='def cal_cone_curved_surf_area(slant_height,radius):\n'),
 TokenInfo(type=53 (OP), string=',', start=(1, 42), end=(1, 43), line='def cal_cone_curved_surf_area(slant_height,radius):\n'),
 TokenInfo(type=1 (NAME), string='radius', start=(1, 43), end=(1, 49), line='def cal_cone_curved_surf_area(slant_height,radius):\n'),
 TokenInfo(type=53 (OP), string=')', start=(1, 49), end=(1, 50), line='def cal_cone_curved_surf_area(slant_height,radius):\n'),
 TokenInfo(type=53 (OP), string=':', start=(1, 50), end=(1, 51), line='def cal_cone_curved_surf_area(slant_height,radius):\n'),
 TokenInfo(type=4 (NEWLINE), string='\n', start=(1, 51), end=(1, 52), line='def cal_cone_curved_surf_area(slant_height,radius):\n'),
 TokenInfo(type=5 (INDENT), string='\t', start=(2, 0), end=(2, 1), line='\tpi=3.14\n'),
 TokenInfo(type=1 (NAME), string='pi', start=(2, 1), end=(2, 3), line='\tpi=3.14\n'),
 TokenInfo(type=53 (OP), string='=', start=(2, 3), end=(2, 4), line='\tpi=3.14\n'),
 TokenInfo(type=2 (NUMBER), string='3.14', start=(2, 4), end=(2, 8), line='\tpi=3.14\n'),
 TokenInfo(type=4 (NEWLINE), string='\n', start=(2, 8), end=(2, 9), line='\tpi=3.14\n'),
 TokenInfo(type=1 (NAME), string='return', start=(3, 1), end=(3, 7), line='\treturn pi*radius*slant_height\n'),
 TokenInfo(type=1 (NAME), string='pi', start=(3, 8), end=(3, 10), line='\treturn pi*radius*slant_height\n'),
 TokenInfo(type=53 (OP), string='*', start=(3, 10), end=(3, 11), line='\treturn pi*radius*slant_height\n'),
 TokenInfo(type=1 (NAME), string='radius', start=(3, 11), end=(3, 17), line='\treturn pi*radius*slant_height\n'),
 TokenInfo(type=53 (OP), string='*', start=(3, 17), end=(3, 18), line='\treturn pi*radius*slant_height\n'),
 TokenInfo(type=1 (NAME), string='slant_height', start=(3, 18), end=(3, 30), line='\treturn pi*radius*slant_height\n'),
 TokenInfo(type=4 (NEWLINE), string='\n', start=(3, 30), end=(3, 31), line='\treturn pi*radius*slant_height\n'),
 TokenInfo(type=56 (NL), string='\n', start=(4, 0), end=(4, 1), line='\n'),
 TokenInfo(type=6 (DEDENT), string='', start=(5, 0), end=(5, 0), line=''),
 TokenInfo(type=0 (ENDMARKER), string='', start=(5, 0), end=(5, 0), line='')]

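As you can see, I only get one INDENT token (line 10 of the output), but not a second one after the second NEWLINE. How can I make sure I get all the correct INDENT tokens in the source code?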


1 answer
Anonymous user
#1 · Posted 2024-10-17 06:29:45

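An INDENT token is generated when a block is entered, not for every line. When the block is exited, generate_tokens() produces a DEDENT token. All tokens from an INDENT up to the next INDENT or the matching DEDENT are at the same indentation level. In your example, pi=3.14 and the return statement belong to the same block, so there is only one INDENT (before pi), and the single DEDENT at the end closes that block.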
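Because INDENT and DEDENT only mark block boundaries, the indentation level of each line has to be tracked by hand. A minimal sketch, assuming what you want is the block depth of each line that holds code (line_depths is a hypothetical helper, not part of tokenize): keep a counter that goes up on INDENT and down on DEDENT, and record it at the first real token on each physical line.

```python
import io
import tokenize

# Token types that carry no code of their own on a line
SKIP = {tokenize.INDENT, tokenize.DEDENT, tokenize.NEWLINE,
        tokenize.NL, tokenize.COMMENT, tokenize.ENDMARKER}


def line_depths(source):
    """Hypothetical helper: map line number -> block depth for lines with code."""
    depth = 0
    depths = {}
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.INDENT:
            depth += 1   # entering a block
        elif tok.type == tokenize.DEDENT:
            depth -= 1   # leaving a block
        elif tok.type not in SKIP:
            # The first real token on a line records the current depth
            depths.setdefault(tok.start[0], depth)
    return depths


p = "def cal_cone_curved_surf_area(slant_height,radius):\n\tpi=3.14\n\treturn pi*radius*slant_height\n\n"
print(line_depths(p))    # {1: 0, 2: 1, 3: 1}

# A nested block produces one INDENT per block entered, not per line
src = "def f(x):\n    if x:\n        return 1\n    return 0\n"
print(line_depths(src))  # {1: 0, 2: 1, 3: 2, 4: 1}
```

Both the pi line and the return line come out at depth 1, even though only the first of them is preceded by an INDENT token.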
