Tracking on/off state in a sequence

def tokspan(starttok, endtok, stream):
    inside = False
    for tok in stream:
        if (not inside) and tok == starttok:
            inside = True
        yield (inside, tok)
        if inside and tok == endtok:
            inside = False

tstream = "int x; /* a non-nesting comment /* etc. */ x=1; main();".split()
for status, tok in tokspan("/*", "*/", tstream):
    print(status, tok)

2 answers

User

Answer 1 · edited 2024-09-30 08:22:42

The only simplification I can think of is rewriting the logic that sets and resets inside:

def tokspan(starttok, endtok, stream):
    inside = False
    for tok in stream:
        inside |= (tok == starttok)
        yield (inside, tok)
        inside &= (tok != endtok)

Whether this makes the code more or less readable is in the eye of the beholder.
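To check that the rewrite behaves the same as the original, a minimal sketch can run both versions side by side on the question's token stream. The names tokspan_original and tokspan_compact are only illustrative renamings of the question's function and of the version above.

```python
def tokspan_original(starttok, endtok, stream):
    """The question's version: explicit guarded if-statements."""
    inside = False
    for tok in stream:
        if (not inside) and tok == starttok:
            inside = True
        yield (inside, tok)
        if inside and tok == endtok:
            inside = False


def tokspan_compact(starttok, endtok, stream):
    """The rewrite: compound boolean assignment on a bool flag."""
    inside = False
    for tok in stream:
        inside |= (tok == starttok)   # can only switch the flag on
        yield (inside, tok)
        inside &= (tok != endtok)     # can only switch the flag off


if __name__ == '__main__':
    tstream = "int x; /* a non-nesting comment /* etc. */ x=1; main();".split()
    a = list(tokspan_original("/*", "*/", tstream))
    b = list(tokspan_compact("/*", "*/", tstream))
    print(a == b)
```

A single input proves little on its own; the general argument is that |= can only turn the flag on and &= can only turn it off, which is exactly what the two guarded if-statements in the original do.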

User

Answer 2 · edited 2024-09-30 08:22:42

Perhaps you could use a decorator here. I'm not sure whether it will be useful to you, but it may give you some ideas.

Create a decorator that stores the start and stop markers you want to filter between:

import itertools as it

class insideDec(object):

    def __init__(self, start, stop):

        self.start = start
        self.stop  = stop

    def __call__(self, f):

        def wrapper(x):
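            # skip characters until the start marker appears
            x1 = it.dropwhile(lambda m: not m.startswith(self.start), x)
            # consume the start marker itself
            next(x1)
            # keep characters until the stop marker appears
            x2 = it.takewhile(lambda m: not m.startswith(self.stop), x1)
            return f(x2)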

        return wrapper 

@insideDec('{', '}')
def f(val):
    return val

if __name__ == '__main__':
    print(''.join(f('This is some {string that needs to} be printed')))

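Now apply the decorator to your function. The decorated function is still called with a string, but the original function body receives an iterator over only the characters between the markers. You can then process that iterator like any other iterator.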

Of course, you can always convert the iterator to a string before passing it on (for example, here):

        # rest of the code ...
        x2 = it.takewhile(lambda m: not m.startswith(self.stop),  x1 )
        return f(''.join(x2))
        # rest of the code ...

It really is up to you.
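Both options can live in one decorator. The sketch below is a Python 3 variant of insideDec with an as_string flag; that flag, and the example functions as_list and shout, are hypothetical additions for illustration, not part of the answer. It also uses next(x1, None) so that a string without the start marker yields empty output instead of raising StopIteration.

```python
import itertools as it


class insideDec:
    """Decorator that hands f only the text between start and stop.

    as_string (hypothetical): if True, f receives a joined str,
    otherwise it receives the raw character iterator.
    """

    def __init__(self, start, stop, as_string=False):
        self.start = start
        self.stop = stop
        self.as_string = as_string

    def __call__(self, f):
        def wrapper(x):
            # Drop characters until the start marker, then skip the marker.
            x1 = it.dropwhile(lambda m: not m.startswith(self.start), x)
            next(x1, None)
            # Keep characters up to (not including) the stop marker.
            x2 = it.takewhile(lambda m: not m.startswith(self.stop), x1)
            return f(''.join(x2) if self.as_string else x2)
        return wrapper


@insideDec('{', '}')
def as_list(chars):
    return list(chars)          # receives an iterator


@insideDec('{', '}', as_string=True)
def shout(text):
    return text.upper()         # receives a plain str


if __name__ == '__main__':
    print(as_list('a {bc} d'))  # ['b', 'c']
    print(shout('This is some {string that needs to} be printed'))
```

Because the input string is iterated one character at a time, this approach only works for single-character markers such as { and }; two-character markers like /* need a different technique.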

Edit:

Sorry, I misunderstood your question. For tokenization, would something like the following help?

import itertools as it

class tokenize:

    def __init__(self, strVal, start, stop):
        self.start   = start
        self.stop    = stop
        # one independent copy of the character stream per marker character
        self.strTees = it.tee(strVal, len(start))
        self.inside  = False
        # advance copy i by i characters, so zipping the copies gives a sliding window
        for i, strTee in enumerate(self.strTees):
            for j in range(i):
                next(strTee, '')
        self.strVals = zip(*self.strTees)

    def __iter__(self):
        return self

    def __next__(self):

        # current window, e.g. ('/', '*') -> '/*'; zip raises StopIteration at the end
        v = ''.join(next(self.strVals))
        if v == '': raise StopIteration
        if v == self.start: self.inside = True
        if v == self.stop:  self.inside = False

        # print('[', v, ']')

        return (v[0], self.inside)


if __name__ == '__main__':

    strVal = 'int x; /* a non-nesting comment etc. */ x=1; main();'
    for x, y in tokenize(strVal, '/*', '*/'):
        print(x, y)

Again, this isn't perfect, but it may serve your purpose.

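Here is the output:

i False
n False
t False
  False
x False
; False
  False
/ True
* True
  True
a True
  True
n True
o True
n True
- True
n True
e True
s True
t True
i True
n True
g True
  True
c True
o True
m True
m True
e True
n True
t True
  True
e True
t True
c True
. True
  True
* False
/ False
  False
x False
= False
1 False
; False
  False
m False
a False
i False
n False
( False
) False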
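The tee/zip window in tokenize stops one character early, which is why the final ";" of the input never appears in its output. A minimal sketch of the same two-character window as a generator, using itertools.zip_longest with an empty-string fill so every character is reported. The name char_flags is hypothetical, and the sketch assumes distinct two-character markers such as /* and */ and a sliceable string input.

```python
from itertools import zip_longest


def char_flags(text, start, stop):
    """Yield (char, inside) for every character of text.

    Hypothetical helper; assumes start and stop are distinct
    two-character markers such as '/*' and '*/'.
    """
    inside = False
    # Pair each character with its successor; the last one pairs with ''.
    for cur, nxt in zip_longest(text, text[1:], fillvalue=''):
        pair = cur + nxt
        if pair == start:
            inside = True      # the '/' of '/*' is already reported inside
        elif pair == stop:
            inside = False     # the '*' of '*/' is already reported outside
        yield cur, inside


if __name__ == '__main__':
    src = 'int x; /* a non-nesting comment etc. */ x=1; main();'
    flags = list(char_flags(src, '/*', '*/'))
    print(len(flags), flags[-1])
    print(''.join(c for c, on in flags if on))
```

It keeps the answer's convention that the closing marker counts as outside. Slicing with text[1:] is simpler than tee but requires a string or other sequence; for an arbitrary iterator, a tee-based window like the answer's is still needed.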
