Decrementing the counter of an enumerate() iterable after a call to seek()

Posted 2024-06-23 19:42:07


I am reading a file with Python. The file contains sections delimited by lines that start with the '#' character:

#HEADER1, SOME EXTRA INFO
data first section
1 2
1 233 
...
// THIS IS A COMMENT
#HEADER2, SECOND SECTION
452
134
// ANOTHER COMMENT
...
#HEADER3, THIRD SECTION

I wrote the following code to read the file:

with open(filename) as fh:

    enumerated = enumerate(iter(fh.readline, ''), start=1)

    for lino, line in enumerated:

        # handle special section
        if line.startswith('#'):

            print("="*40)
            print(line)

            while True:

                start = fh.tell()
                lino, line = next(enumerated)

                if line.startswith('#'):
                    fh.seek(start)
                    break

                print("[{}] {}".format(lino,line))

The output is:

========================================
#HEADER1, SOME EXTRA INFO

[2] data first section

[3] 1 2

[4] 1 233 

[5] ...

[6] // THIS IS A COMMENT

========================================
#HEADER2, SECOND SECTION

[9] 452

[10] 134

[11] // ANOTHER COMMENT

[12] ...

========================================
#HEADER3, THIRD SECTION

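As you can see, the line counter lino is no longer valid, because I am using seek: the header line is read a second time and counted again. Decrementing it before breaking out of the loop does not help either, because every call to next increments the counter. Is there an elegant way to solve this in Python 3.x? Also, is there a better way to deal with StopIteration than putting a pass statement in an except block?

Update

So far I have adopted an implementation based on @Dunes's suggestion. I had to change it a little so that I can look ahead and see whether a new section is starting. I don't know whether there is a better way to do this, so please comment: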

class EnumeratedFile:

    def __init__(self, fh, lineno_start=1):
        self.fh = fh
        self.lineno = lineno_start

    def __iter__(self):
        return self

    def __next__(self):
        result = self.lineno, self.fh.readline()
        if result[1] == '':
            raise StopIteration

        self.lineno += 1
        return result

    def mark(self):
        self.marked_lineno = self.lineno
        self.marked_file_position = self.fh.tell()

    def recall(self):
        self.lineno = self.marked_lineno
        self.fh.seek(self.marked_file_position)

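    def section(self):
        # Peek at the next character without consuming it.
        pos = self.fh.tell()
        char = self.fh.read(1)
        self.fh.seek(pos)
        # True while the next line still belongs to the current section;
        # False at a new '#' header or at end of file (read returns '').
        return char not in ('', '#')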

The file is then read and each section processed like this:

# create enumerated object
e = EnumeratedFile(fh)

header = ""
for lineno, line, in e:

    print("[{}] {}".format(lineno, line))

    header = line.rstrip()

    # HEADER1
    if header.startswith("#HEADER1"):

        # process header 1 lines
        while e.section():

            # get node line
            lineno, line = next(e)
            # do whatever needs to be done with the line

    elif header.startswith("#HEADER2"):

        # etc.
        pass

2 Answers

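No, you cannot alter the counter of an enumerate() iterable.

You don't need to here at all, and you don't need to seek either. Instead, use a nested loop and buffer the section header: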

with open(filename) as fh:
    enumerated = enumerate(fh, start=1)
    header = None
    for lineno, line in enumerated:
        # skip ahead to the first section header
        if line.startswith('#'):
            header = line
            break

    while header is not None:
        print("=" * 40)
        print(header.rstrip())
        header = None
        for lineno, line in enumerated:
            if line.startswith('#'):
                # new section: buffer its header and end this section
                header = line
                break

            # section line, handle as such
            print("[{}] {}".format(lineno, line.rstrip()))

This buffers only the header line. Whenever a new header is encountered, it is stored and the current section loop ends. The outer while loop then starts the next section without reading another line, so no section line is skipped and lineno always matches the real line number in the file.

Demo:

>>> from io import StringIO
>>> demo = StringIO('''\
... #HEADER1, SOME EXTRA INFO
... data first section
... 1 2
... 1 233 
... ...
... // THIS IS A COMMENT
... #HEADER2, SECOND SECTION
... 452
... 134
... // ANOTHER COMMENT
... ...
... #HEADER3, THIRD SECTION
... ''')
>>> enumerated = enumerate(demo, start=1)
>>> header = None
>>> for lineno, line in enumerated:
...     # skip ahead to the first section header
...     if line.startswith('#'):
...         header = line
...         break
... 
>>> while header is not None:
...     print("=" * 40)
...     print(header.rstrip())
...     header = None
...     for lineno, line in enumerated:
...         if line.startswith('#'):
...             # new section: buffer its header and end this section
...             header = line
...             break
...         # section line, handle as such
...         print("[{}] {}".format(lineno, line.rstrip()))
... 
========================================
#HEADER1, SOME EXTRA INFO
[2] data first section
[3] 1 2
[4] 1 233
[5] ...
[6] // THIS IS A COMMENT
========================================
#HEADER2, SECOND SECTION
[8] 452
[9] 134
[10] // ANOTHER COMMENT
[11] ...
========================================
#HEADER3, THIRD SECTION

The third section contains no lines, so only its header is printed. Had there been lines, the inner loop would have numbered them correctly as well.
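
On the side question about avoiding an except block that only contains pass: the built-in next() accepts a second argument, which it returns when the iterator is exhausted instead of raising StopIteration. A minimal sketch of that idea, where the sample data and the collected list are made up for illustration:

```python
from io import StringIO

# Hypothetical sample data, for illustration only.
sample = StringIO("#HEADER1\nfirst\nsecond\n")
enumerated = enumerate(sample, start=1)

collected = []
while True:
    # next() returns the default (None here) instead of raising StopIteration.
    item = next(enumerated, None)
    if item is None:
        break
    lineno, line = item
    collected.append((lineno, line.rstrip()))

print(collected)
```

None is a safe default here because enumerate() always yields tuples, so the sentinel can never be confused with a real item.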

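You could copy the iterator and later restore the iterator from that copy. However, file objects cannot be copied. You could take a shallow copy of the enumerator and then seek to the corresponding part of the file when you start using the copied enumerator.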
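
One way to read that suggestion, sketched here under the assumption that rebuilding the enumerator is an acceptable stand-in for a true copy: save the next line number together with fh.tell(), and after seek() create a fresh enumerate() that starts at the saved number. The helper name numbered_lines and the sample text are hypothetical. The sketch uses iter(fh.readline, '') because calling tell() on a real text file raises OSError while the file is being iterated with next().

```python
from io import StringIO

def numbered_lines(fh, start):
    """Hypothetical helper: enumerate lines via readline() so tell() stays usable."""
    return enumerate(iter(fh.readline, ''), start=start)

fh = StringIO("#A\none\ntwo\n#B\nthree\n")  # made-up sample data
enumerated = numbered_lines(fh, 1)

lineno, line = next(enumerated)  # consumes (1, '#A\n')
# Remember the state just after the header: next line number and file offset.
saved_lineno, saved_pos = lineno + 1, fh.tell()

next(enumerated)  # consumes (2, 'one\n')
next(enumerated)  # consumes (3, 'two\n')

# Rewind the file and rebuild the enumerator from the saved state.
fh.seek(saved_pos)
enumerated = numbered_lines(fh, saved_lineno)
restored = next(enumerated)
print(restored)
```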

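However, the best approach is to write an iterator class with a __next__ method that yields the line number and the line, plus mark and recall methods that record a state and return to the previously recorded state.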

class EnumeratedFile:

    def __init__(self, fh, lineno_start=1):
        self.fh = fh
        self.lineno = lineno_start

    def __iter__(self):
        return self

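    def __next__(self):
        # Use readline() rather than next(self.fh): iterating a real text
        # file with next() disables tell(), which mark() relies on.
        line = self.fh.readline()
        if line == '':
            raise StopIteration
        result = self.lineno, line
        self.lineno += 1
        return result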

    def mark(self):
        self.marked_lineno = self.lineno
        self.marked_file_position = self.fh.tell()

    def recall(self):
        self.lineno = self.marked_lineno
        self.fh.seek(self.marked_file_position)

You can use it like this:

from io import StringIO
demo = StringIO('''\
#HEADER1, SOME EXTRA INFO
data first section
1 2
1 233 
...
// THIS IS A COMMENT
#HEADER2, SECOND SECTION
452
134
// ANOTHER COMMENT
...
#HEADER3, THIRD SECTION
''')

e = EnumeratedFile(demo)
seen_header2 = False
for lineno, line, in e:
    if seen_header2:
        print(lineno, line)
        assert (lineno, line) == (2, "data first section\n")
        break
    elif line.startswith("#HEADER1"):
        e.mark()
    elif line.startswith("#HEADER2"):
        e.recall()
        seen_header2 = True
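
If marking and recalling positions is more machinery than the task needs, the look-ahead can be avoided altogether by grouping lines as they stream past. The following is only a sketch and not part of either answer above. The function name iter_sections is hypothetical, and it assumes that every line starting with '#' opens a new section and that lines before the first header can be ignored.

```python
from io import StringIO

def iter_sections(fh, lineno_start=1):
    """Yield (header_lineno, header, body) for each section.

    body is a list of (lineno, line) tuples carrying real file line numbers.
    """
    header = None
    for lineno, line in enumerate(fh, start=lineno_start):
        if line.startswith('#'):
            # A new header closes the previous section, if there was one.
            if header is not None:
                yield header_lineno, header, body
            header_lineno, header, body = lineno, line.rstrip(), []
        elif header is not None:
            body.append((lineno, line.rstrip()))
    # Emit the final section, which has no following header to close it.
    if header is not None:
        yield header_lineno, header, body

# Shortened, made-up sample in the same shape as the question's file.
demo = StringIO(
    "#HEADER1, SOME EXTRA INFO\n"
    "1 2\n"
    "#HEADER2, SECOND SECTION\n"
    "452\n"
    "134\n"
    "#HEADER3, THIRD SECTION\n"
)
sections = list(iter_sections(demo))
for header_lineno, header, body in sections:
    print(header_lineno, header, body)
```

Because the file is read strictly forward, this works on any iterable of lines, including streams that do not support seek() or tell().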
