Resuming a nested for loop

3 answers

User

Answer 1 · edited 2024-09-29 19:33:07

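I wouldn't worry about it. In your example, t is a file handle and you are iterating over it. File handles in Python are their own iterators: they keep state about where they are in the file and retain that position as you iterate over them. See file.next() in the Python 2 docs for more information; in Python 3 the equivalent is the built-in next(f), which calls f.__next__().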
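A minimal sketch of that behaviour, assuming an in-memory io.StringIO stands in for the real file t (both are their own iterators). The sample tags are made up for illustration. The inner loop picks up right after the line the outer loop just read, and after the inner break the outer loop continues after the ~EOR~ line.

```python
import io

# io.StringIO stands in for an open text file; both are their own iterators.
f = io.StringIO("ID 1\nTI a\n~EOR~\nID 2\nTI b\n~EOR~\n")

records = []
for line in f:                    # outer loop reads the "ID" line
    record = [line.rstrip("\n")]
    for inner in f:               # inner loop resumes from the same position
        inner = inner.rstrip("\n")
        if inner == "~EOR~":
            break                 # outer loop then continues after ~EOR~
        record.append(inner)
    records.append(record)

print(records)  # [['ID 1', 'TI a'], ['ID 2', 'TI b']]
```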

See also this other SO answer, which also covers iterators: What does the "yield" keyword do in Python? It has a lot of useful information.

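Edit: here is another way to combine them, using a dictionary. You'll want this approach if you need to make other changes to the records before writing them out: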

import sys


def get_records(source_lines):
    """Parse 'TAG value' lines into {record_id: {tag: value}}."""
    records = {}
    current_id = None
    for line in source_lines:
        line = line.rstrip()
        # Skip blank lines and end-of-record markers
        if not line or line.startswith('~EOR~'):
            continue
        # Split the line up on the first space; a tag without a value gets ''
        tag, _, val = line.partition(' ')
        if tag == 'ID':
            current_id = val
            records[current_id] = {}
        else:
            records[current_id][tag] = val
    return records


if __name__ == "__main__":
    with open(sys.argv[1]) as f:
        broken = get_records(f)
    with open(sys.argv[2]) as f:
        fixed = get_records(f)

    # Merge the broken and fixed records; fixed values win on conflicts.
    # (Python 2 original: dict(broken[id].items() + fixed[id].items()),
    # which raises TypeError in Python 3 and KeyError for IDs missing from broken.)
    repaired = dict(broken)
    for rec_id, tags in fixed.items():
        repaired[rec_id] = {**broken.get(rec_id, {}), **tags}

    with open(sys.argv[3], 'w') as f:
        for rec_id, tags in sorted(repaired.items()):
            f.write('ID {}\n'.format(rec_id))
            for tag, val in sorted(tags.items()):
                f.write('{} {}\n'.format(tag, val))
            f.write('~EOR~\n')

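The dict(broken[id].items() + fixed[id].items()) merge relies on this: How to merge two Python dictionaries in a single expression? (That form only works in Python 2; in Python 3, dict.items() returns a view that does not support +.)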
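The merge rule relied on here can be seen on a single record. This is a sketch with made-up tag values: on duplicate keys the later mapping wins, so the fixed value replaces the broken one while untouched tags survive.

```python
# Hypothetical sample record: the "broken" copy has a typo in TI,
# the "fixed" copy carries only the corrected tag.
broken = {"ID": "42", "TI": "Teh title", "AU": "Smith"}
fixed = {"ID": "42", "TI": "The title"}

# Python 3.5+: unpack both dicts into a new one; the later mapping (fixed)
# wins on duplicate keys, and neither input dict is modified.
merged = {**broken, **fixed}

# Equivalent copy-then-update form.
merged_copy = dict(broken)
merged_copy.update(fixed)

print(merged)  # {'ID': '42', 'TI': 'The title', 'AU': 'Smith'}
```

In Python 3.9 and later, broken | fixed produces the same result.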

User

Answer 2 · edited 2024-09-29 19:33:07

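For completeness, and to share my enthusiasm and what I've learned, here is the code I'm using now. It answers my question, and more.

It is partly based on akaRem's approach. One function reads a file into a list of records (ordered dicts). It is called twice: once for the file with the fixes and once for the file to be fixed.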

import collections
from GetInfiles import GetInfiles

# GetInfiles reads two input parameters from the command line,
# verifies they exist as files with the right extension,
# and then returns their names. Code not included here.
sourcefile, targetfile = GetInfiles('dat')

resultfile = targetfile[:-4] + '_result.dat'

def recordlist(infile):
    """Read infile into a list of records. Each record is an OrderedDict
    mapping tag -> list of values, so a tag may occur more than once."""
    record = collections.OrderedDict()
    reclist = []

    with open(infile, 'r', encoding='utf-8-sig') as f:
        for line in f:
            if line.startswith('~EOR~'):
                # end of record: store it and start a new one
                reclist.append(record)
                record = collections.OrderedDict()
                continue
            try:
                # All other lines must have the shape 'tag content\n';
                # value keeps its trailing newline for the write-out below
                key, value = line.split(' ', 1)
            except ValueError:
                # if this errors, there's something wrong with an input file
                raise ValueError('malformed line in {}: {!r}'.format(infile, line))
            record.setdefault(key, []).append(value)

    return reclist

# put files into lists of ordered dicts
source = recordlist(sourcefile)
target = recordlist(targetfile)

# patching: overwrite the fixed tags in every target record with the same ID
for fix in source:
    for record in target:
        if fix['ID'] == record['ID']:
            record.update(fix)

# write-out
with open(resultfile, 'w', encoding='utf-8-sig') as f:
    for record in target:
        for tag, field in record.items():
            for occ in field:
                f.write('{} {}'.format(tag, occ))  # occ already ends in '\n'
        f.write('~EOR~\n')

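It now uses an ordered dict. That wasn't in my OP, but the files need to be cross-checked by humans, so keeping the order makes that easier. (Using OrderedDict is really easy. odict had occurred to me the first time I looked for this functionality, but its documentation worried me: no examples, scary jargon...)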
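A short sketch of why the order survives, using made-up record lines: an OrderedDict iterates in insertion order, and re-assigning an existing key does not move it. (Since Python 3.7 a plain dict also guarantees insertion order; OrderedDict additionally offers move_to_end() and order-sensitive equality between OrderedDicts.)

```python
from collections import OrderedDict

# Tags in the order they appear in a (made-up) record.
lines = ["ID 7\n", "TI Title\n", "AU Author\n", "DA 1999\n"]

record = OrderedDict()
for line in lines:
    tag, _, value = line.partition(" ")
    record[tag] = value.rstrip("\n")

# Iteration follows insertion order, so writing the record back out
# reproduces the original tag order for the human cross-check.
print(list(record))  # ['ID', 'TI', 'AU', 'DA']

# Re-assigning an existing key keeps its original position.
record["TI"] = "Corrected title"
print(list(record))  # ['ID', 'TI', 'AU', 'DA']
```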

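Also, it now supports multiple occurrences of any given tag within a record. That wasn't in my OP either, but I needed it. (The format is called "Adlib tagged"; Adlib is cataloguing software.)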
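The multiple-occurrence support comes from storing a list of values per tag. A minimal sketch with a made-up record in which AU occurs twice; note that on write-out all occurrences of a tag are emitted together at the tag's first position, so a tag that was interleaved with others in the input comes out grouped.

```python
from collections import OrderedDict

# Made-up record in which the AU tag occurs twice.
lines = ["ID 7\n", "AU First author\n", "TI Title\n", "AU Second author\n"]

record = OrderedDict()
for line in lines:
    tag, _, value = line.partition(" ")
    # setdefault creates the list on a tag's first occurrence;
    # later occurrences are appended in file order
    record.setdefault(tag, []).append(value)

print(record["AU"])  # ['First author\n', 'Second author\n']

# One output line per occurrence, grouped under the tag's first position.
out = "".join("{} {}".format(tag, occ)
              for tag, values in record.items() for occ in values)
print(out)
```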

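The part that differs from akaRem's approach is the patching: calling update on the target dict, which I find very elegant, as Python so often is. The same goes for startswith. Those are two more reasons I couldn't resist sharing this.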
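A sketch of both points on made-up records shaped like the ones recordlist() builds (tag mapped to a list of values). update() replaces each fixed tag's whole value list and leaves the other tags where they were; startswith() still matches the end-of-record marker while the line carries its newline.

```python
from collections import OrderedDict

# Made-up target record and fix record (tag -> list of values).
target = [OrderedDict([("ID", ["7\n"]), ("TI", ["Teh title\n"]), ("AU", ["A\n", "B\n"])])]
fixes = [OrderedDict([("ID", ["7\n"]), ("TI", ["The title\n"])])]

for fix in fixes:
    for record in target:
        if fix["ID"] == record["ID"]:
            # replaces the whole TI list; AU is untouched and keeps its position
            record.update(fix)

print(target[0]["TI"])  # ['The title\n']
print(target[0]["AU"])  # ['A\n', 'B\n']

# A plain == comparison would miss the marker because of the trailing newline.
line = "~EOR~\n"
print(line.startswith("~EOR~"), line == "~EOR~")  # True False
```

Because a whole list is replaced, a fix record that lists a repeated tag once replaces all of that tag's occurrences in the target.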

I hope it's useful.

User

Answer 3 · edited 2024-09-29 19:33:07

# building initial storage

def split_line(line):
    """Split 'KEY value' into (key, value); a bare line such as '~EOR~' gives (key, None)."""
    items = line.rstrip('\n').split(' ', 1)
    if len(items) == 2:
        return items[0], items[1]
    return items[0], None

content = {}
record = {}
order = []
current = None

with open('broken.file', 'r') as f:
    for line in f:
        key, value = split_line(line)
        if key == 'ID':
            current = value
            order.append(current)
            content[current] = record = {}
        elif key == '~EOR~':
            current = None
            record = {}
        elif key:                      # skip blank lines
            record[key] = value

# patching

with open('patches.file', 'r') as f:
    for line in f:
        key, value = split_line(line)
        if key == 'ID':
            current = value
            record = content[current]  # updates existing records only!
            # if there is no such id -> raises KeyError

            # alternatively you may check and add them to the end of list
            # if current in content:
            #     record = content[current]
            # else:
            #     order.append(current)
            #     content[current] = record = {}

        elif key == '~EOR~':
            current = None
            record = {}
        elif key:
            record[key] = value

# patched!
# write-out

with open('output.file', 'w') as f:
    for current in order:
        f.write('ID ' + current + '\n')
        record = content[current]
        for key in sorted(record):
            f.write(key + ' ' + (record[key] or '') + '\n')
        f.write('~EOR~\n')

# job's done
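The commented-out alternative in the patching loop (appending IDs that are not yet in content instead of raising) can be sketched as a small function. The name apply_patches and the sample records are hypothetical; io.StringIO stands in for the patch file.

```python
import io

def apply_patches(content, order, patch_lines):
    """Apply 'KEY value' patch lines to content; IDs not yet present are
    appended to order instead of raising KeyError."""
    record = {}
    for line in patch_lines:
        items = line.rstrip("\n").split(" ", 1)
        key = items[0]
        value = items[1] if len(items) == 2 else None
        if key == "ID":
            if value in content:
                record = content[value]
            else:
                order.append(value)
                content[value] = record = {}
        elif key == "~EOR~":
            record = {}
        elif key:                  # skip blank lines
            record[key] = value

content = {"1": {"TI": "Teh title"}}
order = ["1"]
patches = io.StringIO("ID 1\nTI The title\n~EOR~\nID 2\nTI New record\n~EOR~\n")
apply_patches(content, order, patches)
print(order)    # ['1', '2']
print(content)  # {'1': {'TI': 'The title'}, '2': {'TI': 'New record'}}
```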

Any questions?
