Resuming a nested for loop


Two files. One has broken data, the other has the fixes. Broken:

<pre><code>ID 0
T5 rat cake
~EOR~
ID 1
T1 wrong segg
T2 wrong nacob
T4 rat tart
~EOR~
ID 3
T5 rat pudding
~EOR~
ID 4
T1 wrong sausag
T2 wrong mspa
T3 strawberry tart
~EOR~
ID 6
T5 with some rat in it
~EOR~
</code></pre>

Fixes:

<pre><code>ID 1
T1 eggs
T2 bacon
~EOR~
ID 4
T1 sausage
T2 spam
T4 bereft of loif
~EOR~
</code></pre>

EOR means end of record. Note that the broken file has more records than the fixes file. The fixes file contains tags to be fixed (T1, T2 etc. are tags) and tags to be added. This code does exactly what it is supposed to do:

<pre><code># foobar.py
import codecs

source = 'foo.dat'   # the fixes file
target = 'bar.dat'   # the broken file
result = 'result.dat'

with codecs.open(source, 'r', 'utf-8_sig') as s, \
     codecs.open(target, 'r', 'utf-8_sig') as t, \
     codecs.open(result, 'w', 'utf-8_sig') as u:

    sID = sT1 = sT2 = sT4 = ''
    RecordFound = False

    # get source data, record by record
    for sline in s:
        if sline.startswith('ID '):
            sID = sline
        if sline.startswith('T1 '):
            sT1 = sline
        if sline.startswith('T2 '):
            sT2 = sline
        if sline.startswith('T4 '):
            sT4 = sline
        if sline.startswith('~EOR~'):
            for tline in t:
                # copy target file lines, replacing when necessary
                if tline == sID:
                    RecordFound = True
                if tline.startswith('T1 ') and RecordFound:
                    tline = sT1
                if tline.startswith('T2 ') and RecordFound:
                    tline = sT2
                if tline.startswith('~EOR~') and RecordFound:
                    if sT4:
                        tline = sT4 + tline
                    RecordFound = False
                    u.write(tline)
                    break
                u.write(tline)

    # write out whatever is left of the target file
    for tline in t:
        u.write(tline)
</code></pre>

I'm writing to a new file because I don't want to mess up the other two. The outer for loop finishes on the last record in the fixes file. At that point, there are still records left to be written from the target file. That is what the last for clause does.

What nags me is that this last loop implicitly picks up where the inner for loop last broke out. It is as if it should say "for the rest of the lines in t". On the other hand, I don't see how I could do this with fewer (or not many more) lines of code, using dicts and what have you. Should I worry at all?

Please comment.
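The resuming behaviour described above is ordinary iterator behaviour: a file object is its own iterator, so a later `for` loop over the same object continues from wherever the previous loop stopped. A minimal sketch, assuming `io.StringIO` as a stand-in for the real target file (the sample contents are made up for illustration):

```python
import io

# Stand-in for the target file; contents are illustrative only.
t = io.StringIO(u"ID 0\n~EOR~\nID 1\n~EOR~\nID 3\n~EOR~\n")

first_pass = []
for tline in t:
    first_pass.append(tline)
    if tline.startswith('~EOR~'):
        break  # the iterator stays positioned just after the first record

# A second loop over the same object continues from that position;
# it does not start again at the top of the file.
rest = [tline for tline in t]
```

Here `first_pass` holds the first record and `rest` holds the remaining two, which is why the trailing `for tline in t` in the question only writes the records that have not been consumed yet.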

1 Answer

Anonymous, 1 day ago

For completeness, and to share my enthusiasm and what I learned, below is the code I'm now working with. It answers my question, and more.

It is partly based on akaRem's approach above. A single function fills a dict. It is called twice, once for the fixes file and once for the file to be fixed.

<pre><code>import codecs, collections
from GetInfiles import *

sourcefile, targetfile = GetInfiles('dat')
# GetInfiles reads two input parameters from the command line,
# verifies they exist as files with the right extension,
# and then returns their names. Code not included here.

resultfile = targetfile[:-4] + '_result.dat'

def recordlist(infile):
    record = collections.OrderedDict()
    reclist = []

    with codecs.open(infile, 'r', 'utf-8_sig') as f:
        for line in f:
            try:
                key, value = line.split(' ', 1)
            except ValueError:
                key = line
                # so this line must be '~EOR~\n'.
                # All other lines must have the shape 'tag: content\n'
                # so if this errors, there's something wrong with an input file

            if not key.startswith('~EOR~'):
                try:
                    record[key].append(value)
                except KeyError:
                    record[key] = [value]
            else:
                reclist.append(record)
                record = collections.OrderedDict()

    return reclist

# put files into ordered dicts
source = recordlist(sourcefile)
target = recordlist(targetfile)

# patching
for fix in source:
    for record in target:
        if fix['ID'] == record['ID']:
            record.update(fix)

# write-out
with codecs.open(resultfile, 'w', 'utf-8_sig') as f:
    for record in target:
        # items() works on both Python 2 and 3
        for tag, field in record.items():
            for occ in field:
                line = u'{} {}'.format(tag, occ)
                f.write(line)

        f.write('~EOR~\n')
</code></pre>

It's an ordered dict now. This wasn't in my OP, but the files need to be cross-checked by humans, so keeping the order makes that easier. (<a href="http://pymotw.com/2/collections/ordereddict.html" rel="nofollow">Using OrderedDict is really easy</a>. My first attempts at finding this functionality led me to odict, but its documentation worried me. No examples, intimidating jargon...)

Also, it now supports multiple occurrences of any given tag inside a record. This wasn't in my OP either, but I need it. (The format is called 'Adlib tagged'; it comes from cataloguing software.)

Different from akaRem's approach is the patching, which uses <code>update</code> on the target dict. I find this really elegant, as Python often is. Likewise for <code>startswith</code>. Those are two more reasons I couldn't resist sharing it.

I hope it's useful.
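To make the `update` step concrete, here is a small sketch using the sample records from the question. The helper `make_record` and the `by_id` index are hypothetical additions for illustration, not part of the answer's code. The index is a variation that looks each fix up by ID instead of scanning every target record.

```python
from collections import OrderedDict

def make_record(pairs):
    """Build a record shaped like those from recordlist(): tag -> list of values."""
    record = OrderedDict()
    for tag, value in pairs:
        record.setdefault(tag, []).append(value)
    return record

# Sample data taken from the question (records ID 3 and ID 4).
target = [
    make_record([('ID', '3\n'), ('T5', 'rat pudding\n')]),
    make_record([('ID', '4\n'), ('T1', 'wrong sausag\n'),
                 ('T2', 'wrong mspa\n'), ('T3', 'strawberry tart\n')]),
]
source = [
    make_record([('ID', '4\n'), ('T1', 'sausage\n'),
                 ('T2', 'spam\n'), ('T4', 'bereft of loif\n')]),
]

# Hypothetical variation: index target records by ID so each fix is
# looked up directly rather than compared against every record.
by_id = {record['ID'][0]: record for record in target}
for fix in source:
    record = by_id.get(fix['ID'][0])
    if record is not None:
        # Existing tags keep their position and get the fix's values;
        # tags that are new to the record are appended at the end.
        record.update(fix)

patched = target[1]
```

One thing to keep in mind: `update` replaces the whole list stored under a tag, so if the target record had several occurrences of a tag, all of them are replaced by whatever the fix record holds for that tag.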
