Generator "yield" produces out-of-sync results when splitting a structured input file

#!/usr/bin/python2.7
import re

def files():
    n = 0
    while n<12 :
        n += 1
        print "**DEBUG** in generator nameFile=%s n=%d \r" % (nameFile, n)
        yield open('/Users/peterf/Google Drive/2015 Projects-Strategy/Domain Admin/RackDomains/%s.part.xml' % nameFile, 'w')

filename='/Users/peterf/Google Drive/2015 Projects-Strategy/Domain Admin/RackspaceListDomain.output.xml'
nameFile=''
pat ='<?xml'
namePat=re.compile('<ns2:domain.+ name="(.+?)".+>')
fs = files()
outfile = next(fs)

with open(filename) as infile:
    for line in infile:
        m=namePat.search(line)
        if m:
            nameFile=m.group(1)
            print "<---\rin 'if m:' nameFile=%s\r" % (nameFile)
        if pat not in line:
            # print "\rin 'pat not in line' line=%s\r" % (line)
            outfile.write(line)
        else:
            items = line.split(pat)
            outfile.write(items[0])
            for item in items[1:]:
                print "in 'for item' pre next(fs) nameFile=%s\r" % (nameFile)
                outfile = next(fs)
                print "in 'for item' post next(fs) nameFile=%s --->\r" % (nameFile)
                outfile.write(pat + item)

**DEBUG** in generator nameFile= n=1
in 'for item' pre next(fs) nameFile=
**DEBUG** in generator nameFile= n=2
in 'for item' post next(fs) nameFile= --->
<---
in 'if m:' nameFile=addressing.com
in 'for item' pre next(fs) nameFile=addressing.com
**DEBUG** in generator nameFile=addressing.com n=3
in 'for item' post next(fs) nameFile=addressing.com --->
<---
in 'if m:' nameFile=alicemcmahon.com
in 'for item' pre next(fs) nameFile=alicemcmahon.com
**DEBUG** in generator nameFile=alicemcmahon.com n=4
in 'for item' post next(fs) nameFile=alicemcmahon.com --->
<---
in 'if m:' nameFile=alphabets.com
in 'for item' pre next(fs) nameFile=alphabets.com
**DEBUG** in generator nameFile=alphabets.com n=5
in 'for item' post next(fs) nameFile=alphabets.com --->

.part.xml (this has data from 'addressing.com')
addressing.com.part.xml
alicemcmahon.com.part.xml
alphabets.com.part.xml
americanletterpress.com.part.xml
americanwoodtype.com.part.xml
amyshoemaker.com.part.xml
archaicrevivalbooks.com.part.xml
archaicrevivalfonts.com.part.xml
archaicrevivalimages.com.part.xml
astroteddies.com.part.xml

<?xml version='1.0' encoding='utf-8'?>
<ns2:domain xmlns:ns3="http://www.w3.org/2005/Atom" xmlns:ns2="http://docs.rackspacecloud.com/dns/api/v1.0" xmlns="http://docs.rackspacecloud.com/dns/api/management/v1.0" id="1204245" name="addressing.com" ttl="300" emailAddress="ipadmin@stabletransit.com" updated="2012-10-10T21:33:36Z" created="2009-07-25T15:05:39Z">
  <ns2:nameservers>
    <ns2:nameserver name="dns1.stabletransit.com" />
    <ns2:nameserver name="dns2.stabletransit.com" />
  </ns2:nameservers>
  <ns2:recordsList totalEntries="5">
    <ns2:record id="A-2542579" type="A" name="addressing.com" data="198.101.155.141" ttl="300" updated="2012-10-10T21:33:35Z" created="2010-02-17T05:02:16Z" />
  </ns2:recordsList>
</ns2:domain>
<?xml version='1.0' encoding='UTF-8'?>
<ns2:domain xmlns:ns3="http://www.w3.org/2005/Atom" xmlns:ns2="http://docs.rackspacecloud.com/dns/api/v1.0" xmlns="http://docs.rackspacecloud.com/dns/api/management/v1.0" id="2776403" name="alicemcmahon.com" ttl="300" emailAddress="ipadmin@stabletransit.com" updated="2013-10-21T16:43:17Z" created="2011-05-01T03:01:51Z">
  <ns2:nameservers>
    <ns2:nameserver name="dns1.stabletransit.com" />
    <ns2:nameserver name="dns2.stabletransit.com" />
  </ns2:nameservers>
  <ns2:recordsList totalEntries="10">
    <ns2:record id="A-6895108" type="A" name="alicemcmahon.com" data="216.185.152.144" ttl="300" updated="2013-10-21T16:43:17Z" created="2011-05-01T03:01:51Z" />
  </ns2:recordsList>
</ns2:domain>
<?xml version='1.0' encoding='UTF-8'?>
<ns2:domain xmlns:ns3="http://www.w3.org/2005/Atom" xmlns:ns2="http://docs.rackspacecloud.com/dns/api/v1.0" xmlns="http://docs.rackspacecloud.com/dns/api/management/v1.0" id="1204247" name="americanletterpress.com" ttl="300" emailAddress="ipadmin@stabletransit.com" updated="2012-10-10T21:33:37Z" created="2009-07-25T15:05:41Z">
  <ns2:nameservers>
    <ns2:nameserver name="dns1.stabletransit.com" />
    <ns2:nameserver name="dns2.stabletransit.com" />
  </ns2:nameservers>
  <ns2:recordsList totalEntries="5">
    <ns2:record id="A-2542581" type="A" name="americanletterpress.com" data="198.101.155.141" ttl="300" updated="2012-10-10T21:33:36Z" created="2010-02-17T05:02:16Z" />
  </ns2:recordsList>
</ns2:domain>
<?xml version='1.0' encoding='UTF-8'?>
<ns2:domain xmlns:ns3="http://www.w3.org/2005/Atom" xmlns:ns2="http://docs.rackspacecloud.com/dns/api/v1.0" xmlns="http://docs.rackspacecloud.com/dns/api/management/v1.0" id="1204249" name="americanwoodtype.com" ttl="300" emailAddress="ipadmin@stabletransit.com" updated="2012-10-10T21:33:38Z" created="2009-07-25T15:05:42Z">
  <ns2:nameservers>
    <ns2:nameserver name="dns1.stabletransit.com" />
    <ns2:nameserver name="dns2.stabletransit.com" />
  </ns2:nameservers>
  <ns2:recordsList totalEntries="5">
    <ns2:record id="A-2542583" type="A" name="americanwoodtype.com" data="198.101.155.141" ttl="300" updated="2012-10-10T21:33:37Z" created="2010-02-17T05:02:16Z" />
  </ns2:recordsList>
</ns2:domain>

1 answer

User

#1 · Posted 2024-09-27 04:21:49

You ask the generator to produce an output file right at the start:

nameFile=''
# ...
outfile = next(fs)

That is your blank filename. Defer the next(fs) call until nameFile has a value, not before.
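
To see why the names lag behind, here is a minimal sketch of the timing involved. It assumes nothing beyond standard generator behaviour: name_file and opened are hypothetical stand-ins for the question's nameFile and for the open() call, and the generator yields a filename string instead of opening a file. The generator body only runs when next() is called, so it picks up whatever value the global holds at that moment. In the question, next(fs) runs on each '<?xml' line, before the following '<ns2:domain ...>' line has updated nameFile, so each file is opened under the previous document's name.

```python
opened = []          # records the name each "file" was opened with
name_file = ''       # hypothetical stand-in for the question's global nameFile


def files():
    # The body runs only when next() is called, so name_file is read
    # at that moment, not when files() itself is called.
    while True:
        opened.append(name_file)
        yield name_file + '.part.xml'


fs = files()
first = next(fs)              # name_file is still '' here
name_file = 'addressing.com'
second = next(fs)             # the generator now sees the new value

print(first)
print(second)
```

The first name comes out blank because next(fs) was called before any name had been seen, which is the same reason the question ends up with a '.part.xml' file.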

You could instead set outfile = None and test for None before writing:

if pat not in line:
    if outfile is not None: 
        outfile.write(line)
else:
    items = line.split(pat)
    if outfile is not None:
        outfile.write(items[0])

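If you need to handle lines that come before the first filename is found, store those lines in a buffer and flush the buffer to the file the first time you create one.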

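I don't think you should be using a generator here; it makes this more complicated than it needs to be. Just create new file objects directly in the loop, which is clearer.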

If all you are doing is splitting the file, buffer lines until you have a filename:

buffer = []
out_name = '/Users/peterf/Google Drive/2015 Projects-Strategy/Domain Admin/RackDomains/%s.part.xml'

outfile = None

with open(filename) as infile:
    for line in infile:
        # look for a filename to write to if we don't have one yet
        if outfile is None:
            match = namePat.search(line)
            if match:
                # New filename, open a file object
                outfile = open(out_name % match.group(1), 'w')
                # clear out the buffer, we'll write directly to 
                # the file after this.
                outfile.writelines(buffer)
                buffer = []

        if '<?xml' in line:
            # new XML doc, close off the previous one
            if outfile is not None:
                outfile.close()
            outfile = None

        # line handling
        if outfile is None:
            buffer.append(line)
        else:
            outfile.write(line)

if outfile is not None:
    outfile.close()
# All lines processed, if there is a buffer left, then we have unhandled lines
if buffer:
    print('There were trailing lines without a name')
    print(''.join(buffer))  # same output; also valid on Python 2.7, where print(*buffer, sep='') is a syntax error
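
The buffering logic above can also be tried without touching the filesystem. The following is only a sketch under stated assumptions: split_documents, NAME_PAT and sample are hypothetical names, the sample lines are shortened stand-ins for the Rackspace output, and the lines are collected into a dict keyed by domain name rather than written to files.

```python
import re

# Same pattern as namePat in the question, written as a raw string.
NAME_PAT = re.compile(r'<ns2:domain.+ name="(.+?)".+>')


def split_documents(lines):
    """Group lines into {domain name: [lines]}, buffering until a name is seen."""
    docs = {}
    buffer = []      # lines seen since the last '<?xml' but before a name
    current = None   # list that receives lines for the current document

    for line in lines:
        if '<?xml' in line:
            # New XML document: stop adding to the previous one.
            current = None
        if current is None:
            buffer.append(line)
            match = NAME_PAT.search(line)
            if match:
                # Name found: move the buffered lines into the new document.
                current = docs.setdefault(match.group(1), [])
                current.extend(buffer)
                buffer = []
        else:
            current.append(line)
    # Anything left in buffer never got a name.
    return docs, buffer


# Abbreviated, made-up sample lines in the shape of the real input.
sample = [
    "<?xml version='1.0' encoding='utf-8'?>\n",
    '<ns2:domain id="1" name="addressing.com" ttl="300">\n',
    '</ns2:domain>\n',
    "<?xml version='1.0' encoding='UTF-8'?>\n",
    '<ns2:domain id="2" name="alicemcmahon.com" ttl="300">\n',
    '</ns2:domain>\n',
]
docs, leftover = split_documents(sample)
print(sorted(docs))
```

Each document's '<?xml' line ends up under the name found on the following line, so nothing is filed under a blank or stale name.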
