<p>The right way is to fix the file when it is created. If that is not possible, you can either preprocess the file or use a wrapper.</p>
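<p>For the preprocessing route, a minimal sketch could look like the following. It assumes the header line is intact, that the delimiter never appears inside quoted fields, and that a row broken across lines can be repaired by simple concatenation. The name <code>merge_broken_lines</code> and the sample data are made up for illustration.</p>

```python
import csv
import io

def merge_broken_lines(lines, delim=";"):
    """Yield logical rows, joining physical lines until each row has as many
    delimiters as the header line (assumed to be intact)."""
    expected = None
    pending = ""
    for line in lines:
        # glue the new physical line onto whatever is still incomplete
        pending = pending.rstrip("\r\n") + line if pending else line
        if expected is None:
            expected = pending.count(delim)  # the header defines the target
        if pending.count(delim) < expected:
            continue  # row still incomplete: keep accumulating
        if pending.count(delim) == expected:
            yield pending
        # rows with too many delimiters are silently dropped in this sketch
        pending = ""

# hypothetical sample: the row "4;5;6" was split across two lines
sample = "a;b;c\n1;2;3\n4;5\n;6\n7;8;9\n"
rows = list(csv.reader(merge_broken_lines(io.StringIO(sample)), delimiter=";"))
```

<p>Because <code>csv.reader</code> accepts any iterable of strings, the generator can be passed to it directly without writing a repaired copy of the file to disk.</p>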
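<p>Below is a solution using a byte-level wrapper that combines lines until it gets the correct number of delimiters. I use a byte-level wrapper so that I can rely on the classes of the <code>io</code> module and add as little code of my own as possible: a custom <code>RawIOBase</code> subclass reads lines from an underlying byte file object and combines them to reach the expected number of delimiters (only <code>readinto</code> and <code>readable</code> are overridden).</p>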
<pre><code>import io

class csv_wrapper(io.RawIOBase):
    def __init__(self, base, delim):
        self.fd = base  # underlying (byte) file object
        self.nfields = None
        self.delim = ord(delim)  # code of the delimiter (passed as a character)
        self.numl = 0  # number of lines read so far, for error reporting
        self._getline()  # load and process the header line

    def _nfields(self):
        # number of delimiters in the current line
        return len([c for c in self.line if c == self.delim])

    def _getline(self):
        while True:
            # load a new line into the internal buffer
            self.line = next(self.fd)
            self.numl += 1
            if self.nfields is None:  # store number of delims if not known
                self.nfields = self._nfields()
            else:
                while self.nfields > self._nfields():  # optionally combine lines
                    self.line = self.line.rstrip() + next(self.fd)
                    self.numl += 1
                if self.nfields != self._nfields():  # too many here...
                    print("Too many fields line {}".format(self.numl))
                    continue  # ignore the offending line and proceed
            self.index = 0  # reset line pointers
            self.linesize = len(self.line)
            break

    def readinto(self, b):
        if len(b) == 0: return 0
        if self.index == self.linesize:  # if current buffer is exhausted
            try:  # read a new one
                self._getline()
            except StopIteration:
                return 0  # end of the underlying file
        # copy as much of the current line as fits into the passed buffer
        n = min(len(b), self.linesize - self.index)
        b[:n] = self.line[self.index:self.index + n]
        self.index += n
        return n  # number of bytes actually written

    def readable(self):
        return True
</code></pre>
<p>You can then change your code to:</p>
^{pr2}$
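<p>As a rough sketch of how a raw wrapper of this kind can be plugged into the text-level readers, the example below layers <code>io.BufferedReader</code> and <code>io.TextIOWrapper</code> on top of a <code>RawIOBase</code> subclass and hands the result to <code>csv.reader</code>. <code>ChunkReader</code> is a hypothetical stand-in used only to keep the snippet self-contained; presumably an instance such as <code>csv_wrapper(open(path, "rb"), ";")</code> would take its place, with <code>path</code> and the <code>;</code> delimiter being assumptions.</p>

```python
import csv
import io

class ChunkReader(io.RawIOBase):
    """Hypothetical stand-in for a raw wrapper: serves bytes from a list of chunks."""
    def __init__(self, chunks):
        self.chunks = list(chunks)
        self.buf = b""

    def readable(self):
        return True

    def readinto(self, b):
        if not self.buf:
            if not self.chunks:
                return 0  # returning 0 signals end of file
            self.buf = self.chunks.pop(0)
        n = min(len(b), len(self.buf))
        b[:n] = self.buf[:n]
        self.buf = self.buf[n:]
        return n  # number of bytes written into b

raw = ChunkReader([b"a;b;c\n", b"1;2;3\n"])
# BufferedReader adds buffering, TextIOWrapper decodes bytes to str
text = io.TextIOWrapper(io.BufferedReader(raw), encoding="utf-8", newline="")
rows = list(csv.reader(text, delimiter=";"))
```

<p>Only <code>readinto</code> and <code>readable</code> are implemented here as well; the buffering and decoding come from the standard <code>io</code> classes.</p>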