Answers to: Handling unwanted line breaks with read_csv in Pandas

1 answer

Anonymous, 1 day ago

The right fix is to repair the file at the point where it is created. If that is not possible, you can preprocess the file or use a wrapper. Below is a solution using a byte-level wrapper that merges lines until it gets the correct number of delimiters. I used a byte-level wrapper so that I could rely on the classes of the io module and add as little code of my own as possible: a <code>RawIOBase</code> subclass reads lines from the underlying byte file object and combines them until they hold the expected number of delimiters (only <code>readinto</code> and <code>readable</code> are overridden).

<pre><code>import io


class csv_wrapper(io.RawIOBase):
    def __init__(self, base, delim):
        self.fd = base            # underlying (byte) file object
        self.nfields = None       # delimiters per row, taken from the header
        self.delim = ord(delim)   # code of the delimiter (passed as a character)
        self.numl = 0             # physical line number, for error messages
        self._getline()           # load and process the header line

    def _nfields(self):
        # number of delimiters in the current line
        return len([c for c in self.line if c == self.delim])

    def _getline(self):
        # load the next logical row into the internal buffer
        while True:
            self.line = next(self.fd)
            self.numl += 1
            if self.nfields is None:
                # first line: remember how many delimiters a row must have
                self.nfields = self._nfields()
            else:
                # combine physical lines while the row is too short
                while self.nfields > self._nfields():
                    self.line = self.line.rstrip() + next(self.fd)
                    self.numl += 1
                if self.nfields != self._nfields():
                    # too many delimiters: report, skip the row and proceed
                    print("Too many fields in line {}".format(self.numl))
                    continue
            self.index = 0                  # reset line pointers
            self.linesize = len(self.line)
            break

    def readinto(self, b):
        if len(b) == 0:
            return 0
        if self.index == self.linesize:     # current buffer is exhausted
            try:
                self._getline()             # read a new one
            except StopIteration:
                return 0                    # end of file
        # copy as much of the current row as fits into the caller's buffer
        n = min(len(b), self.linesize - self.index)
        b[:n] = self.line[self.index:self.index + n]
        self.index += n
        return n

    def readable(self):
        return True
</code></pre>

The code that calls <code>read_csv</code> can then be changed to read through this wrapper.
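The answer also mentions preprocessing the file as an alternative to a wrapper. Here is a minimal sketch of that route, assuming the same rule as the wrapper: the header line fixes the expected number of delimiters, and short lines are joined with the lines that follow them. The helper name `merge_broken_lines` and the sample data are hypothetical and are not part of the original answer.

```python
import io


def merge_broken_lines(lines, delim=","):
    """Yield logical CSV rows, joining physical lines until each row has
    as many delimiters as the header line (hypothetical helper)."""
    lines = iter(lines)
    try:
        header = next(lines)
    except StopIteration:
        return  # empty input: nothing to yield
    expected = header.count(delim)
    yield header
    for line in lines:
        # Keep appending the next physical line while the row is too short.
        while line.count(delim) < expected:
            try:
                line = line.rstrip("\r\n") + next(lines)
            except StopIteration:
                break  # input ended in the middle of a row
        if line.count(delim) == expected:
            yield line
        # Rows that end up with the wrong delimiter count are dropped.


# Made-up sample: the row "4,5,6" is split over two physical lines,
# and "7,8,9,10" has one field too many.
raw = "a,b,c\n1,2,3\n4,5\n,6\n7,8,9,10\n11,12,13\n"
repaired = "".join(merge_broken_lines(io.StringIO(raw)))
print(repaired)
```

The repaired text can be handed to pandas with `pd.read_csv(io.StringIO(repaired))`. For the wrapper class itself, one plausible call (an assumption, with a hypothetical file name) is `pd.read_csv(io.BufferedReader(csv_wrapper(open("file.csv", "rb"), ",")))`. Like the wrapper, this sketch counts raw delimiters, so it will miscount rows in which a quoted field contains the delimiter.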
