I'm trying to read a CSV like the one below:
import pandas as pd
from io import StringIO
s = """
System Name,System ID,System Type,Flood Source,System Authorization,Rehabilitation Program Status,Responsible Organization
"Napa River, left bank above Tulocay Creek","5305000080","Channel","Napa River","USACE Federally constructed, turned over to public sponsor operations and maintenance","Active","USACE - San Francisco District"
"Napa River, right bank below Napa Creek","5305000050","Levee System","","USACE Federally constructed, turned over to public sponsor operations and maintenance","Active","USACE - San Francisco District"
"Needles "S" Street ","3805030008","Levee System",""S" Street Wash, Dead Mountain HA","USACE Federally constructed, turned over to public sponsor operations and maintenance","Inactive","USACE - Los Angeles District"
"Nevada County Levee 1","1905046000","Levee System","Donner Creek","Locally Constructed, Locally Operated and Maintained","Not Enrolled","California"
"Nevada Levee","7005000873","Levee System","","Other Federal Agency","Not Enrolled","Bureau of Reclamation"
"""
pd.read_csv(StringIO(s))
The problem is that "Needles "S" Street " contains several quotes, which results in a ParserError:
ParserError: Error tokenizing data. C error: Expected 7 fields in line 5, saw 8
I tried this approach, but every attempt to write my own separator ended up producing a single-column DataFrame. Any ideas?
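Quotes inside a quoted field must be escaped by doubling them (""), and this line escapes them incorrectly. "S" has to be written as ""S"" in both places. In the second place the field itself begins with a quote, so it becomes """S"" Street Wash, Dead Mountain HA". Because that contains three consecutive double quotes, the whole multi-line Python string must be delimited with ''' instead of """.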
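A minimal sketch of the corrected input, keeping the question's rows and changing only the escaping: both "S" occurrences are doubled, and the string is delimited with ''' so that the """S"" sequence does not terminate it.

```python
from io import StringIO

import pandas as pd

# Embedded quotes are doubled (""S""); ''' is used because the Flood Source
# field of the Needles row now starts with three consecutive double quotes.
s = '''System Name,System ID,System Type,Flood Source,System Authorization,Rehabilitation Program Status,Responsible Organization
"Napa River, left bank above Tulocay Creek","5305000080","Channel","Napa River","USACE Federally constructed, turned over to public sponsor operations and maintenance","Active","USACE - San Francisco District"
"Napa River, right bank below Napa Creek","5305000050","Levee System","","USACE Federally constructed, turned over to public sponsor operations and maintenance","Active","USACE - San Francisco District"
"Needles ""S"" Street ","3805030008","Levee System","""S"" Street Wash, Dead Mountain HA","USACE Federally constructed, turned over to public sponsor operations and maintenance","Inactive","USACE - Los Angeles District"
"Nevada County Levee 1","1905046000","Levee System","Donner Creek","Locally Constructed, Locally Operated and Maintained","Not Enrolled","California"
"Nevada Levee","7005000873","Levee System","","Other Federal Agency","Not Enrolled","Bureau of Reclamation"
'''

# With correct escaping the default C parser needs no extra options.
df = pd.read_csv(StringIO(s))
print(df.loc[2, ["System Name", "Flood Source"]])
```

The parser turns each "" back into a single ", so the Needles row keeps its quotes in the values.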
Output:
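If you can't easily fix the data at the source, a quick fix is to:
1. encode all legitimate quotes in the data,
2. remove the illegal quotes,
3. re-quote the data.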
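The three steps can be sketched as follows. This is one possible heuristic, not the only one: it assumes a quote is legitimate when it starts or ends a line or sits directly next to a comma, and it uses ~ as a hypothetical placeholder that is assumed never to appear in the data. A field containing something like , "x" would defeat the heuristic.

```python
import re
from io import StringIO

import pandas as pd

# The question's data, unchanged (the Needles row is malformed).
s = """System Name,System ID,System Type,Flood Source,System Authorization,Rehabilitation Program Status,Responsible Organization
"Napa River, left bank above Tulocay Creek","5305000080","Channel","Napa River","USACE Federally constructed, turned over to public sponsor operations and maintenance","Active","USACE - San Francisco District"
"Napa River, right bank below Napa Creek","5305000050","Levee System","","USACE Federally constructed, turned over to public sponsor operations and maintenance","Active","USACE - San Francisco District"
"Needles "S" Street ","3805030008","Levee System",""S" Street Wash, Dead Mountain HA","USACE Federally constructed, turned over to public sponsor operations and maintenance","Inactive","USACE - Los Angeles District"
"Nevada County Levee 1","1905046000","Levee System","Donner Creek","Locally Constructed, Locally Operated and Maintained","Not Enrolled","California"
"Nevada Levee","7005000873","Levee System","","Other Federal Agency","Not Enrolled","Bureau of Reclamation"
"""

# Hypothetical placeholder; assumed not to occur anywhere in the real data.
PLACEHOLDER = "~"

# Heuristic: a quote that begins or ends a line, or touches a comma,
# opens or closes a field and is therefore treated as legitimate.
LEGIT_QUOTE = re.compile(r'^"|"$|(?<=,)"|"(?=,)', flags=re.MULTILINE)

encoded = LEGIT_QUOTE.sub(PLACEHOLDER, s)      # step 1: encode legitimate quotes
stripped = encoded.replace('"', "")            # step 2: drop the illegal quotes
requoted = stripped.replace(PLACEHOLDER, '"')  # step 3: restore field quoting

df = pd.read_csv(StringIO(requoted))
print(df.loc[2, ["System Name", "Flood Source"]])
```

Because step 2 deletes the embedded quotes, the Needles row comes out as Needles S Street rather than Needles "S" Street.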
Or perform only step 1 and change the quote character and separator in the pd.read_csv() call. This keeps the illegal quotes as part of the values.
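A sketch of the step-1-only variant, under the same assumptions as before (the comma-adjacency heuristic and ~ as a hypothetical placeholder absent from the data). The legitimate quotes are encoded as ~ and that character is passed to read_csv as quotechar, so the leftover " characters are read as ordinary text. In this particular sketch only quotechar has to change because the commas are left in place; if step 1 also encoded the delimiters, sep would need to match them.

```python
import re
from io import StringIO

import pandas as pd

# The question's data, unchanged (the Needles row is malformed).
s = """System Name,System ID,System Type,Flood Source,System Authorization,Rehabilitation Program Status,Responsible Organization
"Napa River, left bank above Tulocay Creek","5305000080","Channel","Napa River","USACE Federally constructed, turned over to public sponsor operations and maintenance","Active","USACE - San Francisco District"
"Napa River, right bank below Napa Creek","5305000050","Levee System","","USACE Federally constructed, turned over to public sponsor operations and maintenance","Active","USACE - San Francisco District"
"Needles "S" Street ","3805030008","Levee System",""S" Street Wash, Dead Mountain HA","USACE Federally constructed, turned over to public sponsor operations and maintenance","Inactive","USACE - Los Angeles District"
"Nevada County Levee 1","1905046000","Levee System","Donner Creek","Locally Constructed, Locally Operated and Maintained","Not Enrolled","California"
"Nevada Levee","7005000873","Levee System","","Other Federal Agency","Not Enrolled","Bureau of Reclamation"
"""

# Hypothetical placeholder used as the new quote character.
PLACEHOLDER = "~"
LEGIT_QUOTE = re.compile(r'^"|"$|(?<=,)"|"(?=,)', flags=re.MULTILINE)

# Step 1 only: field-delimiting quotes become ~, embedded quotes stay as ".
encoded = LEGIT_QUOTE.sub(PLACEHOLDER, s)

# read_csv now treats ~ as the quote character and " as plain data.
df = pd.read_csv(StringIO(encoded), quotechar=PLACEHOLDER)
print(df.loc[2, ["System Name", "Flood Source"]])
```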