Handling bad lines in Python read_csv

,"File Inputs",,,,,,,,,,,"Email Category",,"Contact Info Category",
RecCtr,Attom_ID,PeopleID,"First Name","Last Name",AddressFullStreet,City,State,Zip," ","Individual Level Match"," ","Email Address"," ",Phone,"Phone Type"
1,19536969,80209511,ANTHONY1,MACCA1,"123 Main RD","Anytown",MA,12345
2,169874349,80707224,ANTHONY2,MACCA2,"123 Main RD","Anytown",MA,12345
3,1057347,81837554,ANTHONY3,MACCA3,"123 Main RD","Anytown",MA,12345
4,36946575,81869227,ANTHONY3,MACCA4,"123 Main RD","Anytown",MA,12345,,YES,,,,1234567890,Mobile

import numpy as np
import pandas as pd
df = pd.read_csv(file, skiprows=2, dtype=str, header=None)
df.columns = ['RecCtr', 'Attom_ID', 'PeopleID', 'First_Name', 'Last_Name',
              'AddressFullStreet', 'City', 'State', 'Zip', 'blank1',
              'Individual_Level_Match', 'blank2', 'Email_Address', 'blank3',
              'Phone', 'Phone_Type']
# Replace NaN with None (the pd.np alias was removed in pandas 2.0)
df = df.replace({np.nan: None})

Edit 1:

headers = ['RecCtr', 'Attom_ID', 'PeopleID', 'First_Name', 'Last_Name',
           'AddressFullStreet', 'City', 'State', 'Zip', 'blank1',
           'Individual_Level_Match', 'blank2', 'Email_Address', 'blank3',
           'Phone', 'Phone_Type']
df = pd.read_csv(file, skiprows=2, dtype=str, header=headers)

Result:

    raise ValueError("header must be integer or list of integers")
ValueError: header must be integer or list of integers

1 Answer

User

Answer 1 · Posted on 2024-09-27 09:27:36

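Unfortunately, read_csv() can only skip rows that have too many values (error_bad_lines=False, or on_bad_lines='skip' in pandas 1.3 and later); it cannot skip rows that have too few.
With header=None, it takes the first non-skipped row as the correct number of columns, which means the fourth row is bad (too many columns).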
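To make the failure concrete, here is a minimal sketch that reproduces it. It assumes the data rows from the question are held in a string (the name `raw` is only for illustration) and that pandas 1.3 or later is installed, since `on_bad_lines` does not exist in older versions.

```python
import io
import pandas as pd

# Data rows from the question: three rows with 9 fields, one row with 16.
raw = (
    "1,19536969,80209511,ANTHONY1,MACCA1,123 Main RD,Anytown,MA,12345\n"
    "2,169874349,80707224,ANTHONY2,MACCA2,123 Main RD,Anytown,MA,12345\n"
    "3,1057347,81837554,ANTHONY3,MACCA3,123 Main RD,Anytown,MA,12345\n"
    "4,36946575,81869227,ANTHONY3,MACCA4,123 Main RD,Anytown,MA,12345"
    ",,YES,,,,1234567890,Mobile\n"
)

# With header=None the width comes from the first row (9 fields),
# so the 16-field row is treated as a bad line and parsing fails.
try:
    pd.read_csv(io.StringIO(raw), dtype=str, header=None)
    failed = False
except pd.errors.ParserError:
    failed = True

# Skipping bad lines avoids the error but discards the only complete row.
skipped = pd.read_csv(
    io.StringIO(raw), dtype=str, header=None, on_bad_lines="skip"
)
```

Skipping is therefore the wrong tool here: the row that gets dropped is the one that actually carries the phone data.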

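You can either read the column names from the file or pass the column names to read_csv() through names= (header= only accepts row numbers, which is why Edit 1 raised a ValueError), e.g.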

df = pd.read_csv(file, skiprows=1, dtype=str, header=0)

or:

cols = ['RecCtr', 'Attom_ID', 'PeopleID', 'First_Name', 'Last_Name', ...]
df = pd.read_csv(file, skiprows=2, dtype=str, names=cols)

This fixes the number of columns up front, so rows 1-4 parse without error and the missing columns in rows 1-3 are filled with NaN.
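The names= variant can be checked end to end with a short sketch. It assumes the file contents from the question are held in a string (`raw` is an illustrative name) and uses the full 16-name column list from the question.

```python
import io
import pandas as pd

# The two header lines and four data rows from the question.
raw = (
    ',"File Inputs",,,,,,,,,,,"Email Category",,"Contact Info Category",\n'
    'RecCtr,Attom_ID,PeopleID,"First Name","Last Name",AddressFullStreet,'
    'City,State,Zip," ","Individual Level Match"," ","Email Address"," ",'
    'Phone,"Phone Type"\n'
    '1,19536969,80209511,ANTHONY1,MACCA1,"123 Main RD","Anytown",MA,12345\n'
    '2,169874349,80707224,ANTHONY2,MACCA2,"123 Main RD","Anytown",MA,12345\n'
    '3,1057347,81837554,ANTHONY3,MACCA3,"123 Main RD","Anytown",MA,12345\n'
    '4,36946575,81869227,ANTHONY3,MACCA4,"123 Main RD","Anytown",MA,12345'
    ',,YES,,,,1234567890,Mobile\n'
)

cols = ['RecCtr', 'Attom_ID', 'PeopleID', 'First_Name', 'Last_Name',
        'AddressFullStreet', 'City', 'State', 'Zip', 'blank1',
        'Individual_Level_Match', 'blank2', 'Email_Address', 'blank3',
        'Phone', 'Phone_Type']

# names= fixes the width at 16 columns; short rows are padded with NaN.
df = pd.read_csv(io.StringIO(raw), skiprows=2, dtype=str, names=cols)

print(df[['RecCtr', 'Zip', 'Phone', 'Phone_Type']])
```

All four rows are kept, and only the fourth has values in the Phone and Phone_Type columns.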

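If you know that the last column (or any other column) should have a value, you can drop the rows that have NaN in that column (use 'Phone_Type' instead of 'Phone Type' if you passed your own column names):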

df.dropna(subset=['Phone Type'])

or:

df[df['Phone Type'].notnull()]
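Both forms return a new DataFrame rather than modifying df in place, so the result has to be assigned. A small sketch, using a made-up two-column frame in place of the parsed file:

```python
import pandas as pd

# Illustrative stand-in for the parsed file: only row 4 has a phone type.
df = pd.DataFrame({
    "RecCtr": ["1", "2", "3", "4"],
    "Phone Type": [None, None, None, "Mobile"],
})

# Drop rows whose 'Phone Type' is missing.
kept = df.dropna(subset=["Phone Type"])

# Equivalent boolean-mask form.
same = df[df["Phone Type"].notnull()]

print(kept)
```

The original df still has all four rows afterwards; reassign (df = df.dropna(...)) if the filtered frame should replace it.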
