Reading an Excel XML .xls file with pandas

---------------------------------------------------------------------------
XLRDError                                 Traceback (most recent call last)
<ipython-input-28-0da33766e9d2> in <module>()
----> 1 df = pd.read_excel("coalpublic2012.xlsx")

/Users/jonathan/anaconda/lib/python2.7/site-packages/pandas/io/excel.pyc in read_excel(io, sheetname, header, skiprows, skip_footer, index_col, parse_cols, parse_dates, date_parser, na_values, thousands, convert_float, has_index_names, converters, engine, **kwds)
    161
    162     if not isinstance(io, ExcelFile):
--> 163         io = ExcelFile(io, engine=engine)
    164
    165     return io._parse_excel(

/Users/jonathan/anaconda/lib/python2.7/site-packages/pandas/io/excel.pyc in __init__(self, io, **kwds)
    204                 self.book = xlrd.open_workbook(file_contents=data)
    205             else:
--> 206                 self.book = xlrd.open_workbook(io)
    207         elif engine == 'xlrd' and isinstance(io, xlrd.Book):
    208             self.book = io

/Users/jonathan/anaconda/lib/python2.7/site-packages/xlrd/__init__.pyc in open_workbook(filename, logfile, verbosity, use_mmap, file_contents, encoding_override, formatting_info, on_demand, ragged_rows)
    433         formatting_info=formatting_info,
    434         on_demand=on_demand,
--> 435         ragged_rows=ragged_rows,
    436         )
    437     return bk

/Users/jonathan/anaconda/lib/python2.7/site-packages/xlrd/book.pyc in open_workbook_xls(filename, logfile, verbosity, use_mmap, file_contents, encoding_override, formatting_info, on_demand, ragged_rows)
     89         t1 = time.clock()
     90         bk.load_time_stage_1 = t1 - t0
---> 91         biff_version = bk.getbof(XL_WORKBOOK_GLOBALS)
     92         if not biff_version:
     93             raise XLRDError("Can't determine file's BIFF version")

/Users/jonathan/anaconda/lib/python2.7/site-packages/xlrd/book.pyc in getbof(self, rqd_stream)
   1228             bof_error('Expected BOF record; met end of file')
   1229         if opcode not in bofcodes:
-> 1230             bof_error('Expected BOF record; found %r' % self.mem[savpos:savpos+8])
   1231         length = self.get2bytes()
   1232         if length == MY_EOF:

/Users/jonathan/anaconda/lib/python2.7/site-packages/xlrd/book.pyc in bof_error(msg)
   1222         if DEBUG: print("reqd: 0x%04x" % rqd_stream, file=self.logfile)
   1223         def bof_error(msg):
-> 1224             raise XLRDError('Unsupported format, or corrupt file: ' + msg)
   1225         savpos = self._position
   1226         opcode = self.get2bytes()

XLRDError: Unsupported format, or corrupt file: Expected BOF record; found '<?xml ve'

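3 Answers

User

#1 · edited 2024-09-28 21:23:15

The problem is that while the 2013 data is an actual Excel file, the 2012 data is an XML document, which pd.read_excel (via xlrd) does not seem to support. I'd say your best bet is to open it in Excel and save a copy as a proper Excel file or as CSV.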
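The "found '<?xml ve'" at the end of the traceback points the same way: the file starts with an XML declaration instead of a binary Excel header. A minimal sketch of checking for that before calling read_excel; the helper name looks_like_xml is made up for illustration and is not part of pandas or xlrd:

```python
def looks_like_xml(path):
    """Return True if the file starts with an XML declaration.

    Hypothetical helper for illustration; not a pandas or xlrd function.
    """
    with open(path, "rb") as f:
        head = f.read(64)
    # Drop a UTF-8 byte order mark, if there is one.
    if head.startswith(b"\xef\xbb\xbf"):
        head = head[3:]
    # Ignore leading whitespace before the declaration.
    return head.lstrip().startswith(b"<?xml")
```

If this returns True for a file with an .xls extension, the file is probably Excel XML and needs to be converted or parsed as XML rather than passed to read_excel.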

User

#2 · edited 2024-09-28 21:23:15

You can convert this Excel XML file programmatically. Requirements: only Python and pandas.

import pandas as pd
from xml.sax import ContentHandler, parse

# Reference https://goo.gl/KaOBG3
class ExcelHandler(ContentHandler):
    """Collect cell text from Excel XML as tables -> rows -> cells."""
    def __init__(self):
        self.chars = []
        self.cells = []
        self.rows = []
        self.tables = []
    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        self.chars.append(content)
    def startElement(self, name, atts):
        # Reset the matching buffer whenever a new element opens.
        if name == "Cell":
            self.chars = []
        elif name == "Row":
            self.cells = []
        elif name == "Table":
            self.rows = []
    def endElement(self, name):
        # Push the finished item onto its parent container.
        if name == "Cell":
            self.cells.append(''.join(self.chars))
        elif name == "Row":
            self.rows.append(self.cells)
        elif name == "Table":
            self.tables.append(self.rows)

excelHandler = ExcelHandler()
parse('coalpublic2012.xls', excelHandler)
# First table: the row at index 3 holds the column names, data starts at index 4.
df1 = pd.DataFrame(excelHandler.tables[0][4:], columns=excelHandler.tables[0][3])
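One thing to keep in mind with this approach: every cell comes back as a string, so numeric columns still need converting afterwards (for example with pd.to_numeric). A minimal sketch that runs the same kind of handler on a small in-memory document to show the resulting structure; the sample XML and the class name CellCollector are made up for illustration and are not the coal data:

```python
from xml.sax import ContentHandler, parseString

# Made-up sample in the Excel 2003 XML layout (not the real file).
SAMPLE = b"""<?xml version="1.0"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
          xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
  <Worksheet ss:Name="Sheet1">
    <Table>
      <Row><Cell><Data ss:Type="String">Year</Data></Cell><Cell><Data ss:Type="String">Mine</Data></Cell></Row>
      <Row><Cell><Data ss:Type="Number">2012</Data></Cell><Cell><Data ss:Type="String">A</Data></Cell></Row>
    </Table>
  </Worksheet>
</Workbook>"""


class CellCollector(ContentHandler):
    """Collect cell text as tables -> rows -> cells (all strings)."""

    def __init__(self):
        super().__init__()
        self.chars, self.cells, self.rows, self.tables = [], [], [], []

    def characters(self, content):
        self.chars.append(content)

    def startElement(self, name, attrs):
        if name == "Cell":
            self.chars = []
        elif name == "Row":
            self.cells = []
        elif name == "Table":
            self.rows = []

    def endElement(self, name):
        if name == "Cell":
            self.cells.append("".join(self.chars))
        elif name == "Row":
            self.rows.append(self.cells)
        elif name == "Table":
            self.tables.append(self.rows)


collector = CellCollector()
parseString(SAMPLE, collector)

# The first row is the header here; the remaining rows are data.
header, *body = collector.tables[0]
print(header)  # ['Year', 'Mine']
print(body)    # [['2012', 'A']]  (the number is the string '2012')
```

In the real file the header sits at index 3 rather than 0, which is why the snippet above slices tables[0][4:] for the data.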

User

#3 · edited 2024-09-28 21:23:15

You can convert this Excel XML file programmatically. Requirements: Windows with Office installed.

1. Create the script ExcelToCsv.vbs in Notepad:

if WScript.Arguments.Count < 3 Then
    WScript.Echo "Please specify the source and the destination files. Usage: ExcelToCsv <xls/xlsx source file> <csv destination file> <worksheet number (starts at 1)>"
    Wscript.Quit
End If

csv_format = 6

Set objFSO = CreateObject("Scripting.FileSystemObject")

src_file = objFSO.GetAbsolutePathName(Wscript.Arguments.Item(0))
dest_file = objFSO.GetAbsolutePathName(WScript.Arguments.Item(1))
worksheet_number = CInt(WScript.Arguments.Item(2))

Dim oExcel
Set oExcel = CreateObject("Excel.Application")

Dim oBook
Set oBook = oExcel.Workbooks.Open(src_file)
oBook.Worksheets(worksheet_number).Activate

oBook.SaveAs dest_file, csv_format

oBook.Close False
oExcel.Quit

2. Convert the Excel XML file to CSV:

$ cscript ExcelToCsv.vbs coalpublic2012.xls coalpublic2012.csv 1

3. Open the CSV file with pandas:

>>> df1 = pd.read_csv('coalpublic2012.csv', skiprows=3)

Reference: Faster way to read Excel files to pandas dataframe
