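<p>I know that a number of similar questions have been asked before, but none of the solutions offered there work for the reproducible example I provide below.</p>
<p>I am trying to read in a <code>.xls</code> file from <a href="http://www.eia.gov/coal/data.cfm#production" rel="noreferrer">http://www.eia.gov/coal/data.cfm#production</a>, specifically the <strong>Historical detailed coal production data (1983-2013)</strong> <code>coalpublic2012.xls</code> file that is freely available via the dropdown. Pandas cannot read it.</p>
<p>In contrast, the file for the most recent year available (2013), <code>coalpublic2013.xls</code>, loads without a problem:</p>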
<pre><code>import pandas as pd
df1 = pd.read_excel("coalpublic2013.xls")
</code></pre>
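<p>But the <code>.xls</code> files for the preceding years (2004-2012) do not load. I have looked at these files in Excel; they open and are not corrupted.</p>
<p>The error that I get from pandas is:</p>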
<pre><code>---------------------------------------------------------------------------
XLRDError Traceback (most recent call last)
<ipython-input-28-0da33766e9d2> in <module>()
----> 1 df = pd.read_excel("coalpublic2012.xlsx")
/Users/jonathan/anaconda/lib/python2.7/site-packages/pandas/io/excel.pyc in read_excel(io, sheetname, header, skiprows, skip_footer, index_col, parse_cols, parse_dates, date_parser, na_values, thousands, convert_float, has_index_names, converters, engine, **kwds)
161
162 if not isinstance(io, ExcelFile):
--> 163 io = ExcelFile(io, engine=engine)
164
165 return io._parse_excel(
/Users/jonathan/anaconda/lib/python2.7/site-packages/pandas/io/excel.pyc in __init__(self, io, **kwds)
204 self.book = xlrd.open_workbook(file_contents=data)
205 else:
--> 206 self.book = xlrd.open_workbook(io)
207 elif engine == 'xlrd' and isinstance(io, xlrd.Book):
208 self.book = io
/Users/jonathan/anaconda/lib/python2.7/site-packages/xlrd/__init__.pyc in open_workbook(filename, logfile, verbosity, use_mmap, file_contents, encoding_override, formatting_info, on_demand, ragged_rows)
433 formatting_info=formatting_info,
434 on_demand=on_demand,
--> 435 ragged_rows=ragged_rows,
436 )
437 return bk
/Users/jonathan/anaconda/lib/python2.7/site-packages/xlrd/book.pyc in open_workbook_xls(filename, logfile, verbosity, use_mmap, file_contents, encoding_override, formatting_info, on_demand, ragged_rows)
89 t1 = time.clock()
90 bk.load_time_stage_1 = t1 - t0
---> 91 biff_version = bk.getbof(XL_WORKBOOK_GLOBALS)
92 if not biff_version:
93 raise XLRDError("Can't determine file's BIFF version")
/Users/jonathan/anaconda/lib/python2.7/site-packages/xlrd/book.pyc in getbof(self, rqd_stream)
1228 bof_error('Expected BOF record; met end of file')
1229 if opcode not in bofcodes:
-> 1230 bof_error('Expected BOF record; found %r' % self.mem[savpos:savpos+8])
1231 length = self.get2bytes()
1232 if length == MY_EOF:
/Users/jonathan/anaconda/lib/python2.7/site-packages/xlrd/book.pyc in bof_error(msg)
1222 if DEBUG: print("reqd: 0x%04x" % rqd_stream, file=self.logfile)
1223 def bof_error(msg):
-> 1224 raise XLRDError('Unsupported format, or corrupt file: ' + msg)
1225 savpos = self._position
1226 opcode = self.get2bytes()
XLRDError: Unsupported format, or corrupt file: Expected BOF record; found '<?xml ve'
</code></pre>
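<p>The last line of the traceback says xlrd found <code>&lt;?xml ve</code> where it expected a BOF record, which suggests the file may begin with an XML declaration rather than binary Excel data. Below is a minimal sketch for checking that by inspecting the first bytes of a file. The helper name <code>sniff_excel_format</code> and its return labels are my own (hypothetical, not part of pandas or xlrd), and it is only a rough heuristic, not a full format detector.</p>

```python
# Leading-byte signatures of common spreadsheet containers.
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # OLE2 compound file (typical binary .xls)
ZIP_MAGIC = b"PK\x03\x04"                          # ZIP archive (typical .xlsx)
UTF8_BOM = b"\xef\xbb\xbf"


def sniff_excel_format(path):
    """Hypothetical helper: guess the container format from the file's first bytes."""
    with open(path, "rb") as f:
        head = f.read(16)
    if head.startswith(OLE2_MAGIC):
        return "ole2"
    if head.startswith(ZIP_MAGIC):
        return "zip"
    # Skip an optional UTF-8 byte-order mark and leading whitespace.
    if head.startswith(UTF8_BOM):
        head = head[len(UTF8_BOM):]
    if head.lstrip().startswith(b"<?xml"):
        return "xml"
    return "unknown"
```

<p>Calling <code>sniff_excel_format("coalpublic2012.xls")</code> and comparing the result with the one for <code>coalpublic2013.xls</code> should show whether the two files really use the same on-disk format despite sharing the <code>.xls</code> extension.</p>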
<p>I have tried a few other things:</p>
<pre><code>df = pd.ExcelFile("coalpublic2012.xls", encoding_override='cp1252')
import xlrd
wb = xlrd.open_workbook("coalpublic2012.xls")
</code></pre>
<p>To no avail. My pandas version: 0.17.0</p>
<p>I have also submitted this as a bug on the pandas GitHub <a href="https://github.com/pydata/pandas/issues/11503" rel="noreferrer">issues</a> list.</p>