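Every day, a team member pulls several different reports out of Oracle and dumps each one into its own single-sheet .xlsx file, so that he can open them in Excel and do some cleanup. I want to automate the whole task with Pandas, but so far I haven't been able to open the downloaded files with any Python library.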
When I try to open one of the files with Pandas, xlrd raises the following error:
XLRDError Traceback (most recent call last)
<ipython-input-19-0414e67ce665> in <module>
----> 1 df = pd.read_excel("small_data_samples/ruben/Actividades-Conectar Arreglos Pymes_30_07_19.xlsx")
~/.local/share/virtualenvs/datas--Z8piCS3/lib/python3.6/site-packages/xlrd/__init__.py in open_workbook(filename, logfile, verbosity, use_mmap, file_contents, encoding_override, formatting_info, on_demand, ragged_rows)
143 if 'content.xml' in component_names:
144 raise XLRDError('Openoffice.org ODS file; not supported')
--> 145 raise XLRDError('ZIP file contents not a known type of workbook')
146
147 from . import book
XLRDError: ZIP file contents not a known type of workbook
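The message comes from xlrd's container check: before parsing, it looks at the member names inside the ZIP and only treats the archive as .xlsx if it finds xl/workbook.xml. The sketch below roughly reproduces that check, assuming xlrd 1.x behaviour (which matches lines 143-145 of the traceback); `classify_zip_workbook` is a hypothetical helper, not part of xlrd.

```python
import io
import zipfile


def classify_zip_workbook(path_or_bytes):
    """Roughly mimic the member-name check xlrd 1.x runs on ZIP-based files."""
    source = io.BytesIO(path_or_bytes) if isinstance(path_or_bytes, bytes) else path_or_bytes
    with zipfile.ZipFile(source) as zf:
        # Normalise backslashes and case before comparing, as xlrd does
        names = {n.replace("\\", "/").lower() for n in zf.namelist()}
    if "xl/workbook.xml" in names:
        return "xlsx"
    if "xl/workbook.bin" in names:
        return "xlsb (not supported by xlrd)"
    if "content.xml" in names:
        return "ods (not supported by xlrd)"
    # Anything else ends in "ZIP file contents not a known type of workbook"
    return "unknown"
```

Judging by the member listing later in this question, where workbook.xml sits at the root of the archive instead of under xl/, this check falls through to "unknown", which is consistent with the error above.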
I also tried openpyxl, with no better luck:
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-18-a458f5d28de0> in <module>
----> 1 book = openpyxl.load_workbook("small_data_samples/ruben/Actividades-Conectar Arreglos Pymes_30_07_19.xlsx")
~/.local/share/virtualenvs/datas--Z8piCS3/lib/python3.6/site-packages/openpyxl/reader/excel.py in load_workbook(filename, read_only, keep_vba, data_only, guess_types, keep_links)
221 ws._rels = rels
222 ws_parser = WorkSheetParser(ws, fh, shared_strings)
--> 223 ws_parser.parse()
224
225 if rels:
~/.local/share/virtualenvs/datas--Z8piCS3/lib/python3.6/site-packages/openpyxl/reader/worksheet.py in parse(self)
128 tag_name = element.tag
129 if tag_name in dispatcher:
--> 130 dispatcher[tag_name](element)
131 element.clear()
132 elif tag_name in properties:
~/.local/share/virtualenvs/datas--Z8piCS3/lib/python3.6/site-packages/openpyxl/reader/worksheet.py in parse_row(self, row)
290
291 for cell in safe_iterator(row, self.CELL_TAG):
--> 292 self.parse_cell(cell)
293
294
~/.local/share/virtualenvs/datas--Z8piCS3/lib/python3.6/site-packages/openpyxl/reader/worksheet.py in parse_cell(self, element)
209 if style_id is not None:
210 style_id = int(style_id)
--> 211 style_array = self.styles[style_id]
212
213 if coordinate:
IndexError: list index out of range
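The openpyxl failure looks like a different symptom of the same non-standard export: a cell's style attribute `s` appears to point past the end of the list of cell formats that openpyxl builds from styles.xml. One way to test that hypothesis is to compare the largest `s` value in sheet1.xml with the number of `<xf>` entries under `<cellXfs>`. This is a diagnostic sketch, assuming both parts use the standard SpreadsheetML namespace and the root-level member names shown in the listing further down; `check_style_indices` is a hypothetical name.

```python
import zipfile
import xml.etree.ElementTree as ET

# Standard SpreadsheetML namespace (assumed; an exporter could omit it)
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def check_style_indices(path, sheet_member="sheet1.xml", styles_member="styles.xml"):
    """Return (number of <cellXfs> formats, largest style index used by a cell)."""
    with zipfile.ZipFile(path) as zf:
        styles = ET.fromstring(zf.read(styles_member))
        sheet = ET.fromstring(zf.read(sheet_member))
    cell_xfs = styles.find("m:cellXfs", NS)
    n_formats = 0 if cell_xfs is None else len(cell_xfs.findall("m:xf", NS))
    # Collect every explicit style index from the <c s="..."> attributes
    used = [int(c.get("s")) for c in sheet.iter("{%s}c" % NS["m"]) if c.get("s") is not None]
    return n_formats, max(used, default=None)
```

If the second number is greater than or equal to the first, a lookup such as `self.styles[style_id]` would go out of range exactly as in the traceback.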
I also tried opening the file with the zipfile module to extract the .xml content I need. This is what I found inside the archive:
[Content_Types].xml
_rels/
_rels/.rels
_rels/workbook.xml.rels
sheet1.xml
styles.xml
workbook.xml
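I was able to locate the data I'm looking for there, but this route is heavy and complicated, and I would rather avoid it unless there is no better way.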
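If the manual route turns out to be unavoidable, it may be less work than it looks. A minimal sketch, assuming sheet1.xml uses the standard SpreadsheetML namespace and stores text as inline strings (the listing has no sharedStrings.xml, so shared-string cells of type `s` are not handled), reads the sheet straight into a DataFrame. The function names are hypothetical.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

import pandas as pd

# Standard SpreadsheetML namespace (assumed; check the root tag of sheet1.xml)
MAIN_NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def column_index(ref):
    """Turn the letters of a cell reference such as 'AB12' into a 0-based column index."""
    letters = re.match(r"[A-Z]+", ref).group(0)
    idx = 0
    for ch in letters:
        idx = idx * 26 + (ord(ch) - ord("A") + 1)
    return idx - 1


def cell_value(c):
    """Return a cell's text: inline strings live under <is><t>, everything else in <v>."""
    if c.get("t") == "inlineStr":
        return "".join(t.text or "" for t in c.iter(MAIN_NS + "t"))
    v = c.find(MAIN_NS + "v")
    return None if v is None else v.text


def read_sheet_xml(path, member="sheet1.xml"):
    """Read one worksheet part of the archive into a DataFrame of strings."""
    with zipfile.ZipFile(path) as zf:
        root = ET.fromstring(zf.read(member))
    rows = []
    for row in root.iter(MAIN_NS + "row"):
        values = {}
        for pos, c in enumerate(row.findall(MAIN_NS + "c")):
            ref = c.get("r")
            # Fall back to the cell's position when there is no explicit reference
            values[column_index(ref) if ref else pos] = cell_value(c)
        rows.append(values)
    # Pad every row to the widest one so the frame is rectangular
    width = max((max(r) + 1 for r in rows if r), default=0)
    return pd.DataFrame([[r.get(i) for i in range(width)] for r in rows])
```

All values come back as strings or missing values; the first row can be promoted to a header with `df.columns = df.iloc[0]` followed by `df = df.iloc[1:]`, and numeric columns converted with `pd.to_numeric`.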
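So far I still can't open this file from Python, yet Excel and LibreOffice both open it fine on Windows and on Linux. If I open it in either program and save it again, Pandas can then read the re-saved file directly with both xlrd and openpyxl.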
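Since re-saving through LibreOffice fixes the file, that step can itself be scripted with LibreOffice's headless mode. A sketch, assuming the `soffice` binary is on the PATH (on Windows it usually has to be given as a full path) and that no other LibreOffice instance is running, since that reportedly makes headless conversions silently do nothing. Both helper names are hypothetical.

```python
import subprocess
from pathlib import Path


def libreoffice_resave_command(src, outdir, soffice="soffice"):
    """Build a headless LibreOffice command that re-saves `src` as .xlsx into `outdir`."""
    return [soffice, "--headless", "--convert-to", "xlsx", "--outdir", str(outdir), str(src)]


def resave_with_libreoffice(src, outdir, soffice="soffice"):
    """Run the conversion and return the path of the re-saved file."""
    Path(outdir).mkdir(parents=True, exist_ok=True)
    subprocess.run(libreoffice_resave_command(src, outdir, soffice), check=True)
    # LibreOffice keeps the base name and swaps in the target extension
    return Path(outdir) / (Path(src).stem + ".xlsx")
```

For example, `pd.read_excel(resave_with_libreoffice(path, "converted"))`. Writing into a separate output directory avoids overwriting the original export, because LibreOffice keeps the base file name.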