Parsing an .xls file from a Dropbox link

Posted 2024-10-01 11:32:22


I am trying to parse a table from a Dropbox link (https://www.dropbox.com/s/i77mern7joxc9ur/TestResultCodelistVoC.xlsx). The file is an .xlsx spreadsheet, and so far I have tried two approaches.

Approach 1

codeID_url = 'https://www.dropbox.com/s/i77mern7joxc9ur/TestResultCodelistVoC.xlsx'

tables = pd.read_html(codeID_url)
df_codeID = tables[0]

gives:

ValueError: No tables found

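This makes sense, because in the end I am not parsing a table in an HTML page. The same command works very well for the tables on this page (https://www.ecdc.europa.eu/en/covid-19/variants-concern).

Approach 2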

codeID_url = 'https://www.dropbox.com/s/i77mern7joxc9ur/TestResultCodelistVoC.xlsx'
data = pd.read_excel(codeID_url,'TestResultCodelistVoC')

gives:

XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'<!DOCTYP'

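I did find a thread about this error here, but all the answers deal with a local .xls file, whereas in my case I am trying to parse a web page/link that ultimately points to an .xls file.

I also came across a solution that uses a Dropbox token, but I would first like to try downloading the table mentioned above without using a Dropbox account, if possible.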
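
The `Expected BOF record; found b'<!DOCTYP'` message suggests that the bytes handed to the Excel reader were an HTML page rather than a workbook. A minimal sketch for checking what a download actually contains, based on its first bytes, might look like the following. The function name `sniff_payload` and its return labels are hypothetical and chosen only for illustration; the signatures are the standard ZIP and OLE2 magic numbers, which .xlsx and legacy .xls files use respectively.

```python
def sniff_payload(first_bytes: bytes) -> str:
    """Guess the container type of a download from its leading bytes.

    Hypothetical helper for illustration; it only inspects magic numbers
    and does not validate that the file is a usable workbook.
    """
    # HTML pages usually start with a doctype or an <html> tag.
    head = first_bytes.lstrip()[:16].lower()
    if head.startswith(b"<!doctyp") or head.startswith(b"<html"):
        return "html"
    # .xlsx files are ZIP archives, which start with "PK\x03\x04".
    if first_bytes.startswith(b"PK\x03\x04"):
        return "zip"
    # Legacy .xls files are OLE2 compound documents.
    if first_bytes.startswith(b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"):
        return "ole2"
    return "unknown"


# The eight bytes quoted in the XLRDError above are recognised as HTML.
print(sniff_payload(b"<!DOCTYP"))
```

If the result is "html", the URL is serving a web page (for example a preview page) instead of the file itself, so the fix belongs in the URL rather than in the pandas call.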


1 answer
User
#1 · Posted 2024-10-01 11:32:22

Add ?dl=1 to the end of the URL:

>>> import pandas as pd
>>>
>>> url = 'https://www.dropbox.com/s/i77mern7joxc9ur/TestResultCodelistVoC.xlsx?dl=1'
>>> df = pd.read_excel(url)
>>> print(df)
             Codelistname  Codesystem name  ...                                     Short label DE 1st Release
0   TestResultCodelistVoC              NaN  ...                                  Confirmed 501Y.V1         NaN
1   TestResultCodelistVoC              NaN  ...                                  Confirmed 501Y.V2         NaN
2   TestResultCodelistVoC              NaN  ...                                  Confirmed 501Y.V3         NaN
3   TestResultCodelistVoC              NaN  ...                               Confirmed 501Y.V3.P1         NaN
4   TestResultCodelistVoC              NaN  ...                               Confirmed 501Y.V3.P2         NaN
5   TestResultCodelistVoC              NaN  ...                Confirmed not one of the listed VOC         NaN
6   TestResultCodelistVoC              NaN  ...                            Compatible with 501Y.V1         NaN
7   TestResultCodelistVoC              NaN  ...                            Compatible with 501Y.V2         NaN
8   TestResultCodelistVoC              NaN  ...                            Compatible with 501Y.V3         NaN
9   TestResultCodelistVoC              NaN  ...                         Compatible with 501Y.V3.P1         NaN
10  TestResultCodelistVoC              NaN  ...                         Compatible with 501Y.V3.P2         NaN
11  TestResultCodelistVoC              NaN  ...                          Compatible with 501Y.V2-3         NaN
12  TestResultCodelistVoC              NaN  ...                              Compatible with a VOC         NaN
13  TestResultCodelistVoC              NaN  ...                             Confirmed MinkCluster5         NaN
14  TestResultCodelistVoC              NaN  ...                       Compatible with MinkCluster5         NaN
15  TestResultCodelistVoC              NaN  ...                        Not compatible with 501Y.V1         NaN
16  TestResultCodelistVoC              NaN  ...                      Not compatible with 501Y.V2-3         NaN
17  TestResultCodelistVoC              NaN  ...  No compatibility with VOC detected (VOC not fu...         NaN
18  TestResultCodelistVoC              NaN  ...                           Other variant of concern         NaN

[19 rows x 12 columns]
>>>
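
If share links arrive in different forms (with no query string, or already carrying `dl=0`), appending `?dl=1` by hand can produce a malformed URL. A small sketch of one way to normalise them with the standard library follows. The helper name `force_dropbox_download` is hypothetical, and the sketch assumes that setting `dl=1` is enough for the link in question, as it was for the URL above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def force_dropbox_download(share_url: str) -> str:
    """Return share_url with its dl query parameter set to 1.

    Hypothetical helper for illustration; other query parameters
    are preserved in their original order.
    """
    parts = urlsplit(share_url)
    # Keep existing parameters, dropping any previous dl value.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "dl"
    ]
    query.append(("dl", "1"))
    return urlunsplit(parts._replace(query=urlencode(query)))


direct_url = force_dropbox_download(
    "https://www.dropbox.com/s/i77mern7joxc9ur/TestResultCodelistVoC.xlsx"
)
print(direct_url)
# The result can then be passed to pandas, e.g. pd.read_excel(direct_url)
```

Parsing and rebuilding the URL, rather than concatenating strings, avoids ending up with two `?` characters or two conflicting `dl` values.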
