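<p>You cannot print the whole dataset directly, because the full table only becomes visible after the <code>Show all</code> button is clicked. So we first have to trigger a click on the <code>Show all</code> button, and only then can we fetch the whole table.</p>
<p>I used the <code>Selenium</code> library to perform the click on the <code>Show all</code> button. For this particular scenario I used <code>Selenium</code>'s <code>Firefox()</code> WebDriver to fetch all the <code>data</code> from the <code>url</code>. The code below fetches the whole table from the given <code>COVID Dataset URL</code>:</p>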
<pre><code># Import the required libraries
from selenium import webdriver  # Drives the browser: loads the page and performs the click
from selenium.webdriver.common.by import By  # Locator strategies used by find_element
from pandas.io.html import read_html  # Parses tables out of 'html' source so we can scrape data from it
import pandas as pd  # Used later to load the data into a 'DataFrame'
# Create 'Firefox' WebDriver object
driver = webdriver.Firefox()
# Load the website
driver.get("https://www.nytimes.com/interactive/2021/world/covid-vaccinations-tracker.html")
# Find the 'Show all' button using its 'XPath'
show_all_button = driver.find_element(By.XPATH, "/html/body/div[1]/main/article/section/div/div/div[4]/div[1]/div/table/tbody/tr[16]")
# Click the 'Show all' button
show_all_button.click()
# Get the 'HTML' content of the page
html_data = driver.page_source
</code></pre>
<p>After fetching the whole page, let's see how many tables the <code>COVID Dataset URL</code> contains:</p>
<pre><code>from io import StringIO  # Lets read_html treat the HTML string as a file-like object
covid_data_tables = read_html(StringIO(html_data), attrs={"class": "g-summary-table svelte-2wimac"}, header=None)
# Print number of tables extracted
print("\nExtracted {num} COVID Data Table".format(num=len(covid_data_tables)), "\n")
</code></pre>
<pre><code># Output of Above Cell:-
Extracted 1 COVID Data Table
</code></pre>
<p>Now, let's look at the data table:</p>
<pre><code># Print Table Data
covid_data_tables[0].head(20)
</code></pre>
<pre><code># Output of above cell:-
     Unnamed: 0_level_0  Doses administered         Pct. of population
     Unnamed: 0_level_1  Per 100 people      Total  Vaccinated  Fully vaccinated
 0                World              11  877933955           –                 –
 1               Israel             116   10307583         60%               56%
 2           Seychelles             116     112194         68%               47%
 3               U.A.E.              99    9489684           –                 –
 4                Chile              69   12934282         41%               28%
 5              Bahrain              66    1042463         37%               29%
 6               Bhutan              63     478219         63%                 –
 7                 U.K.              62   41505768         49%               13%
 8        United States              61  202282923         38%               24%
 9           San Marino              60      20424         35%               25%
10             Maldives              59     303752         53%              5.6%
11                Malta              55     264658         38%               17%
12               Monaco              53      20510         30%               23%
13              Hungary              45    4416581         32%               14%
14               Serbia              44    3041740         26%               17%
15                Qatar              43    1209648           –                 –
16              Uruguay              38    1310591         30%              8.3%
17            Singapore              30    1667522         20%              9.5%
18  Antigua and Barbuda              28      27032         28%                 –
19              Iceland              28      98672         20%              8.1%
</code></pre>
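<p>As you can see, the <code>Show all</code> row no longer appears in our dataset. Now we can turn this <code>Data Table</code> into a <code>DataFrame</code> for further analysis. To do that, we store the <code>Data</code> in <code>CSV</code> format, then reload it and keep it in a <code>DataFrame</code>. The code for this is given below:</p>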
<pre><code># HTML Table to CSV Format Conversion For COVID Dataset
covid_data_file = 'covid_data.csv'
covid_data_tables[0].to_csv(covid_data_file, sep = ',')
# Read CSV Data From Data Table for Further Analysis
covid_data = pd.read_csv("covid_data.csv")
</code></pre>
<p>So, after storing all the <code>Data</code> in <code>csv</code> format, let's convert it into <code>DataFrame</code> format and print the whole dataset:</p>
<pre><code># Store 'CSV' data in 'DataFrame' format
vaccineDF = pd.DataFrame(covid_data)
vaccineDF = vaccineDF.drop(columns=["Unnamed: 0"])  # Drop the unnecessary index column written by 'to_csv'
# Print whole dataset
vaccineDF
</code></pre>
<pre><code># Output of above cell:-
     Unnamed: 0_level_0  Doses administered  Doses administered.1  Pct. of population  Pct. of population.1
  0  Unnamed: 0_level_1      Per 100 people                 Total          Vaccinated      Fully vaccinated
  1               World                  11             877933955                   –                     –
  2              Israel                 116              10307583                 60%                   56%
  3          Seychelles                 116                112194                 68%                   47%
  4              U.A.E.                  99               9489684                   –                     –
...                 ...                 ...                   ...                 ...                   ...
154               Syria                <0.1                  2500               <0.1%                     –
155    Papua New Guinea                <0.1                  1081               <0.1%                     –
156         South Sudan                <0.1                   947               <0.1%                     –
157            Cameroon                <0.1                   400               <0.1%                     –
158              Zambia                <0.1                   106               <0.1%                     –
159 rows × 5 columns
</code></pre>
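<p>Note that <code>read_html</code> already returns a list of <code>DataFrame</code> objects, and that reloading the CSV with a single header row is what turns the second header level into data row <code>0</code> in the output above. As a possible alternative, the sketch below collapses the two-level header in memory instead. The helper <code>flatten_header</code> and the column label <code>Country</code> are hypothetical names chosen for illustration, and the two-row sample only stands in for <code>covid_data_tables[0]</code>:</p>

```python
import pandas as pd


def flatten_header(table, first_column="Country"):
    """Return a copy of `table` with its two-level column header collapsed to one level."""
    flat = table.copy()
    new_names = []
    for upper, lower in table.columns:
        if str(upper).startswith("Unnamed") and str(lower).startswith("Unnamed"):
            # Hypothetical label for the column that has no header on either level.
            new_names.append(first_column)
        else:
            new_names.append(f"{upper}: {lower}")
    flat.columns = new_names
    return flat


# Small stand-in for covid_data_tables[0], built from two rows of the output above.
columns = pd.MultiIndex.from_tuples([
    ("Unnamed: 0_level_0", "Unnamed: 0_level_1"),
    ("Doses administered", "Per 100 people"),
    ("Doses administered", "Total"),
    ("Pct. of population", "Vaccinated"),
    ("Pct. of population", "Fully vaccinated"),
])
sample = pd.DataFrame(
    [["World", 11, 877933955, "–", "–"],
     ["Israel", 116, 10307583, "60%", "56%"]],
    columns=columns,
)

vaccine_df = flatten_header(sample)
print(vaccine_df.columns.tolist())
```

<p>Applied to the real table, this would be <code>flatten_header(covid_data_tables[0])</code>, which keeps every country row as data and needs no intermediate file.</p>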
<p>As the output above shows, we have successfully fetched the whole <code>data table</code>. I hope this solution helps you.</p>