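<p>The table values appear to be embedded in a JSON string (the <code>data-amount</code> attribute of a <code>span</code> inside each cell), which can be parsed with <a href="https://docs.python.org/3/library/json.html#json.loads" rel="nofollow noreferrer">json.loads</a>. We can then pick out the value for a particular region through the <code>"regional"</code> key.</p>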
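<p>As a minimal illustration of that lookup, the sketch below parses a hand-written <code>data-amount</code> value. The JSON shape (a <code>"regional"</code> object keyed by region slugs such as <code>us-west-2</code>) is assumed from the scraper code, and the sample string, its prices and the helper name <code>regional_price</code> are hypothetical, not taken from the Azure page.</p>
<pre><code>import json

# Hypothetical data-amount attribute value: structure assumed, numbers invented
SAMPLE_DATA_AMOUNT = '{"regional": {"us-west-2": 0.096, "asia-pacific-southeast": 0.12}}'


def regional_price(data_amount, region):
    """Return the price for `region` from a data-amount JSON string, or 'N/A'."""
    try:
        return json.loads(data_amount)["regional"][region]
    except (KeyError, TypeError, ValueError):
        # KeyError: region (or "regional") missing
        # TypeError: attribute was None, or the JSON is not the expected dict shape
        # ValueError: not valid JSON (json.JSONDecodeError subclasses ValueError)
        return "N/A"


print(regional_price(SAMPLE_DATA_AMOUNT, "us-west-2"))     # 0.096
print(regional_price(SAMPLE_DATA_AMOUNT, "europe-north"))  # N/A
</code></pre>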
<p>It's a bit convoluted, but at least it extracts the values, which we then put into DataFrames:</p>
<pre><code>import json

import numpy as np
import pandas as pd
import requests
from bs4 import BeautifulSoup

# Show full column contents when printing DataFrames
# (None means "no limit"; passing 0 is deprecated in recent pandas)
pd.set_option('display.max_colwidth', None)

url = 'https://azure.microsoft.com/en-us/pricing/details/virtual-machines/windows/'
response = requests.get(url)
response.raise_for_status()
soup = BeautifulSoup(response.content, 'html.parser')
tables = soup.find_all('div', {'class': 'row row-size3 column'})

region = 'us-west-2'  # Adjust your region here


def parse_table_as_dataframe(table):
    columns = []  # flat, row-major list of cell values
    try:
        # Keep only the first word of each column header (and skip the first
        # header cell) so the table fits reasonably in the display;
        # adjust to your preference
        header = [h.text.split()[0].strip() for h in table.thead.find_all('th')][1:]
    except (AttributeError, IndexError):
        # No thead element, or an empty header cell
        return 'N/A'
    for row in table.tbody.find_all('tr'):
        # Skip the first cell of each row (the row label)
        for c in row.find_all('td')[1:]:
            text = c.text.strip()
            if text not in ('', '$-'):
                if 'dash' in text:
                    columns.append('-')  # replace "&dash; &dash:" with a `-`
                else:
                    columns.append(text)
            else:
                # The price is stored as JSON in the span's data-amount attribute
                try:
                    data_text = c.span['data-amount']
                    # e.g. json.loads(data_text)['regional']['asia-pacific-southeast']
                    data = json.loads(data_text)['regional'][region]
                    columns.append(data)
                except (KeyError, TypeError):
                    # No span/attribute, or no price for this region
                    columns.append('N/A')
    num_rows = len(table.tbody.find_all('tr'))
    num_columns = len(header)
    # For debugging:
    # print(len(columns), columns)
    # print(num_rows, num_columns)
    df = pd.DataFrame(np.array(columns).reshape(num_rows, num_columns), columns=header)
    return df


for n, table in enumerate(tables):
    print(n, table.h3.text)
    print(parse_table_as_dataframe(table))
</code></pre>
<p>This yields 24 DataFrames from the page, one per table.</p>
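<p>To check the per-cell logic without hitting the live page, here is a small standard-library sketch. It mirrors the rules in <code>parse_table_as_dataframe</code>: literal cell text wins, dash placeholders become <code>-</code>, otherwise the regional price comes from <code>data-amount</code>, falling back to <code>'N/A'</code>. It then splits the flat list into rows the way the NumPy reshape does. The helper names <code>cell_value</code> and <code>to_rows</code> and the sample cells are hypothetical.</p>
<pre><code>import json


def cell_value(text, data_amount, region):
    """Apply the scraper's per-cell rule (hypothetical helper)."""
    text = text.strip()
    if text not in ("", "$-"):
        # Visible text wins; dash placeholders collapse to "-"
        return "-" if "dash" in text else text
    try:
        # Otherwise fall back to the regional price in the JSON attribute
        return json.loads(data_amount)["regional"][region]
    except (KeyError, TypeError, ValueError):
        return "N/A"


def to_rows(cells, num_rows, num_columns):
    """Split a flat, row-major list into num_rows rows of num_columns cells.

    Like np.array(cells).reshape(num_rows, num_columns), a size mismatch is
    an error instead of a silently ragged table.
    """
    if len(cells) != num_rows * num_columns:
        raise ValueError(f"expected {num_rows * num_columns} cells, got {len(cells)}")
    return [cells[r * num_columns:(r + 1) * num_columns] for r in range(num_rows)]


# Hypothetical cells of a 2 x 2 table: (visible text, data-amount attribute)
raw_cells = [
    ("2", None),
    ("", '{"regional": {"us-west-2": 0.096}}'),
    ("4", None),
    ("$-", '{"regional": {"us-east": 0.1}}'),
]
values = [cell_value(text, amount, "us-west-2") for text, amount in raw_cells]
rows = to_rows(values, num_rows=2, num_columns=2)
print(rows)  # [['2', 0.096], ['4', 'N/A']]
</code></pre>
<p>One design note: calling <code>np.array(columns)</code> on a list that mixes strings and floats coerces every element to a string, so the prices in the scraped DataFrames end up as text. Building the frame from a list of rows instead, e.g. <code>pd.DataFrame(rows, columns=header)</code>, keeps each price as a Python float (in an object-dtype column).</p>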