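<p>The current HTML structure is very generic: it contains multiple <code>infoEntity</code> divs whose child content can be laid out in several ways. To handle this, iterate over the <code>infoEntity</code> divs and apply one formatting rule to each: pair every <code>label</code> with the <code>span</code> that follows it, and key any unlabeled <code>span</code> by its class names, as shown below:</p>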
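<pre><code>from bs4 import BeautifulSoup as soup

# `car` is assumed to hold the page's HTML as a string (it comes from the question).
result, label = {}, None
for i in soup(car, 'html.parser').find_all('div', {'class':'infoEntity'}):
    for b in i.find_all(['span', 'label']):
        if b.name == 'label':
            # Remember the label text until its value span appears.
            label = b.get_text(strip=True)
        elif b.name == 'span' and label is not None:
            # A span following a label: use the label as the key.
            result[label] = b.get_text(strip=True)
            label = None
        else:
            # A span with no pending label: use its class names as the key.
            result[' '.join(b['class'])] = b.get_text(strip=True)
</code></pre>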
<p>Output:</p>
<pre><code>{'manufacturer website': 'www.ford.com', 'Headquarters': 'Dearbord, MI', 'Model': 'Mustang'}
</code></pre>
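<p>The pairing rule in the loop can also be checked on its own, without BeautifulSoup. The following is only a minimal sketch: it assumes each tag has already been reduced to a hypothetical <code>(tag_name, classes, text)</code> tuple. The function name <code>pair_labels</code> and the sample rows are illustrative and are not taken from the original page.</p>

```python
def pair_labels(elements):
    """Pair each label with the span that follows it.

    `elements` is a hypothetical list of (tag_name, classes, text) tuples
    standing in for the label/span tags found inside the infoEntity divs.
    """
    result, label = {}, None
    for name, classes, text in elements:
        if name == 'label':
            # Hold the label text until its value span shows up.
            label = text
        elif label is not None:
            # A span right after a label: the label becomes the key.
            result[label] = text
            label = None
        else:
            # A span with no pending label: key it by its class names.
            result[' '.join(classes)] = text
    return result


# Illustrative rows only, chosen to reproduce the output shown above.
rows = [
    ('span', ['manufacturer', 'website'], 'www.ford.com'),
    ('label', [], 'Headquarters'),
    ('span', [], 'Dearbord, MI'),
    ('label', [], 'Model'),
    ('span', [], 'Mustang'),
]
print(pair_labels(rows))
```

<p>As in the loop above, the pending label is not reset between divs, so a <code>label</code> with no <code>span</code> after it would attach to the next <code>span</code> encountered, even one in a later <code>infoEntity</code> div.</p>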