BeautifulSoup scraping returns {{}} placeholders with no data

Posted 2024-09-29 06:27:06


I am trying to parse a site whose markup looks like this:

    <div class="address">
    <div class="hit-company"><a href="https://www.cools.biz/best/celebrities/amy-gold/">Amy  Gold</a></div>
    <div class="speciality hit-speciality">Audiology</div>
    <div class="address hit-address"><i><p translate="no">
    <span class="address-line1">38 Park Drive </span><br>
    <span class="locality">London</span>, <span class="administrative-area">VA</span> <span class="postal-code">22025</span><br>
    </p></i></div>
    <div class="phone hit-phone"><i><a href="tel:+1-xxx-659-xxx">(xxx) 659-xxx</a></i></div>
    <div class="description hit-listing_description hidden-xs"></div>
    <div class="hit-website"><a href="http://coll celebs.com" target="_blank">Visit Website</a></div>
    </div>

I am scraping it with BeautifulSoup:

    import os
    from urllib.request import Request, urlretrieve, urlopen
    from bs4 import BeautifulSoup

    # Send a browser-like User-Agent with the request
    req = Request("https://www.urlxxxxxx.com", headers={'User-Agent': 'Mozilla/5.0'})
    page1 = urlopen(req)
    phtml = BeautifulSoup(page1, 'html5lib')
    print(phtml)
    divs = phtml.find_all("div", attrs={"class": "hit-company"})
    print('aaaaa-----' + str(divs))

I tried the html5lib, lxml, and html.parser parsers. lxml and html.parser do not even pick up the div with class "hit-company"; only html5lib does. Even with html5lib, the div comes back empty.
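To separate parser behaviour from what the server actually sends, the three parsers can be run against a saved copy of the markup shown above. This is only a sketch: `SAMPLE` is a shortened copy of the snippet from the question, `count_hits` is a hypothetical helper, and parsers that are not installed are reported as `None` rather than counted.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Shortened copy of the markup from the question (assumption: this is
# what the fully rendered page contains).
SAMPLE = """<div class="address">
<div class="hit-company"><a href="https://www.cools.biz/best/celebrities/amy-gold/">Amy  Gold</a></div>
<div class="speciality hit-speciality">Audiology</div>
</div>"""


def count_hits(html, parser):
    """Return how many div.hit-company elements the parser finds,
    or None if that parser is not installed."""
    try:
        soup = BeautifulSoup(html, parser)
    except FeatureNotFound:
        return None
    return len(soup.find_all("div", class_="hit-company"))


results = {p: count_hits(SAMPLE, p) for p in ("html.parser", "lxml", "html5lib")}
print(results)
```

If every installed parser finds the div in the saved markup but not in the downloaded page, the difference is more likely in the HTML that `urlopen` receives than in the choice of parser.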

When I inspect the HTML output, I notice:

    <div class="hit-company">{{person}}</div>
    <div class="speciality hit-speciality">{{specialty}}</div>
    <span class="address-line1">{{address}}</span><br>

Where the actual data should be, there are {{parameter}} placeholders instead. Can you help me solve this?
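A quick way to confirm that the downloaded HTML still contains unfilled template fields is to scan it for `{{name}}` patterns before parsing. The sketch below uses only the standard library; `find_placeholders` and the `raw` string are illustrative names, and the idea that these fields are filled in later by a script in the browser is an assumption, not something the output above proves.

```python
import re

# Matches {{name}}-style placeholders such as {{person}} or {{specialty}}.
PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][\w.]*)\s*\}\}")


def find_placeholders(html):
    """Return the distinct placeholder names in html, in first-seen order."""
    seen = []
    for name in PLACEHOLDER.findall(html):
        if name not in seen:
            seen.append(name)
    return seen


# Lines taken from the HTML output shown in the question.
raw = """<div class="hit-company">{{person}}</div>
<div class="speciality hit-speciality">{{specialty}}</div>
<span class="address-line1">{{address}}</span><br>"""

names = find_placeholders(raw)
print(names)  # ['person', 'specialty', 'address']
```

If this list is non-empty for the real response, the data is not in the HTML that was fetched, so no choice of parser will recover it from that response alone.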

Thanks

