Python: how to use BeautifulSoup when class names contain random characters

Posted 2024-06-16 13:50:07


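I have been trying to work out how to scrape a buy/sell site. Everything I need is present in the HTML, but the class names contain what look like random characters, for example: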

<div aria-label="Adidas NMD x Bape" class="styled__Wrapper-sc-1kpvi4z-0 eDiSuB" to="/annons/skane/adidas_nmd_x_bape/87267675">
    <article class="styled__Article-sc-1kpvi4z-1 hbWRzz">
        <div class="styled__ImageWrapper-sc-1kpvi4z-4 kxhCJn">
            <div class="ListImage__Wrapper-sc-1rp77jc-0 cvipJS"><img alt="Adidas NMD x Bape" class="ListImage__StyledImg-sc-1rp77jc-1 iwClwW" sizes="
              (min-width: 768px) 180px,
              120px
            " src="https://cdn.blocket.com/pictures/1692451915.jpg?type=gallery_big" srcset="
    https://cdn.blocket.com/pictures/1692451915.jpg?type=thumb 120w,
    https://cdn.blocket.com/pictures/1692451915.jpg?type=gallery_big 180w,
    https://cdn.blocket.com/pictures/1692451915.jpg?type=mob_iphone_vi_normal 240w,
    https://cdn.blocket.com/pictures/1692451915.jpg?type=store_presentation 360w,
    https://cdn.blocket.com/pictures/1692451915.jpg?type=mob_iphone_vi_normal_retina 540w,
  " /></div>
        </div>
        <div class="styled__Content-sc-1kpvi4z-2 dwtNsH">
            <div class="styled__LocationTimeWrapper-sc-1kpvi4z-17 dvvNDw">
                <div class="styled__SubjectSymbol-sc-1kpvi4z-11 cbBbUz"></div>
                <p class="styled__TopInfoWrapper-sc-1kpvi4z-22 kEcJNb"><a class="Link-sc-139ww1j-0 TopInfoLink__StyledLink-lzfj8j-0 bjnLor" href="/annonser/hela_sverige/personligt/klader_skor?cg=4080&amp;q=bape&amp;st=s">Kläder &amp; skor</a> · <a class="Link-sc-139ww1j-0 TopInfoLink__StyledLink-lzfj8j-0 bjnLor" href="/annonser/skane/personligt/klader_skor?cg=4080&amp;q=bape&amp;r=23&amp;st=s">Skåne</a></p>
                <p class="styled__Time-sc-1kpvi4z-18 bGSnhf">Idag 14:06</p>
            </div>
            <div class="styled__SubjectWrapper-sc-1kpvi4z-10 kZyTSM">
                <h2 class="TextSubHeading__TextSubHeadingWrapper-sc-1ilszdp-0 jIvScq styled__StyledTitle-sc-1kpvi4z-6 bSElwy"><a class="Link-sc-139ww1j-0 styled__StyledTitleLink-sc-1kpvi4z-7 edlhAW" href="/annons/skane/adidas_nmd_x_bape/87267675">Adidas NMD x Bape</a></h2></div>
            <div class="styled__ParamsWrapper-sc-1kpvi4z-13 cRZIFG"></div>
            <div class="styled__SalesInfo-sc-1kpvi4z-20 bbHjGJ">
                <div class="TextSubHeading__TextSubHeadingWrapper-sc-1ilszdp-0 jIvScq Price__Wrapper-sc-1v2maoc-0 heunWX"><span>3 000 kr<div class="TextCallout2__TextCallout2Wrapper-sc-19qvftl-0 eERYUj Price__StyledVatPrice-sc-1v2maoc-1 hMWxAJ"></div></span></div>
            </div>
        </div>
    </article>
</div>

I can see all the values I am looking for in that markup, namely:

    Adidas NMD x Bape
    3 000 kr
    Skåne
    /annons/skane/adidas_nmd_x_bape/87267675
    https://cdn.blocket.com/pictures/1692451915.jpg

I know BeautifulSoup reasonably well and can handle basic scraping, but this is over my head. What tips can you offer for scraping the values I am looking for?


Update

# eachPart is assumed to be one listing element from an outer loop that is not shown here
title_link = eachPart.select_one('h2[class^="TextSubHeading__TextSubHeadingWrapper"] > a')
title = title_link.text
print(title)
print(eachPart.select_one('[aria-label="{}"] img[alt="{}"]'.format(title, title))['src'])
print(title_link['href'])
print(eachPart.select_one('div[class^="TextSubHeading__TextSubHeadingWrapper"] > span').text)
# Skip the first link (the category) and print the remaining one (the location)
for link in eachPart.select('p[class^="styled__TopInfoWrapper"] a')[1:]:
    print(link.text)

1 Answer

Answer #1 · Posted 2024-06-16 13:50:07

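First identify the parent tag so you can locate the main element, then look up the child tags inside it. CSS selectors are the more convenient tool for this: an attribute selector such as [class^="..."] matches on the stable start of the class attribute and ignores the random suffix.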

from bs4 import BeautifulSoup
html='''<div aria-label="Adidas NMD x Bape" class="styled__Wrapper-sc-1kpvi4z-0 eDiSuB" to="/annons/skane/adidas_nmd_x_bape/87267675">
    <article class="styled__Article-sc-1kpvi4z-1 hbWRzz">
        <div class="styled__ImageWrapper-sc-1kpvi4z-4 kxhCJn">
            <div class="ListImage__Wrapper-sc-1rp77jc-0 cvipJS"><img alt="Adidas NMD x Bape" class="ListImage__StyledImg-sc-1rp77jc-1 iwClwW" sizes="
              (min-width: 768px) 180px,
              120px
            " src="https://cdn.blocket.com/pictures/1692451915.jpg?type=gallery_big" srcset="
    https://cdn.blocket.com/pictures/1692451915.jpg?type=thumb 120w,
    https://cdn.blocket.com/pictures/1692451915.jpg?type=gallery_big 180w,
    https://cdn.blocket.com/pictures/1692451915.jpg?type=mob_iphone_vi_normal 240w,
    https://cdn.blocket.com/pictures/1692451915.jpg?type=store_presentation 360w,
    https://cdn.blocket.com/pictures/1692451915.jpg?type=mob_iphone_vi_normal_retina 540w,
  " /></div>
        </div>
        <div class="styled__Content-sc-1kpvi4z-2 dwtNsH">
            <div class="styled__LocationTimeWrapper-sc-1kpvi4z-17 dvvNDw">
                <div class="styled__SubjectSymbol-sc-1kpvi4z-11 cbBbUz"></div>
                <p class="styled__TopInfoWrapper-sc-1kpvi4z-22 kEcJNb"><a class="Link-sc-139ww1j-0 TopInfoLink__StyledLink-lzfj8j-0 bjnLor" href="/annonser/hela_sverige/personligt/klader_skor?cg=4080&amp;q=bape&amp;st=s">Kläder &amp; skor</a> · <a class="Link-sc-139ww1j-0 TopInfoLink__StyledLink-lzfj8j-0 bjnLor" href="/annonser/skane/personligt/klader_skor?cg=4080&amp;q=bape&amp;r=23&amp;st=s">Skåne</a></p>
                <p class="styled__Time-sc-1kpvi4z-18 bGSnhf">Idag 14:06</p>
            </div>
            <div class="styled__SubjectWrapper-sc-1kpvi4z-10 kZyTSM">
                <h2 class="TextSubHeading__TextSubHeadingWrapper-sc-1ilszdp-0 jIvScq styled__StyledTitle-sc-1kpvi4z-6 bSElwy"><a class="Link-sc-139ww1j-0 styled__StyledTitleLink-sc-1kpvi4z-7 edlhAW" href="/annons/skane/adidas_nmd_x_bape/87267675">Adidas NMD x Bape</a></h2></div>
            <div class="styled__ParamsWrapper-sc-1kpvi4z-13 cRZIFG"></div>
            <div class="styled__SalesInfo-sc-1kpvi4z-20 bbHjGJ">
                <div class="TextSubHeading__TextSubHeadingWrapper-sc-1ilszdp-0 jIvScq Price__Wrapper-sc-1v2maoc-0 heunWX"><span>3 000 kr<div class="TextCallout2__TextCallout2Wrapper-sc-19qvftl-0 eERYUj Price__StyledVatPrice-sc-1v2maoc-1 hMWxAJ"></div></span></div>
            </div>
        </div>
    </article>
</div>'''
soup=BeautifulSoup(html,"html.parser")
print(soup.select_one('[aria-label="Adidas NMD x Bape"] img[alt="Adidas NMD x Bape"]')['src'])
print(soup.select_one('[aria-label="Adidas NMD x Bape"] h2[class^="TextSubHeading__TextSubHeadingWrapper"] >a').text)
print(soup.select_one('[aria-label="Adidas NMD x Bape"] h2[class^="TextSubHeading__TextSubHeadingWrapper"] >a')['href'])
print(soup.select_one('[aria-label="Adidas NMD x Bape"] div[class^="TextSubHeading__TextSubHeadingWrapper"] >span').text)

Output

https://cdn.blocket.com/pictures/1692451915.jpg?type=gallery_big
Adidas NMD x Bape
/annons/skane/adidas_nmd_x_bape/87267675
3 000 kr
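
The question asks for the image URL without the `?type=...` query, and the values printed above are still raw strings. The following is a minimal post-processing sketch using only the standard library. The helper names `strip_query` and `parse_price` are hypothetical, and `BASE_URL` is an assumption, since the question only shows relative links and the `cdn.blocket.com` image host.

```python
import re
from urllib.parse import urljoin, urlsplit, urlunsplit

# Assumption: the site that the relative ad links belong to.
BASE_URL = "https://www.blocket.se"


def strip_query(url):
    """Drop the query string and fragment, e.g. '?type=gallery_big'."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def parse_price(text):
    """Turn a price such as '3 000 kr' into an int; None if it has no digits."""
    digits = re.sub(r"\D", "", text)
    return int(digits) if digits else None


image = strip_query("https://cdn.blocket.com/pictures/1692451915.jpg?type=gallery_big")
link = urljoin(BASE_URL, "/annons/skane/adidas_nmd_x_bape/87267675")
price = parse_price("3 000 kr")

print(image)  # https://cdn.blocket.com/pictures/1692451915.jpg
print(link)   # https://www.blocket.se/annons/skane/adidas_nmd_x_bape/87267675
print(price)  # 3000
```

Removing every non-digit character also copes with a non-breaking space used as the thousands separator, which is common in rendered prices.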

Edit (selecting by class-name prefix instead of the aria-label value)

from bs4 import BeautifulSoup
html='''<div aria-label="Adidas NMD x Bape" class="styled__Wrapper-sc-1kpvi4z-0 eDiSuB" to="/annons/skane/adidas_nmd_x_bape/87267675">
    <article class="styled__Article-sc-1kpvi4z-1 hbWRzz">
        <div class="styled__ImageWrapper-sc-1kpvi4z-4 kxhCJn">
            <div class="ListImage__Wrapper-sc-1rp77jc-0 cvipJS"><img alt="Adidas NMD x Bape" class="ListImage__StyledImg-sc-1rp77jc-1 iwClwW" sizes="
              (min-width: 768px) 180px,
              120px
            " src="https://cdn.blocket.com/pictures/1692451915.jpg?type=gallery_big" srcset="
    https://cdn.blocket.com/pictures/1692451915.jpg?type=thumb 120w,
    https://cdn.blocket.com/pictures/1692451915.jpg?type=gallery_big 180w,
    https://cdn.blocket.com/pictures/1692451915.jpg?type=mob_iphone_vi_normal 240w,
    https://cdn.blocket.com/pictures/1692451915.jpg?type=store_presentation 360w,
    https://cdn.blocket.com/pictures/1692451915.jpg?type=mob_iphone_vi_normal_retina 540w,
  " /></div>
        </div>
        <div class="styled__Content-sc-1kpvi4z-2 dwtNsH">
            <div class="styled__LocationTimeWrapper-sc-1kpvi4z-17 dvvNDw">
                <div class="styled__SubjectSymbol-sc-1kpvi4z-11 cbBbUz"></div>
                <p class="styled__TopInfoWrapper-sc-1kpvi4z-22 kEcJNb"><a class="Link-sc-139ww1j-0 TopInfoLink__StyledLink-lzfj8j-0 bjnLor" href="/annonser/hela_sverige/personligt/klader_skor?cg=4080&amp;q=bape&amp;st=s">Kläder &amp; skor</a> · <a class="Link-sc-139ww1j-0 TopInfoLink__StyledLink-lzfj8j-0 bjnLor" href="/annonser/skane/personligt/klader_skor?cg=4080&amp;q=bape&amp;r=23&amp;st=s">Skåne</a></p>
                <p class="styled__Time-sc-1kpvi4z-18 bGSnhf">Idag 14:06</p>
            </div>
            <div class="styled__SubjectWrapper-sc-1kpvi4z-10 kZyTSM">
                <h2 class="TextSubHeading__TextSubHeadingWrapper-sc-1ilszdp-0 jIvScq styled__StyledTitle-sc-1kpvi4z-6 bSElwy"><a class="Link-sc-139ww1j-0 styled__StyledTitleLink-sc-1kpvi4z-7 edlhAW" href="/annons/skane/adidas_nmd_x_bape/87267675">Adidas NMD x Bape</a></h2></div>
            <div class="styled__ParamsWrapper-sc-1kpvi4z-13 cRZIFG"></div>
            <div class="styled__SalesInfo-sc-1kpvi4z-20 bbHjGJ">
                <div class="TextSubHeading__TextSubHeadingWrapper-sc-1ilszdp-0 jIvScq Price__Wrapper-sc-1v2maoc-0 heunWX"><span>3 000 kr<div class="TextCallout2__TextCallout2Wrapper-sc-19qvftl-0 eERYUj Price__StyledVatPrice-sc-1v2maoc-1 hMWxAJ"></div></span></div>
            </div>
        </div>
    </article>
</div>'''
soup=BeautifulSoup(html,"html.parser")
print(soup.select_one('[class^="styled__Wrapper-sc-"] img[class^="ListImage__StyledImg-sc-"]')['src'])
print(soup.select_one('[class^="styled__Wrapper-sc-"] h2[class^="TextSubHeading__TextSubHeadingWrapper"] >a').text)
print(soup.select_one('[class^="styled__Wrapper-sc-"] h2[class^="TextSubHeading__TextSubHeadingWrapper"] >a')['href'])
print(soup.select_one('[class^="styled__Wrapper-sc-"] div[class^="TextSubHeading__TextSubHeadingWrapper"] >span').text)
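
The same prefix selectors can be applied once per listing when a page holds several of them. The sketch below assumes every listing sits in a `div` whose class attribute starts with `styled__Wrapper-sc-`. The HTML is a trimmed, partly made-up sample (the second listing is invented for illustration), and `parse_listing` and `text_or_none` are hypothetical helper names. Note that `^=` compares against the start of the whole class attribute, so it only works when the stable class comes first; `*=` (substring match) is used for the price wrapper because its class is not the first one.

```python
from bs4 import BeautifulSoup

# Trimmed sample: the first listing follows the question, the second is made up.
html = '''
<div aria-label="Adidas NMD x Bape" class="styled__Wrapper-sc-1kpvi4z-0 eDiSuB">
  <img alt="Adidas NMD x Bape" class="ListImage__StyledImg-sc-1rp77jc-1 iwClwW" src="https://cdn.blocket.com/pictures/1692451915.jpg?type=gallery_big" />
  <p class="styled__TopInfoWrapper-sc-1kpvi4z-22 kEcJNb"><a href="/a">Kläder &amp; skor</a> · <a href="/b">Skåne</a></p>
  <h2 class="TextSubHeading__TextSubHeadingWrapper-sc-1ilszdp-0 jIvScq"><a href="/annons/skane/adidas_nmd_x_bape/87267675">Adidas NMD x Bape</a></h2>
  <div class="TextSubHeading__TextSubHeadingWrapper-sc-1ilszdp-0 jIvScq Price__Wrapper-sc-1v2maoc-0 heunWX"><span>3 000 kr</span></div>
</div>
<div aria-label="Example jacket" class="styled__Wrapper-sc-1kpvi4z-0 xYzAbC">
  <img alt="Example jacket" class="ListImage__StyledImg-sc-1rp77jc-1 qWeRtY" src="https://example.com/pictures/1.jpg?type=gallery_big" />
  <p class="styled__TopInfoWrapper-sc-1kpvi4z-22 zXcVbN"><a href="/c">Kläder &amp; skor</a> · <a href="/d">Stockholm</a></p>
  <h2 class="TextSubHeading__TextSubHeadingWrapper-sc-1ilszdp-0 pOiUyT"><a href="/annons/example/1">Example jacket</a></h2>
  <div class="TextSubHeading__TextSubHeadingWrapper-sc-1ilszdp-0 pOiUyT Price__Wrapper-sc-1v2maoc-0 lKjHgF"><span>500 kr</span></div>
</div>
'''


def text_or_none(node):
    """Return the stripped text of a tag, or None when the tag was not found."""
    return node.get_text(strip=True) if node else None


def parse_listing(card):
    """Extract the fields of one listing, tolerating missing elements."""
    title_link = card.select_one('h2[class^="TextSubHeading__TextSubHeadingWrapper"] > a')
    image = card.select_one('img[class^="ListImage__StyledImg"]')
    price = card.select_one('div[class*="Price__Wrapper"] > span')
    top_links = card.select('p[class^="styled__TopInfoWrapper"] a')
    return {
        "title": text_or_none(title_link),
        "href": title_link.get("href") if title_link else None,
        "image": image.get("src") if image else None,
        "price": text_or_none(price),
        # The last link in the top-info row is the region in the sample markup.
        "location": text_or_none(top_links[-1]) if top_links else None,
    }


soup = BeautifulSoup(html, "html.parser")
listings = [parse_listing(card) for card in soup.select('div[class^="styled__Wrapper-sc-"]')]
for item in listings:
    print(item)
```

Selecting relative to each `card` keeps the fields of one listing together, and checking for `None` avoids an `AttributeError` or `TypeError` when a listing lacks an image or a price.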
