Web scraping loop structure problem

Posted 2024-09-28 01:33:22


As a practice project, I am writing some code to scrape AutoTrader. I am having trouble printing the results I want.

The desired output is:

Car 1
Specs Car 1

Instead, I get:

Car 1
Specs Car 1
Specs Car 2
Specs Car X

car 2

Where does my loop structure go wrong?

from bs4 import BeautifulSoup 
import requests

page_link = ("https://www.autotrader.co.uk/car-search?sort=price-asc&radius=1500&postcode=lu15jq&onesearchad=Used&onesearchad=Nearly%20New&onesearchad=New&make=AUDI&model=A5&price-to=8500&year-from=2008&maximum-mileage=90000&transmission=Automatic&exclude-writeoff-categories=on")
LN = 0
r = requests.get(page_link)
c = r.content
soup = BeautifulSoup(c,"html.parser")

all = soup.find_all("h2",{"class":"listing-title title-wrap"})
all2 = soup.find_all('ul',{"class" :'listing-key-specs '})

The code block above works fine. The block below prints the output.

LN = -1
ListTotal = len(all)
for item in all:
    if LN <= ListTotal:
        LN += 1
        print(item.find("a", {"class": "js-click-handler listing-fpa-link"}).text)
        for carspecs in all2:
            print (carspecs.text)
    else:
        break

Thanks


1 answer
Anonymous user
Answer 1 · Posted 2024-09-28 01:33:22

Because the inner loop prints every entry in all2 on each pass of the outer loop:

all = ...
all2 = ...

for item in all:
    ...
    for carspecs in all2:
        # prints everything in all2 on each iteration over all
        print(carspecs.text)

I guess you want:

for item, specs in zip(all, all2):
    ...
    print(specs.text)
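
To make the difference concrete without hitting the site, here is a minimal sketch that uses plain lists as stand-ins for the scraped results. The names `titles`, `specs`, `nested_lines` and `paired_lines` are made up for illustration and do not appear in the question.

```python
# Hypothetical stand-ins for the scraped titles and spec blocks.
titles = ["Car 1", "Car 2", "Car 3"]
specs = ["Specs Car 1", "Specs Car 2", "Specs Car 3"]


def nested_lines(titles, specs):
    """Mirror the question's structure: the inner loop runs fully per title."""
    lines = []
    for title in titles:
        lines.append(title)
        for spec in specs:
            lines.append(spec)
    return lines


def paired_lines(titles, specs):
    """Walk both lists in step so each title gets only its own specs."""
    lines = []
    for title, spec in zip(titles, specs):
        lines.append(title)
        lines.append(spec)
    return lines


# 3 titles plus 3 * 3 spec lines with the nested loops
print(len(nested_lines(titles, specs)))
# one title line followed by one spec line per car with zip
print("\n".join(paired_lines(titles, specs)))
```

With the nested loops every title is followed by all spec entries, which matches the unwanted output in the question. With `zip` each title is followed by exactly one entry.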

FYI, I cleaned up the code with better logic and names, removed the redundant parts, and made it follow the Python style guide:

import requests
from bs4 import BeautifulSoup

page_link = ("https://www.autotrader.co.uk/car-search?sort=price-asc&"
             "radius=1500&postcode=lu15jq&onesearchad=Used&"
             "onesearchad=Nearly%20New&onesearchad=New&make=AUDI&model=A5"
             "&price-to=8500&year-from=2008&maximum-mileage=90000"
             "&transmission=Automatic&exclude-writeoff-categories=on")

# requests.get returns a response object, so name it accordingly
response = requests.get(page_link)
soup = BeautifulSoup(response.content, "html.parser")

# don't shadow the built-in `all`
cars = soup.find_all("h2", {"class": "listing-title title-wrap"})
cars_specs = soup.find_all("ul", {"class": "listing-key-specs "})

for car, specs in zip(cars, cars_specs):
    # the `LN` counter in the question never changed the loop's behaviour, so it is removed
    # .text prints the link's title rather than the whole <a> tag
    print(car.find("a", {"class": "js-click-handler listing-fpa-link"}).text)
    print(specs.text)
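
One caveat with `zip`: it stops at the shorter input, so if the page yields fewer spec blocks than titles, the extra titles are dropped silently. The sketch below, again on made-up lists rather than real scraped data, shows one way to make such a mismatch visible with `itertools.zip_longest`. The `MISSING` placeholder and the `pair_listings` helper are hypothetical names chosen for this example.

```python
from itertools import zip_longest

# Hypothetical data: assume the page returned one spec block too few.
titles = ["Car 1", "Car 2", "Car 3"]
specs = ["Specs Car 1", "Specs Car 2"]

MISSING = "(no specs found)"  # placeholder text chosen for this example


def pair_listings(titles, specs):
    """Pair titles with specs, padding the shorter list instead of truncating."""
    return list(zip_longest(titles, specs, fillvalue=MISSING))


# Plain zip stops at the shorter input, so "Car 3" disappears without warning.
truncated = list(zip(titles, specs))
padded = pair_listings(titles, specs)

for title, spec in padded:
    print(title)
    print(spec)
```

Padding only exposes the length mismatch. It cannot tell which listing actually lacked a spec block, so a more robust scraper would probably look up the specs inside each listing's own container element.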
