Unable to get the expected response content with BeautifulSoup

Posted 2024-10-01 02:35:18


I tried to use BeautifulSoup to retrieve Mercedes C-Class data from CarGurus, like this:

import requests

# Search results page (adjacent string literals are joined into one URL)
url1 = ('https://www.cargurus.com/Cars/inventorylisting/viewDetailsFilterViewInventoryListing.action?'
        '&showNegotiable=true&sourceContext=carGurusHomePageModel'
        '&entitySelectingHelper.selectedEntity2=c21239'
        '&entitySelectingHelper.selectedEntity=c6079')

# Same page with the first listing's fragment appended
url2 = ('https://www.cargurus.com/Cars/inventorylisting/viewDetailsFilterViewInventoryListing.action?'
        '&showNegotiable=true&sourceContext=carGurusHomePageModel'
        '&entitySelectingHelper.selectedEntity2=c21239'
        '&entitySelectingHelper.selectedEntity=c6079#listing=260322671_isFeatured')

response1 = requests.get(url1)
response2 = requests.get(url2)

Note: url2 is the link to the first item shown on the page url1 (with the suffix #listing=260322671_isFeatured). That item has a lot of details I want to scrape.

However, response1.content and response2.content end up with exactly the same content. I have tried different pages and different car models, and with bs4 all the results are the same.

By the way, I am using a MacBook, and I read somewhere about using WebDriver on macOS, like:

driver = webdriver.Safari()
driver.get(URL)

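Only this way can I reach a specific item page, but the session gets locked, which means I cannot use a loop to visit multiple pages one after another. So I came back to bs4. Any ideas?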


1 Answer

#1 · Posted 2024-10-01 02:35:18

The listing details are loaded dynamically via Ajax as JSON. By inspecting which connections the page makes, we can simulate them with requests:

import json
import requests
from bs4 import BeautifulSoup

# Search results page. Adjacent string literals are joined into one URL,
# so no line breaks or spaces end up inside it.
url = ('https://www.cargurus.com/Cars/inventorylisting/viewDetailsFilterViewInventoryListing.action?'
       '&showNegotiable=true&sourceContext=carGurusHomePageModel'
       '&entitySelectingHelper.selectedEntity2=c21239'
       '&entitySelectingHelper.selectedEntity=c6079')

# JSON endpoint for a single listing; {} is filled in with the listing ID
listing_detail_url = 'https://www.cargurus.com/Cars/detailListingJson.action?inventoryListing={}&searchZip=&searchDistance=100&inclusionType=DEFAULT'

soup = BeautifulSoup(requests.get(url).text, 'html.parser')

data = []
for a in soup.select('a[href^="#listing"]'):  # get all listings on the page
    listing_id = a['href'].split('=')[-1]

    json_data = requests.get(listing_detail_url.format(listing_id)).json()
    # print(json.dumps(json_data, indent=4)) # <  uncomment this to print all data

    listing_title = json_data['listing']['listingTitle']
    price = json_data['listing']['price']
    make_name = json_data['listing']['makeName']
    model_name = json_data['listing']['modelName']
    # ... other data

    data.append( (listing_title, price, make_name, model_name ) )

# print the data
print('{:<80} {:<30} {:<30} {:<30}'.format('Title', 'Price', 'Brand', 'Model'))
for row in data:
    print('{:<80} {:<30} {:<30} {:<30}'.format(*row))

Prints:

Title                                                                            Price                          Brand                          Model                         
2009 Mercedes-Benz C-Class C 300 Sport - $8,000                                  8000.0                         Mercedes-Benz                  C-Class                       
2008 Mercedes-Benz C-Class C 300 Luxury - $3,500                                 3500.0                         Mercedes-Benz                  C-Class                       
2009 Mercedes-Benz C-Class C 300 Sport - $5,999                                  5999.0                         Mercedes-Benz                  C-Class                       
2007 Mercedes-Benz C-Class C 280 4MATIC Luxury AWD - $1,975                      1975.0                         Mercedes-Benz                  C-Class                       
2007 Mercedes-Benz C-Class C 230 Sport - $2,499                                  2499.0                         Mercedes-Benz                  C-Class                       
2009 Mercedes-Benz C-Class - $5,299                                              5299.0                         Mercedes-Benz                  C-Class                       
2009 Mercedes-Benz C-Class C 300 4MATIC Luxury - $6,499                          6499.0                         Mercedes-Benz                  C-Class                       
2008 Mercedes-Benz C-Class C 300 Luxury - $5,950                                 5950.0                         Mercedes-Benz                  C-Class                       
2008 Mercedes-Benz C-Class C 300 Luxury 4MATIC - $6,650                          6650.0                         Mercedes-Benz                  C-Class                       
2005 Mercedes-Benz C-Class C 230 Kompressor Supercharged Sedan - $2,995          2995.0                         Mercedes-Benz                  C-Class                       
2007 Mercedes-Benz C-Class C 230 Sport - $4,900                                  4900.0                         Mercedes-Benz                  C-Class                       
2008 Mercedes-Benz C-Class C 350 Sport - $6,400                                  6400.0                         Mercedes-Benz                  C-Class                       
2009 Mercedes-Benz C-Class C 300 4MATIC Luxury - $6,900                          6900.0                         Mercedes-Benz                  C-Class                       
2008 Mercedes-Benz C-Class C 300 Sport - $6,200                                  6200.0                         Mercedes-Benz                  C-Class                       
2007 Mercedes-Benz C-Class C 280 4MATIC Luxury AWD - $3,830                      3830.0                         Mercedes-Benz                  C-Class                       
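
As for why response1 and response2 were identical in the question: the part of a URL after # is a fragment, which the client keeps to itself and does not send to the server, so both requests ask for the same resource. The sketch below uses only the standard library to illustrate this with the two URLs from the question. Treating the digits before the underscore as the listing ID is an assumption on my part, not something the site documents.

```python
from urllib.parse import parse_qs, urldefrag

base = ('https://www.cargurus.com/Cars/inventorylisting/'
        'viewDetailsFilterViewInventoryListing.action?'
        '&showNegotiable=true&sourceContext=carGurusHomePageModel'
        '&entitySelectingHelper.selectedEntity2=c21239'
        '&entitySelectingHelper.selectedEntity=c6079')
url1 = base
url2 = base + '#listing=260322671_isFeatured'

# urldefrag splits a URL into (URL without fragment, fragment)
request_url1, fragment1 = urldefrag(url1)
request_url2, fragment2 = urldefrag(url2)

# Without the fragment, both URLs point at the same resource
print(request_url1 == request_url2)

# The fragment has query-string syntax, so parse_qs can read it
listing_value = parse_qs(fragment2)['listing'][0]

# Assumption: the numeric part before '_' is the listing ID
listing_id = listing_value.split('_')[0]
print(listing_value, listing_id)
```

The fragment is only meaningful to the JavaScript running in the browser, which is why the per-listing data has to be fetched from the JSON endpoint used in the answer's code.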
