Can't scrape a tag with BeautifulSoup because logging in with requests no longer works

Posted 2024-09-29 22:34:05

To scrape the prices you have to be logged in. This used to work, but they have since changed something on the site. The code below still works for the URLs, auctions, titles, and results; only the prices no longer come back. Whenever the value in the results list indicates "bid goes on" or "sold", the vehicle should have a price value.

import pandas as pd
import requests
from bs4 import BeautifulSoup
import re
import numpy as np
from multiprocessing import Pool 
from multiprocessing import cpu_count
from IPython.core.interactiveshell import InteractiveShell

# Display all output
InteractiveShell.ast_node_interactivity = "all"
pd.options.display.max_rows = 100
pd.options.display.max_colwidth = 100
pd.options.display.max_columns = None

# Scrape all vehicles per auction
data_list = [{"searchScope": "SC0520", #value options unique per auction (SC0520 = Indy 2020)
    "searchMake": "Plymouth",
    "searchModel": "Cuda",
    "searchYearStart": "1970",
    "searchYearEnd": "1971",
    "submit": ""},{"searchScope": "SC0520",
    "searchMake": "Dodge",
    "searchModel": "Challenger",
    "searchYearStart": "1970",
    "searchYearEnd": "1971",
    "submit": ""}]

headers = {
    "Referer": "https://www.mecum.com",
}

login = {"email": "arjenvgeffen@gmail.com",
        "password": "appeltaart13"}

# Get all the newest challenger and cuda lots with the function below
urls = []
title = []
auction = []
results = []
price = []

def newest_vehicles(url):
    with requests.Session() as req:
        r = req.post("https://www.mecum.com/includes/login-action.cfm", data=login)
        for data in data_list:
            for item in range(1, 2): #scrapes one page 
                r = req.post(url.format(item), data=data, headers=headers)
                soup = BeautifulSoup(r.content, 'html.parser')
                target = soup.select("div.lot")
                for tar in target:
                    urls.append(tar.a.get('href'))
                    title.append(tar.select_one("a.lot-title").text)
                    price.append(tar.span.text if tar.span and tar.span.text else np.NaN)  # guard against a missing span
                    auction.append(tar.select_one("div.lot-number").text.strip())
                    results2 = tar.select("div[class*=lot-image-container]")
                    for result2 in results2:
                        results.append(' '.join(result2['class']))

newest_vehicles("https://www.mecum.com/search/page/{}/")

# There should be 27 unique URLs
len(urls) #27
len(set(urls)) #27

urls[:2]
title[:2]
results[:2]
auction[:2]
price[:2]
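
Since only the login-gated prices are missing, a first step is to confirm whether the POST to the login endpoint still succeeds at all. A minimal check might look like the sketch below; note that the printed status code, redirect history, and cookies are only hints, since it is an assumption that this endpoint signals a successful login through a redirect or a session cookie rather than, say, a JSON body.

def check_login():
    # Post the credentials and return the response so it can be inspected.
    # How mecum.com signals success is an assumption here: a redirect or a
    # session cookie is a good sign; a 200 that serves the login form again is not.
    with requests.Session() as req:
        r = req.post("https://www.mecum.com/includes/login-action.cfm",
                     data=login, headers=headers)
        print(r.status_code)           # 200 alone does not prove the login worked
        print(r.history)               # redirects issued by the login endpoint
        print(req.cookies.get_dict())  # look for a session/auth cookie
        return r

r_login = check_login()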

Bonus question (you win nothing except pride and, most likely, an accepted answer :)). If I already have a list of URLs, how can I use those URLs as the input of a function that scrapes the price for each one? The example below roughly does this per URL for the estimate (using the final_urls list). I would like a similar function that scrapes the price of each URL, but that will need some extra code to log in first (a rough sketch of what that could look like follows after the estimate example below). You can scrape the price like this: price = soup.find("span", class_="lot-price").text

final_urls = ['https://www.mecum.com/lots/SC0520-414334/1970-plymouth-cuda/',
 'https://www.mecum.com/lots/SC0520-414676/1970-plymouth-aar-cuda/',
 'https://www.mecum.com/lots/SC0520-414677/1971-plymouth-cuda-convertible/',
 'https://www.mecum.com/lots/SC0520-414678/1971-plymouth-cuda-convertible/',
 'https://www.mecum.com/lots/SC0520-414733/1971-plymouth-cuda/']

def scrape_estimate(url):
    with requests.Session() as req:
        r = req.get(url)
        soup = BeautifulSoup(r.content, 'html.parser')
        est = soup.find(class_=["lot-estimate"])
        if est:
            # Strip dollar signs, commas, quotes and whitespace from the raw text
            estimate = re.sub(r"[$,\n\t' ]", "", est.contents[0])
        else:
            estimate = np.NaN
        return estimate

# Example to check just one URL 
scrape_estimate('https://www.mecum.com/lots/FL0120-397356/1971-plymouth-hemi-cuda-sox-and-martin-pro-stock/')

# Scrapes all the URLs in the final_urls list
p = Pool(cpu_count())
results_estimate = p.map(scrape_estimate, final_urls)
p.close()
p.join()

results_estimate
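
For the bonus question, one possible shape is the scrape_estimate function above with a login step added per worker. The sketch below makes several assumptions: the span class "lot-price" is taken from the question text, the login endpoint is the one used at the top, and each worker opens its own session because a requests.Session cannot be shared across multiprocessing workers (and, as in the original example, the module-level login and headers dicts are assumed to reach the workers via fork).

def scrape_price(url):
    # Log in inside the worker: sessions are not shared across processes,
    # so each call authenticates before fetching the lot page.
    with requests.Session() as req:
        req.post("https://www.mecum.com/includes/login-action.cfm",
                 data=login, headers=headers)
        r = req.get(url)
        soup = BeautifulSoup(r.content, 'html.parser')
        tag = soup.find("span", class_="lot-price")  # class name assumed from the question
        if tag and tag.text:
            return re.sub(r"[$,\n\t' ]", "", tag.text)
        return np.NaN

p = Pool(cpu_count())
results_price = p.map(scrape_price, final_urls)
p.close()
p.join()

Logging in once per URL is wasteful; if that matters, the login could instead be done once per worker through Pool's initializer argument.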

I really hope someone can help me figure this out. Thanks!

