Extracting the current bid amount from a dynamically updated value with BeautifulSoup

Posted 2024-05-20 19:22:33


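This is my first introduction to Python and BeautifulSoup. I am trying to scrape the current bid amount for a specific property listed on a popular auction site (RealInsight), but I can't get BeautifulSoup to pull the actual number I'm looking for, only the HTML code. I'm looking for the value of the tag with class "s-b-n", which is $3,250,000 before the auction actually starts.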

https://marketplace.realinsight.com/sales/details/XXX

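I think this is because the value is updated dynamically and generated outside the HTML code, but I'm not sure how to verify that theory or, if it proves correct, how to get the value. I also think I may not be referencing the table that contains the value correctly, but again, I'm not very familiar with Python or bs4.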
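One way to test the "generated outside the HTML" theory is to search the raw page source for the number before any parsing happens. The sketch below uses only the standard library. `SAMPLE_HTML` is a hypothetical stand-in for the downloaded source, not the real RealInsight markup, and `find_amount` is an illustrative helper name. If the number appears only inside an attribute or a script, and not inside the visible table cell, the cell is most likely being filled in by JavaScript after the page loads.

```python
# Hypothetical stand-in for the downloaded page source; the real markup may differ.
SAMPLE_HTML = (
    '<div class="body-content" data-sb="3250000.0">'
    '<table><tr><td class="s-b-n"></td></tr></table>'
    '</div>'
)


def find_amount(html, amount, context=20):
    """Return (variant, snippet) pairs for each rendering of amount found in html."""
    hits = []
    # Try the plain digits and the comma-grouped form, e.g. 3250000 and 3,250,000.
    for variant in (str(amount), f"{amount:,}"):
        pos = html.find(variant)
        if pos != -1:
            start = max(0, pos - context)
            hits.append((variant, html[start:pos + len(variant) + context]))
    return hits


for variant, snippet in find_amount(SAMPLE_HTML, 3250000):
    print(variant, "->", snippet)
```

With the code from the question, the same helper could be called on `page_html.decode("utf-8", errors="replace")`, since `uclient.read()` returns bytes. An empty result would suggest that the value never reaches the static HTML at all and that a browser-driven tool is needed.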

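[Updated with the final code below, using ewink's approach: it scrapes once per second for 5 seconds. Also updated to handle the end of the auction.]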

import csv
import datetime
import time
from urllib.request import urlopen as uReq

from bs4 import BeautifulSoup as soup

my_url = 'https://marketplace.realinsight.com/sales/details/XXX'
endmsg = "Auction End"


def fetch_soup(url):
    # Download the page and parse it.
    uclient = uReq(url)
    page_html = uclient.read()
    uclient.close()
    return soup(page_html, "html.parser")


# One initial fetch to get the page title for the CSV file name.
propname = fetch_soup(my_url).title.text
with open(propname + '_bids.csv', 'w', newline='') as f:
    thewriter = csv.writer(f)
    thewriter.writerow(['i', 'prop_name', 'date_time', 'bid_increment', 'bid_amt'])
    for i in range(5):
        # Fetch again on every pass so each row reflects the page at that moment.
        page_soup = fetch_soup(my_url)
        bids = page_soup.select_one(".body-content")
        currentbid = bids['data-nb']
        bidincrement = bids['data-bi']
        currentDT = datetime.datetime.now()
        # The presence of div.sale-end-text is treated as the signal that the auction has ended.
        sale = page_soup.select_one("div.sale-end-text")
        if sale is not None:
            thewriter.writerow([i, endmsg, currentDT, bidincrement, currentbid])
            print(endmsg, currentbid)
            break
        thewriter.writerow([i, propname, currentDT, bidincrement, currentbid])
        print(i, propname, currentDT, bidincrement, currentbid)
        time.sleep(1)

Update using chitown88's approach

import csv
import datetime
import time

import bs4
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# Selenium 4 takes the chromedriver path through a Service object.
service = Service('C:\\Users\\XXXX\\Downloads\\chromedriver_win32\\chromedriver.exe')
driver = webdriver.Chrome(service=service)
driver.get('https://marketplace.realinsight.com/sales/details/XXX')
page_soup = bs4.BeautifulSoup(driver.page_source, "html.parser")
propname = page_soup.title.text
with open(propname + '_bids.csv', 'w', newline='') as f:
    thewriter = csv.writer(f)
    thewriter.writerow(['i', 'prop_name', 'date_time', 'bid_amt'])
    for i in range(5):
        # Re-parse the rendered page on every pass; otherwise the same bid is logged each time.
        page_soup = bs4.BeautifulSoup(driver.page_source, "html.parser")
        bids = page_soup.select("td.s-b-n")
        currentbid = bids[0].text
        currentDT = datetime.datetime.now()
        thewriter.writerow([i, propname, currentDT, currentbid])
        print(i, propname, currentDT, currentbid)
        time.sleep(1)
        driver.refresh()
driver.close()

I can see the number I'm looking for ($3250000) in the HTML code, but it flickers and updates every few seconds, which is why I think it is generated somewhere else.

Any guidance would be greatly appreciated.


3 answers

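I couldn't get BeautifulSoup to give me the data, but I managed it with Selenium. You need to install chromedriver and Selenium; Selenium can be installed from the console by typing: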

pip install selenium

Here is the script:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

pageLink = 'https://marketplace.realinsight.com/sales/details/367'

# Set up our Chrome preferences.
chromeOptions = webdriver.ChromeOptions()
# Change this variable to the path of the chromedriver you downloaded.
# A raw string keeps the backslashes in the Windows path literal.
chromedriver = r"D:\Downloads\chromedriver_win32\chromedriver.exe"

# Selenium 4 takes the driver path via Service and the preferences via options.
driver = webdriver.Chrome(service=Service(chromedriver), options=chromeOptions)

driver.get(pageLink)

extractData = driver.find_element(By.XPATH, "/html/body/div[3]/section[2]/div/div[1]/div[2]/div[1]/div[2]/div/div[1]/div/table/tbody/tr[2]/td[2]")

print(extractData.text)

You need to load the page before parsing it. Selenium is a perfect choice for that.

import bs4 
from selenium import webdriver 

driver = webdriver.Chrome()
driver.get('https://marketplace.realinsight.com/sales/details/367')

html = driver.page_source
page_soup = bs4.BeautifulSoup(html,"html.parser")

bids = page_soup.select("td.s-b-n")
bid = bids[0].text
print(bid)

driver.close()

And the output:

In [91]: print(bid)
$3,250,000
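The scraped text is a display string such as `$3,250,000`, which cannot be compared or summed directly. If the bid is going to be logged or checked against earlier bids, it may help to convert it to a number first. Here is a minimal sketch using only the standard library; `parse_bid` is an illustrative helper name, and it assumes the amount uses a dollar sign, comma grouping, and an optional decimal point.

```python
import re


def parse_bid(text):
    """Convert a displayed amount such as '$3,250,000' to an int number of dollars."""
    # Keep only digits and the decimal point; drops '$', ',' and whitespace.
    cleaned = re.sub(r"[^0-9.]", "", text)
    if not cleaned:
        raise ValueError(f"no digits found in {text!r}")
    # Going through float tolerates inputs like '1,500.00'; cents are truncated.
    return int(float(cleaned))


print(parse_bid("$3,250,000"))
```

In the script above this could be applied as `parse_bid(bids[0].text)`.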

You can use BeautifulSoup after all: div.body-content has a data-sb attribute that stores the bid value.

page_soup = soup(page_html, "html.parser")
bids = page_soup.select_one(".body-content")

print(bids['data-sb'])
# format the number
print('${:,d}'.format(int(float(bids['data-sb']))))
print(bids.attrs)
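Indexing with `bids['data-sb']` raises a `KeyError` if the attribute is missing, and the `int(float(...))` conversion fails on an empty or non-numeric value. A slightly more defensive variant of the formatting step might look like the sketch below. `format_bid` is an illustrative helper name, and the example assumes the raw value is read with `bids.get('data-sb')`, which returns `None` when the attribute is absent.

```python
def format_bid(raw):
    """Format a raw attribute value such as '3250000.0' as '$3,250,000'.

    Returns None when the value is missing or not a finite number.
    """
    if raw is None:
        return None
    try:
        return "${:,d}".format(int(float(raw)))
    except (ValueError, OverflowError):
        # ValueError: empty or non-numeric text, or NaN; OverflowError: infinity.
        return None


print(format_bid("3250000.0"))
print(format_bid(None))
```

Returning `None` instead of raising lets a polling loop skip a bad reading and carry on with the next one.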
