Using Python to scrape data from a website that has a text input but no form tag

Posted 2024-09-30 02:19:50


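I'm writing a Python program to get data from here. I've had success with this kind of task before, but this one is a challenge for me. I'm using BeautifulSoup and mechanize. I need to be able to enter a zip code into the text box to produce the results I'm after.

Here is the snippet that contains the input text box: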

<div id="ContentPlaceHolder1_C001_pnlFindACenter" onkeypress="javascript:return WebForm_FireDefaultButton(event, 'ContentPlaceHolder1_C001_btnSearchClient')"> <div style="width: 400px; float: left; padding-top: 5px;"> <label for="ContentPlaceHolder1_C001_tbUserAddress" style="font-family: Arial; font-size: 13.3333px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; line-height: normal; text-decoration: none; text-transform: none; color: rgb(0, 0, 0); cursor: auto; display: inline-block; position: relative; z-index: 100; margin-right: -121px; left: 2px; top: 0px; opacity: 1;">Address, City or Zip:</label><input name="ctl00$ContentPlaceHolder1$C001$tbUserAddress" type="text" id="ContentPlaceHolder1_C001_tbUserAddress" class="textInField" style="width: 240px; background-image: url(&quot;data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABAAAAASCAYAAABSO15qAAAAAXNSR0IArs4c6QAAAPhJREFUOBHlU70KgzAQPlMhEvoQTg6OPoOjT+JWOnRqkUKHgqWP4OQbOPokTk6OTkVULNSLVc62oJmbIdzd95NcuGjX2/3YVI/Ts+t0WLE2ut5xsQ0O+90F6UxFjAI8qNcEGONia08e6MNONYwCS7EQAizLmtGUDEzTBNd1fxsYhjEBnHPQNG3KKTYV34F8ec/zwHEciOMYyrIE3/ehKAqIoggo9inGXKmFXwbyBkmSQJqmUNe15IRhCG3byphitm1/eUzDM4qR0TTNjEixGdAnSi3keS5vSk2UDKqqgizLqB4YzvassiKhGtZ/jDMtLOnHz7TE+yf8BaDZXA509yeBAAAAAElFTkSuQmCC&quot;); background-repeat: no-repeat; background-attachment: scroll; background-size: 16px 18px; background-position: 98% 50%; cursor: auto;" data-hasqtip="21" oldtitle="Address, City or Zip:" title="" autocomplete="off" aria-describedby="qtip-21"> <div id="divDistance" style="display: inline;"> &nbsp;&nbsp;within&nbsp;&nbsp; <select name="ctl00$ContentPlaceHolder1$C001$ddlRadius" id="ContentPlaceHolder1_C001_ddlRadius"> <option value="5">5</option> <option value="10">10</option> <option selected="selected" value="25">25</option> <option value="50">50</option> <option value="100">100</option> </select> miles </div> </div> <div style="width: 160px; float: left;"> 
&nbsp;&nbsp;&nbsp; <input type="submit" name="ctl00$ContentPlaceHolder1$C001$btnSearchClient" value="Search" onclick="GeocodeLocation();return false;WebForm_DoPostBackWithOptions(new WebForm_PostBackOptions(&quot;ctl00$ContentPlaceHolder1$C001$btnSearchClient&quot;, &quot;&quot;, false, &quot;&quot;, &quot;find-a-center&quot;, false, false))" id="ContentPlaceHolder1_C001_btnSearchClient" class="btnCenter"> </div> <div style="clear: both;"> </div> <div> <span onchange="" style="font-size:12px;display: inline;" data-hasqtip="22" oldtitle="<b>AASM SleepTM</b> is an innovative telemedicine system that brings your sleep doctor to you. Featuring a secure, web-based video platform, AASM SleepTM allows you to meet with your sleep doctor from a distance. These live video visits will save you time and money. AASM SleepTM also syncs with Fitbit sleep data and has an integrated sleep diary, enabling you and your doctor to monitor your sleep." title="" aria-describedby="qtip-22"><input id="ContentPlaceHolder1_C001_chkSleepTM" type="checkbox" name="ctl00$ContentPlaceHolder1$C001$chkSleepTM"><label for="ContentPlaceHolder1_C001_chkSleepTM">Only show AASM SleepTM capable sleep centers in my state</label></span> <a href="https://sleeptm.com/" style="font-size: 10px; margin-left: 10px; display: inline;" target="_blank" data-hasqtip="23" oldtitle="<b>AASM SleepTM</b> is an innovative telemedicine system that brings your sleep doctor to you. Featuring a secure, web-based video platform, AASM SleepTM allows you to meet with your sleep doctor from a distance. These live video visits will save you time and money. AASM SleepTM also syncs with Fitbit sleep data and has an integrated sleep diary, enabling you and your doctor to monitor your sleep." title="" aria-describedby="qtip-23">What is AASM SleepTM?</a> </div> </div>

Here are my attempts so far:

Attempt 1

first = urllib2.Request(url,
                   data=urllib.urlencode({'value': CODE}),
                   headers={'User-Agent': 'Google Chrome',
                            'Cookie': 'name = ctl00$ContentPlaceHolder1$C001$tbUserAddress'})

Attempt 2

post_params = {
       'ctl00$ContentPlaceHolder1$C001$tbUserAddress': CODE
}
first = urllib.urlencode(post_params)

driver = webdriver.Chrome()
driver.get(url)
sbox = driver.find_element_by_class_name("ctl00$ContentPlaceHolder1$C001$tbUserAddress")
sbox.send_keys(CODE)
driver.find_element_by_class_name("ctl00$ContentPlaceHolder1$C001$btnSearchClient").click()

Attempt 3

br = mechanize.Browser()
br.open(url)
br.select_form(name='ctl00$ContentPlaceHolder1$C001$tbUserAddress')
br['value'] = CODE
br.submit()

http = urllib2.urlopen(br.response())
soup = BeautifulSoup(http, "html5lib")

Error = "no form matching name 'ctl00$ContentPlaceHolder1$C001$tbUserAddress'"


Attempt 4

soup.find('input', {'name': 'ctl00$ContentPlaceHolder1$C001$tbUserAddress'})['value'] = CODE
soup.find('input', {'name': 'ctl00$ContentPlaceHolder1$C001$btnSearchClient'}).click()

2 Answers

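If I understand your question correctly, you need to send a request with specific parameters and inspect the response. So let's look at where the request goes after the search is submitted. Let's open Postman. [Screenshot: POST request params]

As we can see, ctl00$ContentPlaceHolder1$C001$tbUserAddress gets the value 100, while ctl00$ContentPlaceHolder1$T6B6681F0008$ddlRadius, ctl00$ContentPlaceHolder1$C001$ddlRadius, and ctl00$cphTopBar$T917BC451013$rblRadius get the radius value 25.

So let's write a small snippet that sends a POST request with this data and gets the desired response.

I use Python requests to send the request and lxml to parse the HTML response. I prefer lxml: it is harder to understand than BeautifulSoup, but much faster.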

import requests
from lxml import html

input_data = {
    'ctl00$cphTopBar$T917BC451013$rblRadius': 25,
    'ctl00$ContentPlaceHolder1$T6B6681F0008$ddlRadius': 25,
    'ctl00$ContentPlaceHolder1$C001$ddlRadius': 25,
    'ctl00$ContentPlaceHolder1$C001$tbUserAddress': 100
}
resp = requests.post('http://www.sleepeducation.org/find-a-facility', data=input_data)
tree = html.fromstring(resp.text)
print(tree.xpath('//div[@id="ContentPlaceHolder1_C001_map_canvas"]')[0])
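The page calls WebForm_DoPostBackWithOptions, which suggests an ASP.NET WebForms site, and such pages usually carry hidden inputs like __VIEWSTATE and __EVENTVALIDATION that the server may expect to get back. If the bare POST above returns the page without results, one option is to copy those hidden fields into the payload. The following is a minimal sketch using only the standard library; the helper names HiddenInputCollector and build_payload and the sample page fragment are made up for illustration, and whether this site actually requires the hidden fields is an assumption.

```python
from html.parser import HTMLParser


class HiddenInputCollector(HTMLParser):
    """Collect the name/value pairs of every <input type="hidden"> in a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        is_hidden = (attributes.get("type") or "").lower() == "hidden"
        if is_hidden and attributes.get("name"):
            self.fields[attributes["name"]] = attributes.get("value") or ""


def build_payload(page_html, address, radius=25):
    """Merge the page's hidden fields with the search fields from the answer."""
    collector = HiddenInputCollector()
    collector.feed(page_html)
    payload = dict(collector.fields)
    payload["ctl00$ContentPlaceHolder1$C001$tbUserAddress"] = address
    payload["ctl00$ContentPlaceHolder1$C001$ddlRadius"] = radius
    return payload


# Offline demonstration on a made-up fragment (not the real page).
sample_page = """
<form method="post">
  <input type="hidden" name="__VIEWSTATE" value="abc123" />
  <input type="hidden" name="__EVENTVALIDATION" value="xyz789" />
  <input type="text" name="ctl00$ContentPlaceHolder1$C001$tbUserAddress" />
</form>
"""
payload = build_payload(sample_page, "10001")
print(payload)
```

With requests, this would probably mean fetching the page first inside a requests.Session(), passing the response text to build_payload, and then posting the result with data=payload as in the snippet above. Note also that the button's onclick runs GeocodeLocation() in the browser, so the results may depend on JavaScript that a plain POST does not execute.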

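I don't have enough reputation to post links to the documentation. I'll try to put them in the comments, or you can find them by googling "python requests" and "python lxml".

You can also do this with BeautifulSoup.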


This worked for me:

from time import sleep

from bs4 import BeautifulSoup
from selenium import webdriver

url = 'http://www.sleepeducation.org/find-a-facility'
subButton = 'ContentPlaceHolder1_C001_btnSearchClient'
addyName = 'ctl00$ContentPlaceHolder1$C001$tbUserAddress'
addyId = 'ContentPlaceHolder1_C001_tbUserAddress'

def usingChromeSelenium():
    # Raw string so the backslashes in the Windows path are not read as escapes
    driver = webdriver.Chrome(r'C:\Users\documents\chromedriver.exe')
    driver.get(url)
    sleep(1)
    driver.find_element_by_name(addyName).send_keys(CODE)
    driver.find_element_by_id(subButton).click()
    sleep(1)
    html = driver.page_source
    return html

results = usingChromeSelenium()
soup = BeautifulSoup(results, "html.parser")

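For webdriver.Chrome() you have to download the ChromeDriver executable (chromedriver.exe in the code above) and pass the path to that file inside the parentheses. It may also work for you without the path.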
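The fixed sleep(1) calls in the function above assume the page finishes loading within one second, which may not hold on a slow connection. A possible alternative is to poll driver.page_source until some expected text appears. The sketch below uses only the standard library; the helper wait_for_page_text, the FakeDriver stand-in, and the marker string "results-loaded" are all hypothetical, and the right marker for the real page would have to be found by inspecting it. Selenium also ships its own WebDriverWait class for this purpose.

```python
import time


def wait_for_page_text(driver, marker, timeout=10.0, poll_interval=0.25):
    """Poll driver.page_source until `marker` appears or `timeout` seconds pass.

    Returns the page source on success and raises TimeoutError otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        source = driver.page_source
        if marker in source:
            return source
        if time.monotonic() >= deadline:
            raise TimeoutError("%r not found within %s seconds" % (marker, timeout))
        time.sleep(poll_interval)


class FakeDriver:
    """Stand-in for a Selenium driver: the marker shows up on the third read."""

    def __init__(self):
        self.reads = 0

    @property
    def page_source(self):
        self.reads += 1
        if self.reads < 3:
            return "<html>loading</html>"
        return "<html><div>results-loaded</div></html>"


# Offline demonstration with the fake driver (no browser needed).
fake = FakeDriver()
page = wait_for_page_text(fake, "results-loaded", timeout=2.0, poll_interval=0.01)
print(fake.reads, len(page))
```

In usingChromeSelenium, the second sleep(1) could then be replaced with a call such as wait_for_page_text(driver, marker), where marker is a string that only appears once the search results are on the page.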
