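<p>OK, basically you are not sending all of the required <code>POST</code> parameters to the <code>HOST</code>. As you can see in the <code>Print-Screen</code>, there are several parameters with values.</p>
<p>So we first make a <code>GET</code> request, parse the <code>HTML</code> to pick up all the required values, and then make the <code>POST</code> request:</p>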
<pre class="lang-py prettyprint-override"><code>import requests
from bs4 import BeautifulSoup

# Form fields copied from the browser's POST request.
data = {
    '__EVENTTARGET': '',
    '__EVENTARGUMENT': '',
    '__VIEWSTATEENCRYPTED': '',
    'site-search-head': '',
    'ph_pagebody_0$phheader_0$_FlyoutLogin$PersonalEmail$EmailAddress': '',
    'ph_pagebody_0$phheader_0$_FlyoutLogin$PersonalPassword$SingleLine_CtrlHolderDivShown': '',
    'ph_pagebody_0$phheader_0$_FlyoutLogin$OrganisationEmail$EmailAddress': '',
    'ph_pagebody_0$phheader_0$_FlyoutLogin$OrganisationPassword$SingleLine_CtrlHolderDivShown': '',
    'ph_pagebody_0$phheader_0$_FlyoutLogin$PartnerEmail$EmailAddress': '',
    'ph_pagebody_0$phheader_0$_FlyoutLogin$PartnerPassword$SingleLine_CtrlHolderDivShown': '',
    'ph_pagebody_0$phthreecolumnmaincontent_1$panel$VehicleSearch$vehicle-type': 'car/truck',
    'ph_pagebody_0$phthreecolumnmaincontent_1$panel$VehicleSearch$vehicle-identifier-type': 'registration+number',
    'ph_pagebody_0$phthreecolumnmaincontent_1$panel$VehicleSearch$RegistrationNumberCar$RegistrationNumber_CtrlHolderDivShown': 'abc123',
    'honeypot': '',
    'ph_pagebody_0$phthreecolumnmaincontent_1$panel$btnSearch': 'Search'
}


def Main(url):
    with requests.Session() as req:
        # GET the page first to read the ASP.NET hidden state fields.
        r = req.get(url)
        soup = BeautifulSoup(r.content, 'html.parser')
        data['__VIEWSTATE'] = soup.find("input", id="__VIEWSTATE").get("value")
        data['__VIEWSTATEGENERATOR'] = soup.find(
            "input", id="__VIEWSTATEGENERATOR").get("value")
        # POST the completed form within the same session.
        r = req.post(url, data=data)
        soup = BeautifulSoup(r.content, 'html.parser')
        print(soup.find_all("div", class_="display"))


Main("https://www.vicroads.vic.gov.au/registration/buy-sell-or-transfer-a-vehicle/check-vehicle-registration/vehicle-registration-enquiry")
</code></pre>
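<p>The code above looks up the two state fields by <code>id</code>. As a minimal sketch of a more general approach, the helper below collects every hidden <code>input</code> on a page using only the standard library. The names <code>HiddenInputCollector</code> and <code>hidden_fields</code>, and the sample markup, are made up for illustration and are not taken from the target site.</p>

```python
from html.parser import HTMLParser


class HiddenInputCollector(HTMLParser):
    """Collect name/value pairs of every <input type="hidden"> in a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            # A missing value attribute is stored as an empty string.
            self.fields[attr["name"]] = attr.get("value") or ""


def hidden_fields(html):
    """Return a dict of all hidden form fields found in the given HTML."""
    parser = HiddenInputCollector()
    parser.feed(html)
    return parser.fields


# Hypothetical markup, only to show the shape of the result.
sample = """
<form>
  <input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="abc">
  <input type="hidden" name="__VIEWSTATEGENERATOR" value="123">
  <input type="text" name="q" value="ignored">
</form>
"""
print(hidden_fields(sample))
```

<p>With something like this, <code>data.update(hidden_fields(r.text))</code> before the <code>POST</code> would pick up any additional hidden fields the page happens to contain, assuming the form is the only one on the page.</p>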
<p>Now, if you check the output, you will see that it is empty. That is due to two things:</p>
<ol>
<li>The <code>HTML</code> source contains a value named <code>monsido</code>, which is used together with <code>JS</code> to generate a <code>one-time</code> token that authenticates requests during the session.</li>
</ol>
<pre class="lang-html prettyprint-override"><code>&lt;script type="text/javascript"&gt;
    var _monsido = _monsido || [];
    _monsido.push(['_setDomainToken', 'dfWhFzGbaTj5hyKQYZxi0g']);
    _monsido.push(['_withStatistics', 'true']);
&lt;/script&gt;
&lt;script src="//cdn.monsido.com/tool/javascripts/monsido.js"&gt;&lt;/script&gt;
&lt;script&gt;
</code></pre>
<ol start="2">
<li>The <code>HOST</code> is protected by <code>CloudFlare</code>, which also requires the <code>__cfduid</code> parameter in the <code>Cookie</code>.</li>
</ol>
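<p>For the first point, the domain token can be read out of the page source with a regular expression. This is only a sketch: the helper name <code>find_domain_token</code> is made up, the pattern assumes the <code>_setDomainToken</code> call looks like the snippet shown earlier, and it says nothing about what <code>monsido.js</code> does with the token afterwards.</p>

```python
import re

# Matches _monsido.push(['_setDomainToken', '<token>']) with either quote style.
TOKEN_RE = re.compile(r"""_setDomainToken['"]\s*,\s*['"]([^'"]+)['"]""")


def find_domain_token(html):
    """Return the Monsido domain token found in html, or None if absent."""
    match = TOKEN_RE.search(html)
    return match.group(1) if match else None


# The line from the page source quoted in this answer.
snippet = "_monsido.push(['_setDomainToken', 'dfWhFzGbaTj5hyKQYZxi0g']);"
print(find_domain_token(snippet))
```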
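<p>To cut a long story short, if you call monsido under <code>requests.Session()</code> with the current <code>cookies/headers</code>, you will get the required token. You would then still need to obtain <code>__cfduid</code>, which I can't help you with, because it is illegal to bypass a well-known firewall such as CloudFlare, which was built precisely to prevent this kind of scraping.</p>
<p>Now, back to <code>selenium</code>, where you can get the required output:</p>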
<pre class="lang-py prettyprint-override"><code>from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.options import Options
import pandas as pd

options = Options()
options.add_argument('--headless')
driver = webdriver.Firefox(options=options)
driver.get("https://www.vicroads.vic.gov.au/registration/buy-sell-or-transfer-a-vehicle/check-vehicle-registration/vehicle-registration-enquiry")

# Type the registration number into the search field.
driver.find_element(
    By.CSS_SELECTOR,
    "input#ph_pagebody_0_phthreecolumnmaincontent_1_panel_VehicleSearch_RegistrationNumberCar_RegistrationNumber_CtrlHolderDivShown").send_keys("abc123")
# Submit the search form.
driver.find_element(
    By.CSS_SELECTOR,
    "input#ph_pagebody_0_phthreecolumnmaincontent_1_panel_btnSearch").click()

# Field labels become the column names; the first ten values form the row.
names = [
    item.text for item in driver.find_elements(By.CSS_SELECTOR, "label.label")]
data = [item.text for item in driver.find_elements(
    By.CSS_SELECTOR, "div.display")[:10]]

df = pd.DataFrame([data], columns=names)
df.to_csv("data.csv", index=False)
driver.quit()
</code></pre>
<p>Output: <a href="http://www.sharecsv.com/s/c72048bc2dd9ce8b9f323325bec4b193/data.csv" rel="nofollow noreferrer">view-online</a></p>
<p><a href="https://i.stack.imgur.com/4tQZc.png" rel="nofollow noreferrer"><img src="https://i.stack.imgur.com/4tQZc.png" alt="enter image description here"/></a></p>
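<p>One caveat about the <code>selenium</code> version: <code>pd.DataFrame([data], columns=names)</code> raises an error if the number of labels differs from the number of values. A minimal sketch of a more forgiving alternative, using only the standard library, pads the shorter list instead. The function name <code>to_csv_text</code> and the sample labels and values are hypothetical and stand in for the scraped text.</p>

```python
import csv
import io
from itertools import zip_longest


def to_csv_text(names, values):
    """Write one header row and one data row, padding the shorter list."""
    pairs = list(zip_longest(names, values, fillvalue=""))
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([name for name, _ in pairs])
    writer.writerow([value for _, value in pairs])
    return buffer.getvalue()


# Hypothetical labels and values standing in for the scraped text.
names = ["Registration number", "Status", "Make"]
values = ["ABC123", "Current"]
print(to_csv_text(names, values))
```

<p>Whether padding is the right choice depends on the page; a length mismatch may equally be a sign that the selectors no longer match, in which case failing loudly is preferable.</p>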