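<p>As you have already observed, there are <code>3 divisions</code>: the <code>Top Window</code> and <code>2 frames</code>. So we can first get the <code>page source</code> of the <code>Top Window</code>, then switch into each of the <code>2 frames</code> to scrape its <code>page source</code>, as follows:</p>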
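<pre><code>from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# chromedriver.exe drives Chrome, so use webdriver.Chrome (not webdriver.Ie)
driver = webdriver.Chrome(service=Service(r'C:\Utility\BrowserDrivers\chromedriver.exe'))
driver.get(r'https://www-nass.nhtsa.dot.gov/nass/cds/CaseForm.aspx?xsl=main.xsl&CaseID=773013618')
# Page source of the top-level window
content = driver.page_source
print("Content on Top Window is :")
print(content)
# Collect every iframe declared in the top-level document
multiple_frames = driver.find_elements(By.XPATH, '//iframe')
print("There are " + str(len(multiple_frames)) + " frames")
for frame in multiple_frames:
    print("Content on " + frame.get_attribute("name") + " frame is : ")
    # Switch into the frame, read its source, then return to the top window
    driver.switch_to.frame(frame)
    sub_content = driver.page_source
    print(sub_content)
    driver.switch_to.default_content()
driver.quit()
</code></pre>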
<p>The console output is:</p>
<pre><code>Content on Top Window is :
<html xmlns:saxon="http://saxon.sf.net/" xmlns="http://www.w3.org/1999/xhtml" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:dot="http://www.volpe.dot.gov" xml:lang="en"><head>
<title>NASS Case Viewer - CaseID:773013618</title>
<link id="StyleOut" type="text/css" rel="stylesheet" title="output" href="StyleOut.css" /><script src="main.js"></script></head>
<body onload="javascript:init('True','/NASS/CDS/XSLT/','773013618','case.xsl','CaseForm','Crash')">
...
...
...
</body></html>
There are 2 frames
Content on menu frame is :
<html xmlns:svg="http://www.w3.org/2000/svg" xmlns:msxsl="urn:schemas-microsoft-com:xslt" xmlns:dot="http://www.volpe.dot.gov" xmlns="http://www.w3.org/1999/xhtml" xml:lang="en"><head>
<meta http-equiv="Content-Script-Type" />
<title>menu</title>
...
...
...
</script></head></html>
Content on viewer frame is :
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:svg="http://www.w3.org/2000/svg-20000303-stylable" xmlns:fn="http://www.w3.org/2005/02/xpath-functions"><head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>Case</title>
<link id="StyleOut" type="text/css" rel="stylesheet" title="output" href="StyleOut.css" />
</head>
<body id="bodyMain">
...
...
...
</body></html>
</code></pre>
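<p>Because each frame is a separate document, the strings returned by <code>page_source</code> are easy to mix up. One way to label them is by their <code>title</code> element: in the output above, the top window, the <code>menu</code> frame and the <code>viewer</code> frame carry the titles <code>NASS Case Viewer - CaseID:773013618</code>, <code>menu</code> and <code>Case</code>. Here is a minimal sketch using only Python's standard-library <code>html.parser</code>; <code>TitleParser</code> and <code>page_title</code> are hypothetical helper names, not part of Selenium.</p>
<pre><code>from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Hypothetical helper (not Selenium): collects the text of the first title element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag names; only the first title element is captured
        if tag == "title" and self.title is None:
            self._in_title = True
            self.title = ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so append rather than assign
        if self._in_title:
            self.title += data


def page_title(source):
    """Return the title text of a page_source string, or None if it has no title."""
    parser = TitleParser()
    parser.feed(source)
    parser.close()
    return parser.title.strip() if parser.title is not None else None
</code></pre>
<p>Assuming the loop above, calling <code>page_title(sub_content)</code> inside it should return <code>menu</code> and then <code>Case</code> for the two frames of this page.</p>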