Python代码点击下一页链接,并刮取所有页面超链接

2024-06-25 05:34:03 发布

您现在位置:Python中文网/ 问答频道 /正文

我对python还很陌生,在从一个页面移动到另一个页面时,我很惊讶,我能够抓取一个页面的详细信息。 下面是我使用的代码

def getURLinfo(url):
            url = "https://apps1.coned.com/cemyaccount/MemberPages/MyAccounts.aspx?lang=eng"
            driver.get(url)
            html = driver.page_source
            nextpage = "ctl00$Main$DataPager1$ctl01$ctl01"
            soup = BeautifulSoup(html)

            while soup.find(id=re.compile(nextpage)):
                    for table in soup.findAll('table', {'id':'ctl00_Main_lvMyAccount_itemPlaceholderContainer'} ):
                            for link in table.findAll('a'):
                                    link.findAll('a')
                                    print link['href']
                    driver.find_element_by_link_text(nextpage).click()
                    html = html + driver.page_source
                    soup = BeautifulSoup(driver.page_source)

                    soup = BeautifulSoup(html)

    driver.close()

我不确定我是否也在正确的轨道上。在

下面是html代码 视图 211538138800143 43-38 39普拉斯35 胡安·门多萨 活跃的 删除 在

^{pr2}$

    </td>
</tr> 

<tr align="center"><td>
    <span id="ctl00_Main_DataPager1"><a disabled="disabled"><< </a>&nbsp;<span>1</span>&nbsp;<a href="javascript:__doPostBack('ctl00$Main$DataPager1$ctl01$ctl01','')">2</a>&nbsp;<a href="javascript:__doPostBack('ctl00$Main$DataPager1$ctl01$ctl02','')">3</a>&nbsp;<a href="javascript:__doPostBack('ctl00$Main$DataPager1$ctl01$ctl03','')">4</a>&nbsp;<a href="javascript:__doPostBack('ctl00$Main$DataPager1$ctl01$ctl04','')">5</a>&nbsp;&nbsp;<a href="javascript:__doPostBack('ctl00$Main$DataPager1$ctl01$ctl05','')">...</a>&nbsp;<a href="javascript:__doPostBack('ctl00$Main$DataPager1$ctl02$ctl00','')"> >></a>&nbsp;</span> 

</td></tr>

</table>


Tags: mainhtmldrivertablelink页面javascripthref