获取最后一个页码网页抓取

<div id="paging-wrapper-btm" class="paging-wrapper"> <ol class="page-nos"><li ><span class="selected">1</span></li><li ><a href='http://www.asos.de/Herren-Jeans/podlh/?cid=4208&pge=1&pgesize=36&sort=-1'>2</a></li><li ><a href='http://www.asos.de/Herren-Jeans/podlh/?cid=4208&pge=2&pgesize=36&sort=-1'>3</a></li><li ><a href='http://www.asos.de/Herren-Jeans/podlh/?cid=4208&pge=3&pgesize=36&sort=-1'>4</a></li><li ><a href='http://www.asos.de/Herren-Jeans/podlh/?cid=4208&pge=4&pgesize=36&sort=-1'>5</a></li><li #LIVALUES#>...</li><li ><a href='http://www.asos.de/Herren-Jeans/podlh/?cid=4208&pge=28&pgesize=36&sort=-1'>29</a></li><li class="page-skip"><a href='http://www.asos.de/Herren-Jeans/podlh/?cid=4208&pge=1&pgesize=36&sort=-1'>Weiter »</a></li></ol>

3条回答

网友

1楼 · 编辑于 2024-10-05 12:19:57

试试这个：

ols = soup.find_all('ol')
list_of_as = [ol.find_all('a') for ol in ols] # Finds all a's inside each ol in the ols list
all_as = []
for a in list_of_as: # This is to expand each sublist of a's and put all of them in one list
 all_as.extend(a)
print all_as

网友

2楼 · 编辑于 2024-10-05 12:19:57

啊。。我找到了一个简单的解决办法。在

for item in soup.select("ol a"):
    x = item.text
    print x

然后我可以排序并选择最大的数字。在

网友

3楼 · 编辑于 2024-10-05 12:19:57

以下内容将提取最后一页的编号：

from bs4 import BeautifulSoup 
import requests


html = requests.get("http://www.asos.de/Herren-Jeans/podlh/?cid=4208&via=top&r=2#parentID=-1&pge=1&pgeSize=36&sort=-1")
soup = BeautifulSoup(html.text)

ol = soup.find('ol', class_='page-nos')
pages = [li.text for li in ol.find_all('li')]
last_page = pages[-2]

print last_page

您的网站将显示：

^{pr2}$

相关问题更多 >

编程相关推荐

热门问题

热门文章