Scraping different values (cookies?) on the same u

import os
import re

import requests
from bs4 import BeautifulSoup


def writeHotels(soup, location):
    # Create the Hotels directory if it does not exist yet
    hotelDir = 'Hotels/'
    if not os.path.exists(hotelDir):
        os.makedirs(hotelDir)
    hotels = soup.find_all("a", {"class": "Y"})
    name = location + '.txt'
    # Report whether the file is being appended to or created
    if os.path.exists(hotelDir + name):
        print('opening file ' + name + "\n")
    else:
        print('creating file ' + name + "\n")
    # Append one hotel name per line
    with open(hotelDir + name, 'a') as file:
        for hotel in hotels:
            file.write(hotel.text + "\n")


# url is the listing page being scraped (its value is not shown in the question)
r = requests.get(url)
soup = BeautifulSoup(r.content, 'html.parser')
headers = soup.find_all("h1", {"class": "X"})
for header in headers:
    headerText = header.text
    match = re.search('(.+ Hotels)', headerText)
    if match:
        writeHotels(soup, match.group(0))

1 Answer

User

#1 · Posted on 2024-10-02 04:24:23

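If you look at the page numbers at the bottom of the page in the page source, you will see that each page has its own unique URL. If you print the soup, you will find that you can scrape these URLs. When there are many pages, the page does not list all of them; it shows a "..." in place of the middle pages. However, you can work out the missing URLs from the first and last values (I have not done that below). Here is the code I used: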

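import urllib.request

from bs4 import BeautifulSoup

url = "http://www.tripadvisor.com/Hotels-g60713-San_Francisco_California-Hotels.html"
page = urllib.request.urlopen(url)

# Name the parser explicitly so BeautifulSoup does not have to guess one
soup = BeautifulSoup(page.read(), "html.parser")
#print(soup)
# Each pagination link carries its page number and its target URL
for myValue3 in soup.find_all("a", attrs={"class": "pageNum"}):
    try:
        print("the value of page " + myValue3.get("data-page-number") + " is: " + myValue3.get("href").split("#ACCOM_OVERVIEW")[0])
    except (TypeError, AttributeError):
        # Raised when a link lacks the data-page-number or href attribute
        print("error")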

This prints one line per pagination link, of the form "the value of page <number> is: <URL>".

Note the -oa###- in the URL. You can change this value to get all of the subsequent pages.
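As a rough sketch of that idea, the snippet below rewrites the -oa###- segment of one scraped page URL to produce the URLs for other offsets. The helper names with_offset and page_urls, the sample URL, and the step of 30 between pages are all assumptions made for illustration; take the real step and last offset from the page links you actually scrape.

```python
import re

# Matches the -oa<number>- segment that carries the page offset
OFFSET_PATTERN = re.compile(r"-oa(\d+)-")


def with_offset(page_url, offset):
    """Return page_url with its -oa<number>- segment set to offset."""
    if not OFFSET_PATTERN.search(page_url):
        raise ValueError("no -oa<number>- segment found in URL")
    return OFFSET_PATTERN.sub("-oa{}-".format(offset), page_url, count=1)


def page_urls(sample_url, last_offset, step):
    """List URLs for offsets step, 2*step, ... up to and including last_offset."""
    return [with_offset(sample_url, off)
            for off in range(step, last_offset + 1, step)]


# Hypothetical scraped URL and offsets, used only to show the call
sample = "/Hotels-g60713-oa30-San_Francisco_California-Hotels.html"
for u in page_urls(sample, last_offset=90, step=30):
    print(u)
```

Each generated URL can then be fetched and parsed in the same way as the first page. The first page itself has no -oa###- segment, so it is requested with the original URL.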
