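How do I get the 'Things to Do' list? I'm new to web scraping, and I don't know how to loop through every page to get all the 'Things to Do'. Can you tell me where I went wrong? Any help would be highly appreciated. Thanks in advance.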
import requests
import re
from bs4 import BeautifulSoup
from urllib.request import urlopen

offset = 0
url = 'https://www.tripadvisor.com/Attractions-g255057-Activities-oa' + str(offset) + '-Canberra_Australian_Capital_Territory-Hotels.html#ATTRACTION_LIST_CONTENTS'
urls = []
r = requests.get(url)
soup = BeautifulSoup(r.text, "html.parser")

for link in soup.find_all('a', {'last'}):
    page_number = link.get('data-page-number')
    last_offset = int(page_number) * 30
    print('last offset:', last_offset)

for offset in range(0, last_offset, 30):
    print('--- page offset:', offset, '---')
    url = 'https://www.tripadvisor.com/Attractions-g255057-oa' + str(offset) + '-Canberra_Australian_Capital_Territory-Hotels.html#ATTRACTION_LIST_CONTENTS'
    r = requests.get(url)
    soup = BeautifulSoup(r.text, "html.parser")
    for link in soup.find_all('a', {'property_title'}):
        iurl='https://www.tripadvisor.com/Attraction_Review-g255057' + link.get('href')
        print(iurl)
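Basically, I want the href of each 'Thing to Do'. The output I expect for 'Things to Do' is a list of links, just like in the example below, where I use this code to get the href of every restaurant in Canberra. My restaurant code is: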
import requests
import re
from bs4 import BeautifulSoup
from urllib.request import urlopen

with requests.Session() as session:
    for offset in range(0, 1050, 30):
        url = 'https://www.tripadvisor.com/Restaurants-g255057-oa{0}-Canberra_Australian_Capital_Territory.html#EATERY_LIST_CONTENTS'.format(offset)
        soup = BeautifulSoup(session.get(url).content, "html.parser")
        for link in soup.select('a.property_title'):
            iurl = 'https://www.tripadvisor.com/' + link.get('href')
            print(iurl)
The output of the restaurant code is:
https://www.tripadvisor.com/Restaurant_Review-g255057-d1054676-Reviews-Lanterne_Rooms-Canberra_Australian_Capital_Territory.html
https://www.tripadvisor.com/Restaurant_Review-g255057-d755055-Reviews-Courgette_Restaurant-Canberra_Australian_Capital_Territory.html
https://www.tripadvisor.com/Restaurant_Review-g255057-d6893178-Reviews-Pomegranate-Canberra_Australian_Capital_Territory.html
https://www.tripadvisor.com/Restaurant_Review-g255057-d7262443-Reviews-Les_Bistronomes-Canberra_Australian_Capital_Territory.html
.
.
.
.
Well, it's not that hard; you just need to know which tags to use.
Let me explain with this example:
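As an illustration of picking the right tags, here is a minimal sketch, not the answer's original code. It assumes, as the question's code does, that attraction links are `a` elements with class `property_title` and that the final pagination link has class `last` and a `data-page-number` attribute. To stay runnable without third-party packages it swaps BeautifulSoup for the standard-library `html.parser`; `ListingParser`, `parse_listing`, and the sample HTML are hypothetical names and data made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE = "https://www.tripadvisor.com/"


class ListingParser(HTMLParser):
    """Collect attraction links and the last page number from one listing page."""

    def __init__(self):
        super().__init__()
        self.links = []        # absolute URLs taken from <a class="property_title">
        self.last_page = None  # data-page-number of the <a class="... last"> link

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if "property_title" in classes and attrs.get("href"):
            # Assumption: href is site-relative, so join it with the site root only.
            self.links.append(urljoin(BASE, attrs["href"]))
        if "last" in classes and attrs.get("data-page-number"):
            self.last_page = int(attrs["data-page-number"])


def parse_listing(html):
    """Return (links, last_page) for the HTML of one listing page."""
    parser = ListingParser()
    parser.feed(html)
    return parser.links, parser.last_page


# Hypothetical fragment, invented only to exercise the parser.
sample = """
<a class="property_title" href="/Attraction_Review-g255057-d0000000-Reviews-Example-Canberra.html">Example</a>
<a class="pageNum last" data-page-number="8" href="#">8</a>
"""

links, last_page = parse_listing(sample)
print(links)
print(last_page)
```

With BeautifulSoup the same selection would be `soup.select('a.property_title')`, as in the restaurant code. Joining each href with the site root alone avoids repeating the `Attraction_Review-g255057` prefix, which the href presumably already contains, judging by the restaurant output.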
In total there are 8 pages and 212 links (30 per page, with 2 on the last page).
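The page count follows from simple arithmetic: 212 links at 30 per page need 8 pages, at offsets 0 through 210, with 2 links left for the last one. A small sketch of that calculation, where `page_offsets` is a hypothetical helper and the page size of 30 is taken from the figures above:

```python
import math

PAGE_SIZE = 30  # links per listing page, per the figures above


def page_offsets(total_links, page_size=PAGE_SIZE):
    """Return the offset of every listing page needed to cover total_links."""
    pages = math.ceil(total_links / page_size)
    return [page * page_size for page in range(pages)]


offsets = page_offsets(212)
links_on_last_page = 212 - offsets[-1]

print(len(offsets))        # 8
print(offsets)             # [0, 30, 60, 90, 120, 150, 180, 210]
print(links_on_last_page)  # 2
```

These offsets are the values substituted after `oa` in the listing URL. The question's `range(0, last_offset, 30)` with `last_offset = 8 * 30` yields the same eight offsets.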
I hope this clears things up a bit.