<p>I have my scraper working, and it pulls the correct data from all 9 pages of the site. However, I think the approach I'm currently using is not ideal: if the site ever has more pages than the range I hard-coded, the results on those pages are silently missed.</p>
<p>My code is as follows:</p>
<pre><code>import requests
import time
import csv
import sys
from bs4 import BeautifulSoup

houses = []

url = "https://www.propertypal.com/property-to-rent/newtownabbey/"
page = requests.get(url)
soup = BeautifulSoup(page.text, "lxml")
g_data = soup.findAll("div", {"class": "propbox-details"})
for item in g_data:
    try:
        title = item.find_all("span", {"class": "propbox-addr"})[0].text
    except:
        pass
    try:
        town = item.find_all("span", {"class": "propbox-town"})[0].text
    except:
        pass
    try:
        price = item.find_all("span", {"class": "price-value"})[0].text
    except:
        pass
    try:
        period = item.find_all("span", {"class": "price-period"})[0].text
    except:
        pass
    course = [title, town, price, period]
    houses.append(course)

for i in range(1, 15):
    time.sleep(2)  # delay between requests so we don't get kicked by the server
    url2 = "https://www.propertypal.com/property-to-rent/newtownabbey/page-{0}".format(i)
    page2 = requests.get(url2)
    print(url2)
    soup = BeautifulSoup(page2.text, "lxml")
    g_data = soup.findAll("div", {"class": "propbox-details"})
    for item in g_data:
        try:
            title = item.find_all("span", {"class": "propbox-addr"})[0].text
        except:
            pass
        try:
            town = item.find_all("span", {"class": "propbox-town"})[0].text
        except:
            pass
        try:
            price = item.find_all("span", {"class": "price-value"})[0].text
        except:
            pass
        try:
            period = item.find_all("span", {"class": "price-period"})[0].text
        except:
            pass
        course = [title, town, price, period]
        houses.append(course)

with open('newtownabbeyrentalproperties.csv', 'w') as file:
    writer = csv.writer(file)
    writer.writerow(['Address', 'Town', 'Price', 'Period'])
    for row in houses:
        writer.writerow(row)
</code></pre>
<p>You can see from the code that the line</p>
<pre><code>url2 = "https://www.propertypal.com/property-to-rent/newtownabbey/page-{0}".format(i)
</code></pre>
<p>adds the numbers 1 to 14 to the page part of the URL.</p>
<p>This is not ideal: if the site gains extra pages, say pages 15, 16 and 17, the scraper will miss the data on them, because it only ever looks as far as page 14.</p>
<p>Can anyone help me with this? How can I use the pagination on the page to find out how many pages there are to scrape, or is there a better way to set up this for loop?</p>
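<p>One possible direction, sketched here as an assumption rather than a known fact about the site: if requesting a page number past the last page returns a page with no <code>propbox-details</code> divs, you can loop open-endedly and stop at the first empty page. The helper names (<code>scrape_all_pages</code>, <code>parse_listings</code>) and the injectable <code>fetch</code> parameter are my own illustration, not part of requests or BeautifulSoup:</p>

```python
# Sketch: keep requesting page-1, page-2, ... until a page yields no
# listings. ASSUMES the site serves an empty (listing-free) page past
# the last real page -- verify this against the live site first.
import requests
from bs4 import BeautifulSoup


def parse_listings(html):
    """Return one [address, town, price, period] row per listing on a page."""
    soup = BeautifulSoup(html, "html.parser")  # "lxml" also works if installed
    rows = []
    for item in soup.find_all("div", {"class": "propbox-details"}):
        def text_of(cls):
            # Missing spans become "" instead of raising IndexError,
            # which also avoids the bare try/except blocks.
            tag = item.find("span", {"class": cls})
            return tag.text.strip() if tag is not None else ""
        rows.append([text_of("propbox-addr"), text_of("propbox-town"),
                     text_of("price-value"), text_of("price-period")])
    return rows


def scrape_all_pages(base_url, fetch=None, max_pages=100):
    """Scrape page-1, page-2, ... and stop at the first page with no results.

    fetch is injectable so the loop can be tested without the network;
    by default it performs a real HTTP GET.
    """
    if fetch is None:
        fetch = lambda u: requests.get(u).text
    rows = []
    for i in range(1, max_pages + 1):
        page_rows = parse_listings(fetch("{0}page-{1}".format(base_url, i)))
        if not page_rows:  # no listings -> we have run past the last page
            break
        rows.extend(page_rows)
    return rows
```

<p>The <code>max_pages</code> cap is just a safety net so a site change can't turn this into an infinite crawl. An alternative, if the paginator in the HTML exposes a "last page" link, would be to parse that number once and keep your existing <code>range()</code> loop.</p>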
<p>Many thanks.</p>