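<p>I am using requests and BeautifulSoup to scrape data from a real estate website. The site has several numbered "pages", each showing a few dozen apartments. I wrote a loop that runs over all of these pages and collects data from the listings, but unfortunately the site uses JavaScript, so the code only returns the listings from the first page. I also tried Selenium, but ran into the same problem.</p>
<p>Any suggestions would be much appreciated.</p>
<p>The code is below:</p>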
<pre><code>import time
from random import randint

import requests
from bs4 import BeautifulSoup

# Create empty lists to append data scraped from URL
# Number of lists depends on the number of features you want to extract
lista_preco = []
lista_endereco = []
lista_tamanho = []
lista_quartos = []
lista_banheiros = []
lista_vagas = []
lista_condominio = []
lista_amenidades = []
lista_fotos = []
lista_sites = []
n_pages = 0

for page in range(1, 15):
    n_pages += 1
    url = "https://www.vivareal.com.br/venda/bahia/salvador/apartamento_residencial/" + '?pagina=' + str(page)
    response = requests.get(url)
    soup = BeautifulSoup(response.content, 'html.parser')
    house_containers = soup.find_all('div', {'class': 'js-card-selector'})
    if house_containers != []:
        for container in house_containers:
            # Price: the text before the 'C' of "Condomínio"
            price = container.find_all('section', class_='property-card__values')[0].text
            try:
                price = int(price[:price.find('C')].replace('R$', '').replace('.', '').strip())
            except ValueError:
                price = 0
            lista_preco.append(price)

            # Zone
            location = container.find_all('span', class_='property-card__address')[0].text
            location = location.strip()
            lista_endereco.append(location)

            # Size: for a range such as "50-70", keep the lower bound
            size = container.find_all('span', class_='property-card__detail-value js-property-card-value property-card__detail-area js-property-card-detail-area')[0].text
            if '-' not in size:
                size = int(size[:size.find('m')].replace(',', '').strip())
            else:
                size = int(size[:size.find('-')].replace(',', '').strip())
            lista_tamanho.append(size)

            # Rooms
            quartos = container.find_all('li', class_='property-card__detail-item property-card__detail-room js-property-detail-rooms')[0].text
            quartos = quartos[:quartos.find('Q')].strip()
            if '-' in quartos:
                quartos = quartos[:quartos.find('-')].strip()
            lista_quartos.append(int(quartos))

            # Bathrooms
            banheiros = container.find_all('li', class_='property-card__detail-item property-card__detail-bathroom js-property-detail-bathroom')[0].text
            banheiros = banheiros[:banheiros.find('B')].strip()
            if '-' in banheiros:
                banheiros = banheiros[:banheiros.find('-')].strip()
            lista_banheiros.append(int(banheiros))

            # Garage: '--' means no parking spaces are listed
            vagas = container.find_all('li', class_='property-card__detail-item property-card__detail-garage js-property-detail-garages')[0].text
            vagas = vagas[:vagas.find('V')].strip()
            if '--' in vagas:
                vagas = '0'
            lista_vagas.append(int(vagas))

            # Condomínio (condo fee): the text after the last 'R$'
            condominio = container.find_all('section', class_='property-card__values')[0].text
            try:
                condominio = int(condominio[condominio.rfind('R$'):].replace('R$', '').replace('.', '').strip())
            except ValueError:
                condominio = 0
            lista_condominio.append(condominio)

            # Amenidades (amenities)
            try:
                amenidades = container.find_all('ul', class_='property-card__amenities')[0].text
                amenidades = amenidades.split()
            except IndexError:
                amenidades = 'Zero'
            lista_amenidades.append(amenidades)

            # url
            link = 'https://www.vivareal.com.br/' + container.find_all('a')[0].get('href')[1:-1]
            lista_sites.append(link)

            # image
            #p = str(container.find_all('img')[0])
            #p
            #2x size thumbnail
            #imgurl = p[p.find('https'):p.rfind('data-src')]
            #imgurl.replace('"', '').strip()
            #lista_fotos.append(imgurl)
    else:
        # No listing cards on this page, so stop paging
        break
    # Pause 1 to 2 seconds between page requests
    time.sleep(randint(1, 2))

print('You scraped {} pages containing {} properties.'.format(n_pages, len(lista_preco)))
</code></pre>
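<p>To confirm the symptom described above (every request returning the listings from page 1), the links collected per page can be compared. This is only a minimal sketch: the helper name <code>pages_repeating_first</code> and the sample links are made up for illustration, and it assumes the links are gathered into one list per page rather than into a single flat list.</p>

```python
def pages_repeating_first(links_per_page):
    """Return the 1-based numbers of pages whose links match page 1 exactly.

    `links_per_page` is a list with one list of listing links per page
    (hypothetical structure, not part of the original scraper).
    """
    if not links_per_page:
        return []
    first = set(links_per_page[0])
    return [
        number
        for number, links in enumerate(links_per_page[1:], start=2)
        if set(links) == first
    ]


# Made-up links for illustration only; real values would come from the scraper.
sample = [
    ["/imovel/a", "/imovel/b"],  # page 1
    ["/imovel/a", "/imovel/b"],  # page 2 repeats page 1
    ["/imovel/c", "/imovel/d"],  # page 3 is different
]
print(pages_repeating_first(sample))
```

<p>If every page after the first shows up in the result, the server is ignoring the <code>pagina</code> parameter for plain HTTP requests, which would match the behaviour described in the question.</p>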