How to store URLs from BeautifulSoup results into a list and then into a table

Posted 2024-09-30 01:33:29


I am scraping a real-estate web page, trying to get some URLs and then create a table from them. https://www.zonaprop.com.ar/locales-comerciales-alquiler-palermo-hollywood-0-ambientes-publicado-hace-menos-de-1-mes.html I have spent a lot of time trying to 1) store the results in a list or dict and then 2) create the table, but I am really stuck.

from bs4 import BeautifulSoup
import requests
import re
import pandas as pd

source = requests.get('https://www.zonaprop.com.ar/locales-comerciales-alquiler-palermo-hollywood-0-ambientes-publicado-hace-menos-de-1-mes.html').text
soup = BeautifulSoup(source, 'lxml')


#Extract URL 
link_text = ''
URL=[]
PlacesDf = pd.DataFrame(columns=['Address', 'Location.lat', 'Location.lon'])
for a in soup.find_all('a', attrs={'href': re.compile("/propiedades/")}):
  link_text = a['href']
  URL='https://www.zonaprop.com.ar'+link_text
  print(URL)

OK, the output looks fine to me:

https://www.zonaprop.com.ar/propiedades/local-en-alquiler-soler-6000-palermo-hollywood-a-44227001.html#map
https://www.zonaprop.com.ar/propiedades/local-en-alquiler-soler-6000-palermo-hollywood-a-44227001.html
https://www.zonaprop.com.ar/propiedades/local-en-alquiler-soler-6000-palermo-hollywood-a-44227001.html
https://www.zonaprop.com.ar/propiedades/excelente-esquina-en-alquiler-s-lote-propio-con-43776599.html
https://www.zonaprop.com.ar/propiedades/excelente-esquina-en-alquiler-s-lote-propio-con-43776599.html
https://www.zonaprop.com.ar/propiedades/excelente-esquina-en-alquiler-s-lote-propio-con-43776599.html
https://www.zonaprop.com.ar/propiedades/excelente-local-en-alquiler-palermo-hollywood-fitz-44505027.html#map
https://www.zonaprop.com.ar/propiedades/excelente-local-en-alquiler-palermo-hollywood-fitz-44505027.html
https://www.zonaprop.com.ar/propiedades/excelente-local-en-alquiler-palermo-hollywood-fitz-44505027.html
https://www.zonaprop.com.ar/propiedades/local-palermo-hollywood-44550855.html#map
https://www.zonaprop.com.ar/propiedades/local-palermo-hollywood-44550855.html
https://www.zonaprop.com.ar/propiedades/local-palermo-hollywood-44550855.html
https://www.zonaprop.com.ar/propiedades/local-comercial-o-edificio-corporativo-oficinas-500-43164952.html
https://www.zonaprop.com.ar/propiedades/local-comercial-o-edificio-corporativo-oficinas-500-43164952.html
https://www.zonaprop.com.ar/propiedades/local-comercial-o-edificio-corporativo-oficinas-500-43164952.html
https://www.zonaprop.com.ar/propiedades/local-palermo-viejo-44622843.html#map
https://www.zonaprop.com.ar/propiedades/local-palermo-viejo-44622843.html
https://www.zonaprop.com.ar/propiedades/local-palermo-viejo-44622843.html
https://www.zonaprop.com.ar/propiedades/alquiler-de-local-comercial-en-palermo-hollywood-44571635.html#map
https://www.zonaprop.com.ar/propiedades/alquiler-de-local-comercial-en-palermo-hollywood-44571635.html
https://www.zonaprop.com.ar/propiedades/alquiler-de-local-comercial-en-palermo-hollywood-44571635.html

The thing is, the output consists of real links (you can click them and they take you to the page).

But when I try to store them in a new variable (a list or dict with the column name 'Address', matching the 'Address' column of 'PlacesDf'), convert them to a table, or any other trick, I can't find a way. In fact, when I try to turn it into pandas:

Address = pd.DataFrame(URL)

it only creates a single-row table.

I expect to see something like this:

Adresses = ['https://www.zonaprop.com.ar/propiedades/local-en-alquiler-soler-6000-palermo-hollywood-a-44227001.html#map',
            'https://www.zonaprop.com.ar/propiedades/local-en-alquiler-soler-6000-palermo-hollywood-a-44227001.html',
            ...]

or a dict, or anything I can work with in pandas.


2 Answers

I don't know where you are getting lat and lon from, so I'm making an assumption about Address. I can see you have a lot of duplicates in your currently returned URLs. I suggest the following css selectors to target only the listing links. These class selectors are also much faster than your current method.

Use the len of the returned list of links to define the row dimension; you already have the columns.

from bs4 import BeautifulSoup as bs
import requests
import pandas as pd
import re

r = requests.get('https://www.zonaprop.com.ar/locales-comerciales-alquiler-palermo-hollywood-0-ambientes-publicado-hace-menos-de-1-mes.html')
soup = bs(r.content, 'lxml')  # or 'html.parser'

# Class selectors target only the listing-title links, avoiding duplicates
links = ['https://www.zonaprop.com.ar' + item['href'] for item in soup.select('.aviso-data-title a')]
# Strip newlines and tabs from the location text
locations = [re.sub(r'\n|\t', '', item.text).strip() for item in soup.select('.aviso-data-location')]

df = pd.DataFrame(index=range(len(links)), columns=['Address', 'Lat', 'Lon', 'Link'])
df.Link = links
df.Address = locations
print(df)
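If the class selectors ever miss, another way to deal with the triplicated URLs from the broader `find_all` approach is to normalise and de-duplicate them in pandas. A minimal sketch (the `links` list below is hard-coded from the question's output rather than scraped live):

```python
import pandas as pd

# Stand-ins for the scraped links: in the question's output each listing
# appears up to three times (once with a "#map" fragment, twice plain).
links = [
    "https://www.zonaprop.com.ar/propiedades/local-en-alquiler-soler-6000-palermo-hollywood-a-44227001.html#map",
    "https://www.zonaprop.com.ar/propiedades/local-en-alquiler-soler-6000-palermo-hollywood-a-44227001.html",
    "https://www.zonaprop.com.ar/propiedades/local-en-alquiler-soler-6000-palermo-hollywood-a-44227001.html",
    "https://www.zonaprop.com.ar/propiedades/excelente-esquina-en-alquiler-s-lote-propio-con-43776599.html",
    "https://www.zonaprop.com.ar/propiedades/excelente-esquina-en-alquiler-s-lote-propio-con-43776599.html",
]

df = pd.DataFrame({"Link": links})
# Drop the "#map" fragment so anchor variants collapse to the same URL,
# then remove exact duplicates.
df["Link"] = df["Link"].str.split("#").str[0]
df = df.drop_duplicates(subset="Link").reset_index(drop=True)
print(df)  # 2 unique listings
```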

You should do the following:

from bs4 import BeautifulSoup
import requests
import re
import pandas as pd


source=requests.get('https://www.zonaprop.com.ar/locales-comerciales-alquiler-palermo-hollywood-0-ambientes-publicado-hace-menos-de-1-mes.html').text
soup=BeautifulSoup(source,'lxml')

#Extract URL
all_url = [] 
link_text = ''
PlacesDf = pd.DataFrame(columns=['Address', 'Location.lat', 'Location.lon'])
for a in soup.find_all('a', attrs={'href': re.compile("/propiedades/")}):
  link_text = a['href']
  URL='https://www.zonaprop.com.ar'+link_text
  print(URL)
  all_url.append(URL)

df = pd.DataFrame({"URLs":all_url}) #replace "URLs" with your desired column name
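If you specifically want the links under the 'Address' column of the PlacesDf frame defined in the question, one option is `reindex`, which adds the missing columns as NaN so the lat/lon values can be filled in later. A sketch with hard-coded sample URLs standing in for the scraped `all_url` list:

```python
import pandas as pd

# Sample URLs standing in for the scraped all_url list
all_url = [
    "https://www.zonaprop.com.ar/propiedades/local-palermo-hollywood-44550855.html",
    "https://www.zonaprop.com.ar/propiedades/local-palermo-viejo-44622843.html",
]

# Build one row per URL, then add the remaining columns filled with NaN
PlacesDf = pd.DataFrame({"Address": all_url}).reindex(
    columns=["Address", "Location.lat", "Location.lon"]
)
print(PlacesDf.shape)  # (2, 3)
```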

Hope this helps.
