Downloading images with BeautifulSoup in Python

Posted 2024-09-29 00:11:44


I want to download a photo from an Iranian website. When I run the code below in Colab, I get a TimeoutError and a URLError:

    from bs4 import BeautifulSoup
    import urllib.request
    
    def make_soup(url):
      thepage = urllib.request.urlopen(url)
      #req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
      #thepage = urlopen(req).read()
      soupdata = BeautifulSoup(thepage, "html.parser")
      return soupdata
    
    i=1
    soup = make_soup("https://www.banikhodro.com/car/pride/")
    for img in soup.find_all('img'):
      temp = img.get('src')
      #print(temp)
      if temp[0]=="/":
          image = "https://www.banikhodro.com/car/pride/"+temp
      else:
          image = temp
      #print(image)    
      nametemp = img.get('alt')
      nametemp = str(nametemp)
      if len(nametemp)== 0:
          i=i+1
      else:
          filename=nametemp
          
      imagefile = open(filename+ ".jpeg", 'wb')
      imagefile.write(urllib.request.urlopen(image).read())
      imagefile.close()

Error:

    TimeoutError                              Traceback (most recent call last)

    /usr/local/lib/python3.6/dist-packages/urllib3/connection.py in _new_conn(self)
        158             conn = connection.create_connection(
    --> 159                 (self._dns_host, self.port), self.timeout, **extra_kw)
        160 

    15 frames

    TimeoutError: [Errno 110] Connection timed out

    During handling of the above exception, another exception occurred:

    NewConnectionError                        Traceback (most recent call last)

    NewConnectionError: <urllib3.connection.VerifiedHTTPSConnection object at 0x7f079e4cdcf8>: Failed to establish a new connection: [Errno 110] Connection timed out

    During handling of the above exception, another exception occurred:

    MaxRetryError                             Traceback (most recent call last)

    MaxRetryError: HTTPSConnectionPool(host='www.banikhodro.com', port=443): Max retries exceeded with url: /car/pride/ (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f079e4cdcf8>: Failed to establish a new connection: [Errno 110] Connection timed out',))

    During handling of the above exception, another exception occurred:

    ConnectionError                           Traceback (most recent call last)

    /usr/local/lib/python3.6/dist-packages/requests/adapters.py in send(self, request, stream, timeout, verify, cert, proxies)
        514                 raise SSLError(e, request=request)
        515 
    --> 516             raise ConnectionError(e, request=request)
        517 
        518         except ClosedPoolError as e:

    ConnectionError: HTTPSConnectionPool(host='www.banikhodro.com', port=443): Max retries exceeded with url: /car/pride/ (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f079e4cdcf8>: Failed to establish a new connection: [Errno 110] Connection timed out',))

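I get a TimeoutError and a ConnectionError. These errors appear in Google Colab when I try to download images from the Iranian website. Thanks in advance to anyone who answers.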
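`[Errno 110] Connection timed out` means the TCP connection to www.banikhodro.com on port 443 was never established, so the failure happens before BeautifulSoup sees any HTML. One plausible cause (an assumption, not confirmed by the traceback) is that the site is not reachable from Colab's servers. Below is a minimal sketch using only `urllib`, as in the question, that sends the browser-like User-Agent the commented-out lines in `make_soup` were aiming for and sets an explicit timeout so the failure is reported quickly instead of hanging. The helpers `build_request` and `fetch` are hypothetical names for illustration.

```python
import socket
import urllib.error
import urllib.request


def build_request(url):
    """Wrap the URL in a Request carrying a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})


def fetch(url, timeout=10.0):
    """Return the response body as bytes, or None if the request fails.

    urlopen wraps connection failures (refused, DNS errors, connect timeouts)
    and HTTP error statuses in URLError (HTTPError is a subclass); a timeout
    while waiting for or reading the response surfaces as socket.timeout,
    which is an alias of TimeoutError since Python 3.10.
    """
    try:
        with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
            return resp.read()
    except (urllib.error.URLError, socket.timeout, TimeoutError) as err:
        print(f"Could not fetch {url}: {err}")
        return None
```

Usage: `html = fetch("https://www.banikhodro.com/car/pride/")`, then `BeautifulSoup(html, "html.parser")` if the result is not `None`. If `fetch` returns `None` in Colab but works when the same script runs on a local machine, the problem is network reachability rather than the scraping code.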


2 answers

One way to do it:

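    import requests
    from bs4 import BeautifulSoup

    BASE = "https://www.banikhodro.com"

    page = requests.get(BASE + "/car/pride/", timeout=30).content
    # Each listing photo sits inside <span class="photo">
    spans = BeautifulSoup(page, "html5lib").find_all("span", {"class": "photo"})
    images = [
        f"{BASE}{span.find('img')['src']}" for span in spans
        # keep only listing photos ("Adv" in the path) and skip generic placeholders
        if span.find("img") and "Adv" in span.find("img")["src"]
    ]
    for image in images:
        print(f"Fetching {image}")
        # use the last path segment as the local file name
        with open(image.rsplit("/", 1)[-1], "wb") as f:
            f.write(requests.get(image, timeout=30).content)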

This downloads every non-generic image from the car listings into the current local folder:

    183093_1-m.jpg
    183098_1-m.jpg
    183194_1-m.jpg
    183208_1-m.jpg
    183209_1-m.jpg
    183272_1-m.jpg
    183279_1-m.jpg
    183286_1-m.jpg
    183384_1-m.jpg
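The filter-and-name step of this answer can be pulled out into a pure function and checked without any network access. This is a sketch assuming the `"Adv"` marker from the answer; `listing_downloads` is a hypothetical helper, and it swaps the answer's string concatenation for `urllib.parse.urljoin`. That swap also avoids the bug in the question's code, where `"https://www.banikhodro.com/car/pride/" + temp` yields a doubled path such as `/car/pride//...` for root-relative `src` values.

```python
import posixpath
from urllib.parse import urljoin, urlsplit

PAGE_URL = "https://www.banikhodro.com/car/pride/"


def listing_downloads(srcs, base=PAGE_URL, marker="Adv"):
    """Map <img> src values to (absolute_url, local_filename) pairs.

    Only srcs containing `marker` are kept, mirroring the answer's
    `"Adv" in src` filter; empty or None srcs are skipped.
    """
    pairs = []
    for src in srcs:
        if not src or marker not in src:
            continue
        # urljoin resolves "/path", "relative/path" and absolute URLs correctly
        url = urljoin(base, src)
        # Name the file after the last path segment, ignoring any ?query part
        name = posixpath.basename(urlsplit(url).path)
        if name:
            pairs.append((url, name))
    return pairs
```

The `src` values would come from the parsed page, e.g. `[span.find("img")["src"] for span in spans if span.find("img")]`, and each resulting pair can then be downloaded and written to its file name.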
Alternatively, grab every <img> on the page:

    from urllib.parse import urljoin

    import requests
    from bs4 import BeautifulSoup

    URL = "https://www.banikhodro.com/car/pride/"

    page = requests.get(URL, timeout=30).content
    soup = BeautifulSoup(page, "html5lib")
    images = [
        # urljoin handles both relative ("/...") and absolute src values
        urljoin(URL, img["src"]) for img in soup.find_all("img", src=True)
        # narrow the selection by class or id inside find_all, e.g. attrs={"class": "..."}
    ]
    for image in images:
        print(f"Fetching {image}")
        with open(image.split("/")[-1], "wb") as f:
            f.write(requests.get(image, timeout=30).content)
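  1. pip install requests  # install requests, the preferred HTTP module
  2. This code fetches every image on the page, including the footer images and so on.
  3. You can filter these images inside the find_all method, which takes a parameter named attrs. For more information, see: click here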
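Point 3 refers to narrowing the search with `find_all`'s `attrs` parameter; with BeautifulSoup that looks like `soup.find_all("span", attrs={"class": "photo"})`, which is what the first answer does by passing the dict positionally. To show the same filtering idea without requiring bs4, here is a sketch built on the standard library's `html.parser.HTMLParser` (a swapped-in technique, not BeautifulSoup's API). The class `PhotoImgCollector` is hypothetical, and its `span`/`photo` defaults are taken from the first answer. Images outside such containers, like the footer images mentioned in point 2, are skipped.

```python
from html.parser import HTMLParser


class PhotoImgCollector(HTMLParser):
    """Collect <img src> values that sit inside <span class="photo">.

    A stdlib-only analogue of soup.find_all("span", attrs={"class": "photo"})
    followed by span.find("img"). Like bs4, a class attribute matches if any
    of its space-separated values equals css_class.
    """

    def __init__(self, container="span", css_class="photo"):
        super().__init__()
        self.container = container
        self.css_class = css_class
        self.depth = 0  # > 0 while inside a matching container element
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == self.container:
            classes = (attrs.get("class") or "").split()
            # count nested containers so the matching end tag is found
            if self.depth or self.css_class in classes:
                self.depth += 1
        elif tag == "img" and self.depth and attrs.get("src"):
            self.srcs.append(attrs["src"])

    def handle_endtag(self, tag):
        if tag == self.container and self.depth:
            self.depth -= 1
```

Usage: create a collector, call `collector.feed(html_text)` on the downloaded page, and read `collector.srcs`. In practice, when bs4 is already installed, the `attrs` filter is the shorter route; this version only makes the matching rule explicit.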
