Python: downloading an image from an HTTPS .aspx URL

```python
import requests

test_image = "https://www-nass.nhtsa.dot.gov/nass/cds/GetBinary.aspx?Image&ImageID=497001669&CaseID=149006692&Version=1"
pull_image = requests.get(test_image)
with open("test_image.jpg", "wb+") as myfile:
    myfile.write(str.encode(pull_image.text))
```

2 Answers

User
Answer 1 · edited 2024-10-06 12:44:45

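I'd like to follow up on @t.m.adam's answer with a complete solution for anyone interested in using this data in their own projects.
Below is my code for pulling all the images for the example case IDs. It's fairly messy code, but I think it gives you what you need to get started.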
```python
import requests
from bs4 import BeautifulSoup
from tqdm import tqdm

CaseIDs = [149006673, 149006651, 149006672, 149006673, 149006692, 149006693]
url_part1 = 'https://www-nass.nhtsa.dot.gov/nass/cds/'
data = []

with requests.Session() as sesh:
    for caseid in tqdm(CaseIDs):
        # Text-only view of the case page (xsl=textonly.xsl)
        url_full = f"https://www-nass.nhtsa.dot.gov/nass/cds/CaseForm.aspx?ViewText&CaseID={caseid}&xsl=textonly.xsl&websrc=true"
        #print(url_full)
        source = sesh.get(url_full).text
        soup = BeautifulSoup(source, 'lxml')
        # Each matched <tr> is treated as one image block
        tr_tags = soup.find_all('tr', style="page-break-after: always")
        for tag in tr_tags:
            #print(tag)
            """
            try:
                vehicle = [x for x in tag.text.split('\n') if 'Vehicle' in x][0]  ## return the first element
            except IndexError:
                vehicle = [x for x in tag.text.split('\n') if 'Scene' in x][0]  ## return the first element
            """
            # The label rows hold the image ID, image type and part name
            tag_list = tag.find_all('tr', class_='label')
            test = [x.find('td').text for x in tag_list]
            #print(test)
            img_id, img_type, part_name = test
            img_id = img_id.replace(":", "")
            img = tag.find('img')
            #part_name = img.get('alt').replace(":", "").replace("/", "")
            # Strip characters that are invalid in file names on some systems
            part_name = part_name.replace(":", "").replace("/", "")
            image_name = " ".join([img_type, part_name, img_id]) + ".jpg"
            url_src = img.get('src')
            img_url = url_part1 + url_src
            print(img_url)
            pull_image = sesh.get(img_url, stream=True)
            # .content is the undecoded bytes, so the image is written unchanged
            with open(image_name, "wb+") as myfile:
                myfile.write(pull_image.content)
```
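The URL and file-name steps in the loop above could also be pulled into a small stdlib-only helper. This is a sketch, not the answer's code: the name `image_target` and the sample label values are hypothetical, and it swaps plain string concatenation for `urllib.parse.urljoin`, which also copes with a root-relative `src` such as `/nass/cds/...`. Unlike the original, it cleans all three labels the same way.

```python
from typing import Tuple
from urllib.parse import urljoin

BASE_URL = "https://www-nass.nhtsa.dot.gov/nass/cds/"


def image_target(src: str, img_id: str, img_type: str, part_name: str) -> Tuple[str, str]:
    """Return (absolute image URL, local file name) for one image block."""
    # urljoin resolves both "GetBinary.aspx?..." and "/nass/cds/GetBinary.aspx?..."
    url = urljoin(BASE_URL, src)
    # Remove ":" and "/" from each label, then build "<type> <part> <id>.jpg"
    clean = [s.replace(":", "").replace("/", "").strip() for s in (img_type, part_name, img_id)]
    return url, " ".join(clean) + ".jpg"


url, name = image_target(
    "GetBinary.aspx?Image&ImageID=497001669&CaseID=149006692&Version=1",
    "497001669:", "Vehicle", "Front/Left",  # hypothetical label values
)
print(name)  # Vehicle FrontLeft 497001669.jpg
```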

User
Answer 2 · edited 2024-10-06 12:44:45

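`.text` decodes the response content into a string, so your image file ends up corrupted.
Instead, use `.content`, which holds the response body as raw bytes: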
```python
import requests

test_image = "https://www-nass.nhtsa.dot.gov/nass/cds/GetBinary.aspx?Image&ImageID=497001669&CaseID=149006692&Version=1"
pull_image = requests.get(test_image)
with open("test_image.jpg", "wb+") as myfile:
    myfile.write(pull_image.content)
```
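To see why the `.text` round trip breaks the file, here is a stdlib-only sketch with no network access. The bytes are a hand-made stand-in for the start of a JPEG (an assumption, not data from the NHTSA server), and the UTF-8 decode assumes that is the encoding `.text` ends up using; the second case shows that even a lossless single-byte decoding does not help, because `str.encode()` re-encodes as UTF-8.

```python
# Hand-made stand-in for the first bytes of a JPEG: SOI marker (FF D8) + APP0/JFIF header
jpeg_bytes = bytes([0xFF, 0xD8, 0xFF, 0xE0, 0x00, 0x10]) + b"JFIF"

# Roughly what the question's code does: bytes -> str -> str.encode() (always UTF-8)
as_text = jpeg_bytes.decode("utf-8", errors="replace")  # invalid bytes become U+FFFD
corrupted = str.encode(as_text)

# Latin-1 maps every byte to a character, but re-encoding as UTF-8
# still turns 0xFF into the two bytes C3 BF
as_latin1 = jpeg_bytes.decode("latin-1")
also_corrupted = str.encode(as_latin1)

print(corrupted == jpeg_bytes)       # False
print(also_corrupted == jpeg_bytes)  # False
print(corrupted[:2], also_corrupted[:2])  # b'\xef\xbf' b'\xc3\xbf'
```

Writing the `bytes` object directly, as `.content` provides it, skips both conversions, so nothing is lost.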
`.raw.read()` also returns bytes, but to use it you must set the `stream` parameter to `True`.
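A minimal sketch of the `stream=True` / `.raw` route, assuming `requests` is installed. The helper names `save_raw` and `download_image` are hypothetical, and rather than a single `.raw.read()` call it uses `shutil.copyfileobj`, which reads from `.raw` in chunks. `save_raw` only needs an object with a file-like `.raw`, so it can be tried without a network.

```python
import shutil
from typing import Any


def save_raw(response: Any, path: str) -> None:
    """Copy the body from response.raw into a binary file.

    `response` is assumed to be a requests.Response fetched with stream=True;
    any object with a file-like `.raw` attribute works.
    """
    with open(path, "wb") as f:
        # copyfileobj reads in chunks, so the whole image is never held in memory at once
        shutil.copyfileobj(response.raw, f)


def download_image(url: str, path: str) -> None:
    """Hypothetical helper: stream the image at `url` into `path`."""
    import requests  # third-party; imported here so save_raw works without it

    # stream=True defers reading the body, which keeps .raw available
    with requests.get(url, stream=True) as response:
        response.raise_for_status()
        save_raw(response, path)
```

One caveat: `.raw` is the underlying urllib3 stream, so requests does not undo any `Content-Encoding` (such as gzip) for you. Image responses are usually sent uncompressed, but setting `response.raw.decode_content = True` before copying is a common safeguard.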
