Answers to the question: Python download images from HTTPS aspx


I am trying to download some images from the NASS Case Viewer. An example is <ul> <li><a href="https://www-nass.nhtsa.dot.gov/nass/cds/CaseForm.aspx?xsl=main.xsl&CaseID=149006692" rel="nofollow noreferrer">https://www-nass.nhtsa.dot.gov/nass/cds/CaseForm.aspx?xsl=main.xsl&CaseID=149006692</a></li> </ul> The link to the image viewer for this example is <ul> <li><a href="https://www-nass.nhtsa.dot.gov/nass/cds/GetBinary.aspx?ImageView&ImageID=497001669&Desc=FRONT&Title=Vehicle+1+-+Front&Version=1&Extend=jpg" rel="nofollow noreferrer">https://www-nass.nhtsa.dot.gov/nass/cds/GetBinary.aspx?ImageView&ImageID=497001669&Desc=FRONT&Title=Vehicle+1+-+Front&Version=1&Extend=jpg</a></li> </ul> I think this may not be viewable because of HTTPS. However, this is just the second Front image.

The actual link to the image is (or should be?) <ul> <li><a href="https://www-nass.nhtsa.dot.gov/nass/cds/GetBinary.aspx?Image&ImageID=497001669&CaseID=149006692&Version=1" rel="nofollow noreferrer">https://www-nass.nhtsa.dot.gov/nass/cds/GetBinary.aspx?Image&ImageID=497001669&CaseID=149006692&Version=1</a></li> </ul> which simply downloads an aspx binary file.

My problem is that I don't know how to store these binary files as proper jpg files.

An example of the code I have tried is <pre><code>import requests

test_image = "https://www-nass.nhtsa.dot.gov/nass/cds/GetBinary.aspx?Image&amp;ImageID=497001669&amp;CaseID=149006692&amp;Version=1"
pull_image = requests.get(test_image)

with open("test_image.jpg", "wb+") as myfile:
    myfile.write(str.encode(pull_image.text))
</code></pre> but the resulting jpg file is not valid. I also checked <code>pull_image.raw.read()</code> and found that it is empty.

What is the problem? Is my URL wrong? I put these URLs together with BeautifulSoup after inspecting the HTML of several pages.

Am I saving the binary files incorrectly?
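The last guess in the question can be checked offline. This is a minimal sketch, assuming that requests builds `.text` roughly as `str(content, encoding, errors="replace")`: decoding JPEG bytes to text and re-encoding them with `str.encode` (UTF-8 by default) changes the bytes, so the saved file no longer starts with a JPEG marker. The sample bytes are a hand-written JPEG header, not data from the NASS server, and `text_round_trip` is a hypothetical helper. The empty `pull_image.raw.read()` has a separate cause: without `stream=True`, requests has already read the whole body into `.content`, so the raw stream is exhausted.

```python
# Minimal offline sketch: why str.encode(response.text) corrupts image data.
# Assumption: requests decodes .text roughly as str(content, encoding, errors="replace"),
# so bytes.decode(..., errors="replace") stands in for a real HTTP response here.

# Hand-written start of a JPEG file: SOI marker FF D8, then an APP0 "JFIF" segment header.
jpeg_bytes = bytes([0xFF, 0xD8, 0xFF, 0xE0, 0x00, 0x10]) + b"JFIF\x00"


def text_round_trip(raw: bytes, encoding: str) -> bytes:
    """Hypothetical helper: decode bytes to text, then re-encode like the question's code."""
    text = raw.decode(encoding, errors="replace")
    return str.encode(text)  # str.encode defaults to UTF-8


for enc in ("utf-8", "iso-8859-1"):
    corrupted = text_round_trip(jpeg_bytes, enc)
    # Whichever encoding is guessed, the leading FF D8 marker does not survive.
    print(f"{enc}: identical={corrupted == jpeg_bytes}, first bytes={corrupted[:4].hex()}")

# Writing the original bytes (what response.content returns) keeps the JPEG marker intact.
print("starts with FF D8:", jpeg_bytes.startswith(b"\xff\xd8"))
```

In other words, binary responses should be written from `.content` (or streamed chunks), never from `.text`.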


1 Answer

Anonymous, 1 day ago

I'd like to follow up on @t.m.adam's answer with a complete solution for anyone interested in using this data in their own projects.

Below is my code for pulling all the images for the example case IDs. It is fairly messy, but I think it gives you what you need to get started. <pre><code>import requests
from bs4 import BeautifulSoup
from tqdm import tqdm

CaseIDs = [149006673, 149006651, 149006672, 149006673, 149006692, 149006693]
url_part1 = 'https://www-nass.nhtsa.dot.gov/nass/cds/'

# One Session reuses the connection (and keeps any cookies) across all requests
with requests.Session() as sesh:
    for caseid in tqdm(CaseIDs):
        # Text-only view of the case form
        url_full = f"{url_part1}CaseForm.aspx?ViewText&CaseID={caseid}&xsl=textonly.xsl&websrc=true"
        source = sesh.get(url_full).text
        soup = BeautifulSoup(source, 'lxml')

        # Each image block is a table row followed by a page break
        tr_tags = soup.find_all('tr', style="page-break-after: always")
        for tag in tr_tags:
            # The label rows hold the image ID, the image type and the part name
            tag_list = tag.find_all('tr', class_='label')
            img_id, img_type, part_name = [x.find('td').text for x in tag_list]

            # Strip characters that are not allowed in file names
            img_id = img_id.replace(":", "")
            part_name = part_name.replace(":", "").replace("/", "")
            image_name = " ".join([img_type, part_name, img_id]) + ".jpg"

            # The img tag's src is relative to the cds/ directory
            img = tag.find('img')
            img_url = url_part1 + img.get('src')
            print(img_url)

            pull_image = sesh.get(img_url, stream=True)
            pull_image.raise_for_status()
            # Write the raw bytes (.content), not the decoded text (.text)
            with open(image_name, "wb") as myfile:
                myfile.write(pull_image.content)
</code></pre>
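A variant of the final download step that streams each image to disk and refuses to save something that is obviously not a JPEG. This is a minimal sketch, not part of the original answer: `save_binary` and `absolute_image_url` are hypothetical helper names, the `text/html` check assumes that an error or login page would come back as HTML, and the FF D8 check assumes every file really is a JPEG. The standard-library `urljoin` resolves the `src` attribute against the page URL, which also copes with a `src` that starts with `/` or is already absolute, where plain string concatenation would not.

```python
from itertools import chain
from urllib.parse import urljoin

JPEG_MAGIC = b"\xff\xd8"  # every JPEG file begins with the SOI marker FF D8


def absolute_image_url(page_url: str, src: str) -> str:
    """Hypothetical helper: resolve an img src (relative or absolute) against its page URL."""
    return urljoin(page_url, src)


def save_binary(response, path, chunk_size=8192, expect_jpeg=True):
    """Hypothetical helper: stream a response body to `path` as raw bytes.

    `response` is assumed to be a requests.Response fetched with stream=True;
    anything with raise_for_status(), a headers mapping and iter_content() works.
    Returns the number of bytes written.
    """
    response.raise_for_status()  # fail on 4xx/5xx instead of saving an error page
    content_type = response.headers.get("Content-Type", "")
    if "text/html" in content_type:
        # Assumption: an HTML body means an error or login page, not an image.
        raise ValueError(f"expected image data, got {content_type!r}")

    chunks = response.iter_content(chunk_size=chunk_size)
    first = next(chunks, b"")
    if expect_jpeg and not first.startswith(JPEG_MAGIC):
        # Checked before opening the file, so no broken .jpg is left behind.
        raise ValueError("response body does not start with a JPEG marker")

    written = 0
    with open(path, "wb") as fh:  # binary mode: bytes are written unchanged
        for chunk in chain([first], chunks):
            if chunk:  # skip empty keep-alive chunks
                fh.write(chunk)
                written += len(chunk)
    return written
```

With the answer's session, the download at the end of the loop could become `save_binary(sesh.get(absolute_image_url(url_full, img.get('src')), stream=True), image_name)`. Streaming in chunks avoids holding a whole image in memory; for photos of this size, writing `pull_image.content` as the answer does works just as well.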
