I want to download many files with the same file extension, using Wget or Python, from a given website's links

Posted on 2024-09-29 16:23:38


I want to download the .utu and .zip files from the following Microsoft Flight Simulator AI traffic websites:

http://web.archive.org/web/20050315112710/http://www.projectai.com:80/libraries/acfiles.php?cat=6 (current repaints)

http://web.archive.org/web/20050315112940/http://www.projectai.com:80/libraries/acfiles.php?cat=1 (vintage repaints)

On each page there are sub-tabs for the AI aircraft types (Airbus, Boeing, etc.), and when you click an aircraft image, the available repaint .zip files are shown.

Then, when you click Download, the URL changes to http://web.archive.org/web/20041114195147/http://www.projectai.com:80/libraries/repaints.php?ac=(number)&cat=(number), and repaints.php? becomes download.php?fileid=(4-digit number).

What do I need to enter to download all the .zip files at once? Clicking them one by one to download takes a very long time.

I also want to download all the files with the .utu file extension, which are repaints for Flight 1 Ultimate Traffic AI aircraft, from the following page:

http://web.archive.org/web/20060512161232/http://ultimatetraffic.flight1.net:80/utfiles.asp?mode=1&index=0

Then, when you click Download Ultimate Traffic Aircraft Texture, the final folder path becomes /utfiles.asp?mode=download&id=F1AIRepaint(number)-(number).utu. I want to do the same there as with the other website.
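For the Ultimate Traffic page, the same scraping approach should carry over. Below is a minimal sketch, under the assumption that the listing page links to each texture through utfiles.asp?mode=download&id=... and that the visible link text is the .utu file name; both details are inferred from the description above, not verified against the archived page:

import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

list_url = 'http://web.archive.org/web/20060512161232/http://ultimatetraffic.flight1.net:80/utfiles.asp?mode=1&index=0'

def download_utu_files(list_url):
    soup = BeautifulSoup(requests.get(list_url, timeout=30).text, "html.parser")
    # Keep only anchors whose href contains the assumed download pattern
    for link in soup.select('a[href*="utfiles.asp?mode=download"]'):
        file_url = urljoin(list_url, link['href'])  # resolve relative hrefs
        name = link.text.strip()  # assumption: the link text is the .utu file name
        if not name.endswith('.utu'):
            name += '.utu'
        print('downloading', name)
        with open(name, 'wb') as f:
            f.write(requests.get(file_url, timeout=60).content)

download_utu_files(list_url)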

I used some code written for Python 2.7.9 that I found in a YouTube video, inserting my own details to achieve this, but it doesn't work when I run it, with timeouts, errors and so on, probably because it is too simplistic:

import requests
from bs4 import BeautifulSoup
import wget

def download_links(url):
    source_code = requests.get(url)
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text, "html.parser")
    for link in soup.findAll('a'):
        href = link.get('href')
        print(href)
        wget.download(href)

download_links('http://web.archive.org/web/20041225023002/http://www.projectai.com:80/libraries/acfiles.php?cat=6')
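One likely cause of the errors: most href values on that page are relative (for example repaints.php?ac=...), so wget.download(href) is handed a URL with no host and fails or times out; link.get('href') can also return None for anchors without an href, which crashes the download call. Resolving each href against the page URL with urllib.parse.urljoin produces an absolute link. A small illustration (the ac and cat numbers here are made up):

from urllib.parse import urljoin

page = 'http://web.archive.org/web/20041225023002/http://www.projectai.com:80/libraries/acfiles.php?cat=6'
href = 'repaints.php?ac=89&cat=6'  # hypothetical relative link from the page
print(urljoin(page, href))
# http://web.archive.org/web/20041225023002/http://www.projectai.com:80/libraries/repaints.php?ac=89&cat=6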

1 Answer

Posted on 2024-09-29 16:23:38

Update: try this updated version; it should now download all the zip files from all of the links on the first page:

from bs4 import BeautifulSoup
import requests, zipfile, io

def get_zips(zips_page):
    # print(zips_page)
    zips_source = requests.get(zips_page).text
    zip_soup = BeautifulSoup(zips_source, "html.parser")
    for zip_file in zip_soup.select('a[href*="download.php?fileid="]'):  # attribute value must be quoted for current soupsieve
        zip_url = link_root + zip_file['href']
        print('downloading', zip_file.text, '...',)
        r = requests.get(zip_url)
        with open(zip_file.text, 'wb') as zipFile:
            zipFile.write(r.content)


def download_links(root, cat):
    url = ''.join([root, cat])
    source_code = requests.get(url)
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text, "html.parser")

    for zips_suffix in soup.select('a[href*="repaints.php?ac="]'):  # quoted for the same reason
        # get_zips(root, zips_suffix['href'])
        next_page = ''.join([root, zips_suffix['href']])
        get_zips(next_page)


link_root = 'http://web.archive.org/web/20041225023002/http://www.projectai.com:80/libraries/'

category = 'acfiles.php?cat=6'
download_links(link_root, category)
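Since the question mentions timeouts, it may also help to give requests an explicit timeout and automatic retries; the Wayback Machine can be slow. A sketch using the standard urllib3 Retry mechanism (the retry count and status codes are arbitrary choices, not something from the answer above):

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()
retries = Retry(total=5, backoff_factor=1, status_forcelist=[429, 500, 502, 503, 504])
session.mount('http://', HTTPAdapter(max_retries=retries))  # the archived pages are plain http

# Swap session.get(...) in for the requests.get(...) calls above, e.g.:
# zips_source = session.get(zips_page, timeout=60).text

# Both categories from the question can then be fetched in one run:
for category in ('acfiles.php?cat=6', 'acfiles.php?cat=1'):
    download_links(link_root, category)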
