How can I speed up my Python program?

Posted 2024-09-25 10:25:45

Male | A programmer who likes writing Python code.

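I'm working on a web-scraping project in which I have to collect links for 19,062 facilities. With a plain for loop it would take almost 3 hours to finish. I tried to write a generator but couldn't work out any logic for it, and I'm not sure it can be done with a generator at all. Can any Python expert help me get what I want faster? In my code I only run it for 20 IDs. Thanks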


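    import json

    import requests
    from bs4 import BeautifulSoup as bs


    url = 'https://hilfe.diakonie.de/hilfe-vor-ort/marker-json.php?ersteller=&kategorie=0&text=&n=55.0815&e=15.0418321&s=47.270127&w=5.8662579&zoom=20000'
    res = requests.get(url, timeout=30).json()

    url_1 = 'https://hilfe.diakonie.de/hilfe-vor-ort/info-window-html.php?id='

    # extract every facility id from the JSON response
    # (named "ids" so the built-in id() is not shadowed)
    ids = [item["id"] for item in res['items'][0]["elements"]]

    links = {'links': []}

    # reuse one HTTP connection for all requests instead of reconnecting each time
    session = requests.Session()


    def link_parser(url, item_id):
        """Fetch one facility's info window and return the first <a> inside a <p>."""
        resp = session.get(url + str(item_id), timeout=30).content
        soup = bs(resp, "html.parser")
        anchor = soup.select_one('p > a')
        # an info window without a link returns None instead of raising AttributeError
        return anchor['href'] if anchor is not None else None


    # only the first 20 ids, as in the original test run
    for item_id in ids[:20]:
        link = link_parser(url_1, item_id)
        if link:
            links['links'].append(link)

    # write mode ('w', not 'a') so links.json always holds exactly one valid JSON object
    with open('links.json', 'w') as file:
        json.dump(links, file)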
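If the goal is still to cut down the 3 hours, the usual approach for I/O-bound code like this is to run a few requests concurrently instead of strictly one after another, since each request spends most of its time waiting on the network. Below is a minimal sketch using the standard library's `concurrent.futures.ThreadPoolExecutor`. It assumes a hypothetical `fetch_one(id)` that fetches one info window and returns its link or `None` (for example, a version of the question's `link_parser` that returns the link instead of appending it to a global dict). The name `fetch_all` and the default `max_workers=4` are illustrative choices, not from the original post. Keep the worker count small: more parallel requests mean more load on the server and a higher chance of being blocked.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Optional


def fetch_all(
    ids: Iterable[str],
    fetch_one: Callable[[str], Optional[str]],
    max_workers: int = 4,
) -> List[str]:
    """Call fetch_one(id) for every id on a small pool of worker threads.

    Executor.map yields results in the same order as the input ids;
    ids whose fetch returned None (no link found) are dropped.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(fetch_one, ids)
        return [link for link in results if link is not None]


if __name__ == "__main__":
    import time

    def fake_fetch(item_id: str) -> str:
        # stands in for a real HTTP request that takes about 0.2 s
        time.sleep(0.2)
        return "https://example.org/" + item_id

    start = time.perf_counter()
    found = fetch_all([str(i) for i in range(8)], fake_fetch, max_workers=4)
    print(len(found), "links in", round(time.perf_counter() - start, 2), "s")
```

With 4 workers the 8 fake requests overlap, so the demo finishes in roughly a quarter of the sequential time; real speedups depend on how the server responds to parallel requests.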


1 answer

Anonymous user

#1 · Posted 2024-09-25 10:25:45

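In web scraping, going fast is not a good idea! If you fire requests in a tight loop, you hit the server several times per second and are likely to get blocked. A generator will not make this any faster: it only changes when each value is produced (lazily, one at a time), not how long each HTTP request takes. Ideally you want to hit the server once and process the data locally.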
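To illustrate the point about generators: a generator cannot shorten the requests, but it can pace them and hand each link back as soon as it is fetched, so results can be saved incrementally. A minimal sketch, assuming a hypothetical `fetch_one(id)` that returns one link; the name `throttled_links` and the 1-second default interval are illustrative choices, not from the original answer.

```python
import time
from typing import Callable, Iterable, Iterator


def throttled_links(
    ids: Iterable[str],
    fetch_one: Callable[[str], str],
    min_interval: float = 1.0,
) -> Iterator[str]:
    """Yield one link per id, starting requests at least min_interval seconds apart.

    Nothing is fetched until the caller asks for the next item, so this is
    lazy, but each individual request takes exactly as long as before.
    """
    last_start = None
    for item_id in ids:
        if last_start is not None:
            # sleep only for whatever part of the interval has not yet passed
            wait = min_interval - (time.monotonic() - last_start)
            if wait > 0:
                time.sleep(wait)
        last_start = time.monotonic()
        yield fetch_one(item_id)


if __name__ == "__main__":
    # a fake fetch standing in for a real request
    for link in throttled_links(["1", "2", "3"], lambda i: "https://example.org/" + i, min_interval=0.5):
        print(link)  # each link is available as soon as it is fetched
```

With a real `fetch_one`, each link could be written to disk inside the loop (for example one JSON object per line), so a crash halfway through does not lose the links already collected.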

If it were me, I would use a framework like Scrapy, which encourages good practices and provides various Spider classes that support standard techniques.
