I'm trying to list all the pages of a site in Python so that I can scrape them with BeautifulSoup. What I have at the moment is:
team_urls = ['http://www.lyricsfreak.com/e/ed+sheeran/thinking+out+loud_21083784.html',
'http://www.lyricsfreak.com/e/ed+sheeran/photograph_21058341.html',
'http://www.lyricsfreak.com/e/ed+sheeran/a+team_20983411.html',
'http://www.lyricsfreak.com/e/ed+sheeran/i+see+fire_21071421.html',
'http://www.lyricsfreak.com/e/ed+sheeran/perfect_21113253.html',
'http://www.lyricsfreak.com/e/ed+sheeran/castle+on+the+hill_21112527.html',
'http://www.lyricsfreak.com/e/ed+sheeran/supermarket+flowers_21113249.html',
'http://www.lyricsfreak.com/e/ed+sheeran/lego+house_20983415.html',
'http://www.lyricsfreak.com/e/ed+sheeran/even+my+dad+does+sometimes_21085123.html',
'http://www.lyricsfreak.com/e/ed+sheeran/kiss+me_20983414.html',
'http://www.lyricsfreak.com/e/ed+sheeran/shape+of+you_21113143.html',
'http://www.lyricsfreak.com/e/ed+sheeran/i+see+fire_21071421.html'
]
I want to call a function that pulls in every page starting with http://www.lyricsfreak.com/e/ed+sheeran/, because I know my current list is sloppy: there are about 30 more pages available, and I don't want to keep adding them by hand.

In Python 2.x, you can build such a list of sub-pages as follows:
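The code block that belonged here did not survive extraction. As a sketch of the usual approach, assuming the artist index page lives at http://www.lyricsfreak.com/e/ed+sheeran/ (my assumption, not stated in the original): fetch that index page and keep every link whose path starts with /e/ed+sheeran/. A Python 3 import fallback is included so the sketch also runs on current interpreters:

```python
# Sketch of the Python 2 approach; the index URL is an assumption.
try:
    from urllib2 import urlopen          # Python 2
except ImportError:
    from urllib.request import urlopen   # Python 3 equivalent

from bs4 import BeautifulSoup

BASE = 'http://www.lyricsfreak.com'

def song_urls(html, prefix='/e/ed+sheeran/'):
    """Return absolute URLs for every <a href> starting with `prefix`."""
    soup = BeautifulSoup(html, 'html.parser')
    return [BASE + a['href']
            for a in soup.find_all('a', href=True)
            if a['href'].startswith(prefix)]

# Usage (performs a real network request, so commented out here):
# team_urls = song_urls(urlopen(BASE + '/e/ed+sheeran/').read())
```

The filtering is kept in its own function so it can be exercised on any HTML string without hitting the network.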
This will create a urls list. In Python 3.x, you can modify it as follows:
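Again the original code block is missing; a minimal Python 3 sketch of the same idea, using urllib.request in place of urllib2 (the index URL remains an assumption):

```python
# Python 3 sketch: fetch an index page and collect matching song links.
from urllib.request import urlopen
from bs4 import BeautifulSoup

BASE = 'http://www.lyricsfreak.com'

def collect_pages(index_url, prefix='/e/ed+sheeran/'):
    # urlopen returns bytes in Python 3; BeautifulSoup decodes them itself.
    soup = BeautifulSoup(urlopen(index_url).read(), 'html.parser')
    # dict.fromkeys drops duplicate hrefs while keeping their page order.
    hrefs = dict.fromkeys(a['href'] for a in soup.find_all('a', href=True))
    return [BASE + h for h in hrefs if h.startswith(prefix)]

# Usage:
# team_urls = collect_pages(BASE + '/e/ed+sheeran/')
```

De-duplicating matters here: the hand-made list above already contains i+see+fire twice, and an index page may link each song more than once.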
Alternatively, you can use the requests library; install it first with pip install requests.
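The requests version of the answer is likewise missing; a hedged sketch with the same link-filtering logic, only swapping the HTTP client (URL and timeout value are my assumptions):

```python
# requests sketch: same filtering idea, different HTTP client.
import requests
from bs4 import BeautifulSoup

BASE = 'http://www.lyricsfreak.com'

def links_from_html(html, prefix='/e/ed+sheeran/'):
    """Pure helper: absolute URLs for links under `prefix` in an HTML string."""
    soup = BeautifulSoup(html, 'html.parser')
    return [BASE + a['href']
            for a in soup.find_all('a', href=True)
            if a['href'].startswith(prefix)]

def links_from_site(index_url, prefix='/e/ed+sheeran/'):
    resp = requests.get(index_url, timeout=10)
    resp.raise_for_status()        # fail loudly on HTTP errors
    return links_from_html(resp.text, prefix)

# Usage:
# team_urls = links_from_site(BASE + '/e/ed+sheeran/')
```

requests decodes the response body for you (resp.text), which is the main convenience over urllib here.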