Creating folders in Python

<a href="http://www.youversion.com/bible/gen.45.nmv-fas">http://www.youversion.com/bible/gen.45.nmv-fas</a>
<a href="http://www.youversion.com/bible/gen.46.nmv-fas">http://www.youversion.com/bible/gen.46.nmv-fas</a>
<a href="http://www.youversion.com/bible/gen.47.nmv-fas">http://www.youversion.com/bible/gen.47.nmv-fas</a>
<a href="http://www.youversion.com/bible/gen.48.nmv-fas">http://www.youversion.com/bible/gen.48.nmv-fas</a>
<a href="http://www.youversion.com/bible/gen.49.nmv-fas">http://www.youversion.com/bible/gen.49.nmv-fas</a>
<a href="http://www.youversion.com/bible/gen.50.nmv-fas">http://www.youversion.com/bible/gen.50.nmv-fas</a>
<a href="http://www.youversion.com/bible/exod.1.nmv-fas">http://www.youversion.com/bible/exod.1.nmv-fas</a>
<a href="http://www.youversion.com/bible/exod.2.nmv-fas">http://www.youversion.com/bible/exod.2.nmv-fas</a>
<a href="http://www.youversion.com/bible/exod.3.nmv-fas">http://www.youversion.com/bible/exod.3.nmv-fas</a>

import lxml.html as html
import urllib
import urlparse
from BeautifulSoup import BeautifulSoup
import re

root = html.parse(open('all.html'))
for link in root.findall('//a'):
  url = link.get('href')
  name = urlparse.urlparse(url).path.split('/')[-1]
  f = urllib.urlopen(url)
  s = f.read()
  f.close()
  soup = BeautifulSoup(s)
  articleTag = soup.html.body.article
  converted = str(articleTag)
  open(name, 'w').write(converted)

1 Answer

User

#1 · Posted 2024-05-17 11:35:11

You can use the lxml module to parse the links out of the file, then use urllib to download each one. Reading the links might look like this:

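import lxml.html as html

# html.parse accepts a filename directly, so no file handle is left open
root = html.parse('links.html')
# './/a' matches every <a> element below the root
for link in root.findall('.//a'):
  url = link.get('href')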

You can download the file each link points to with urllib.urlopen (in Python 3 the same function is urllib.request.urlopen).
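As a rough sketch of that download step, assuming Python 3: the helper name `fetch` and its `timeout` default are made up for illustration and are not part of the original answer.

```python
from urllib.request import urlopen


def fetch(url, timeout=30):
    """Return the raw bytes behind url (hypothetical helper, not from the answer)."""
    # The with-block closes the response even if read() raises
    with urlopen(url, timeout=timeout) as response:
        return response.read()
```

Calling `fetch(url)` inside the loop over the links gives you the page content as bytes, ready to be parsed or written to disk.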

Put these together and you should have something close to what you want.
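For the folder part of the question, here is a minimal Python 3 sketch. It assumes you want one folder per book, named after the text before the first dot in the last URL segment (`gen`, `exod`); that grouping is a guess, and `target_path` and `save_page` are hypothetical helper names.

```python
import os
from urllib.parse import urlparse


def target_path(url, base_dir="."):
    """Map a chapter URL to <base_dir>/<book>/<page name>."""
    # 'http://.../bible/gen.45.nmv-fas' -> 'gen.45.nmv-fas'
    name = urlparse(url).path.rstrip("/").split("/")[-1]
    # Assumption: the part before the first dot names the book ('gen', 'exod')
    book = name.split(".")[0]
    return os.path.join(base_dir, book, name)


def save_page(url, data, base_dir="."):
    """Write already-downloaded bytes into the folder derived from the URL."""
    path = target_path(url, base_dir)
    # exist_ok=True makes repeated calls for the same book harmless
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as out:
        out.write(data)
    return path
```

Inside the loop you would call `save_page(url, data)` with the bytes downloaded for that link. `os.makedirs` creates any missing parent folders, so the first chapter of each book creates its folder and later chapters reuse it.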
