So let's say I'm trying to get the link to a specific image, like this:
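from bs4 import BeautifulSoup
import os
import urllib2
import urlparse

# BeautifulSoup parses markup, not URLs, so fetch the page first
html = urllib2.urlopen("http://examplesite.com").read()
soup = BeautifulSoup(html, "html.parser")
for image in soup.findAll("img"):
    src = image["src"]  # value of the src attribute
    srcd = urlparse.urlparse(src)
    path = srcd.path  # gets the path
    fn = os.path.basename(path)  # gets filename
# let's say the webpage I was scraping had its images like this:
# <img src="../..someimage.jpg" />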
Is there a simple way to get the full URL from this, or do I have to use a regular expression?
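Use urlparse.urljoin (urllib.parse.urljoin in Python 3). It resolves a relative src against the URL of the page it came from, so no regular expression is needed.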
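As a rough sketch of how that could look, here is a Python 3 version using urllib.parse.urljoin. The page URL and the helper name resolve_src are made up for illustration, and the src is assumed to be "../../someimage.jpg" (with the slash), which may not match the markup on the real page.

```python
from urllib.parse import urljoin

# Hypothetical URL of the page that was scraped (an assumption, not from the question)
page_url = "http://examplesite.com/gallery/2020/index.html"


def resolve_src(base_url, src):
    """Return the absolute URL for an <img> src found on base_url."""
    # urljoin handles "..", root-relative paths, and already-absolute URLs
    return urljoin(base_url, src)


# Assumed src values; the first is a relative path like the one in the question
for src in ["../../someimage.jpg", "/img/logo.png", "http://cdn.example.com/a.png"]:
    print(resolve_src(page_url, src))
```

Inside the scraping loop, the same call would be urljoin(page_url, image["src"]), where page_url is whatever URL the HTML was downloaded from.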