BeautifulSoup SoupStrainer: filtering html and pdf links

status, response = http.request("http://www.example.com")
htmlandpdfonly = SoupStrainer('a', href=re.compile('html|pdf'))
for link in BeautifulSoup(response, parseOnlyThese=htmlandpdfonly):
    if link.has_key('href'):
        print link['href']

1 answer

User

#1 · Posted 2024-10-01 17:34:35

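# Python 2 with BeautifulSoup 3, as in the question
import re
import httplib2  # assumption: `http` in the question is an httplib2.Http instance
from BeautifulSoup import BeautifulSoup

# match ".html" or ".pdf" anywhere in a string (raw string keeps the backslash literal)
match = re.compile(r'\.(html|pdf)')

# fetch and parse the page content
http = httplib2.Http()
status, response = http.request("http://www.example.com")
page = BeautifulSoup(response)

# check every <a> tag and print the hrefs that match
for link in page.findAll('a'):
    try:
        href = link['href']
        if match.search(href):
            print href
    except KeyError:
        # <a> tag without an href attribute; skip it
        pass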
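
The answer above filters after parsing the whole page and does not use SoupStrainer. For comparison, here is a minimal sketch of the SoupStrainer approach from the question, written for Python 3 and the bs4 package rather than the BeautifulSoup 3 API used above. The sample markup, the `extract_links` helper, and the stricter pattern (which requires the extension to sit at the end of the path) are illustrative assumptions, not part of the original post.

```python
import re
from bs4 import BeautifulSoup, SoupStrainer

# Hypothetical sample markup standing in for the downloaded page.
SAMPLE_HTML = """
<html><body>
  <a href="/docs/guide.html">Guide</a>
  <a href="/files/report.pdf">Report</a>
  <a href="/images/logo.png">Logo</a>
  <a name="top">No href here</a>
  <p>Not a link</p>
</body></html>
"""

# Match hrefs ending in .html or .pdf, optionally followed by a
# query string or fragment.
LINK_PATTERN = re.compile(r"\.(html|pdf)(?:[?#]|$)")


def extract_links(html):
    """Return the hrefs of <a> tags that point to .html or .pdf targets."""
    # The strainer makes the parser keep only <a> tags whose href matches.
    only_matching_anchors = SoupStrainer("a", href=LINK_PATTERN)
    soup = BeautifulSoup(html, "html.parser", parse_only=only_matching_anchors)
    return [a["href"] for a in soup.find_all("a")]


links = extract_links(SAMPLE_HTML)
print(links)
```

Because the strainer only admits `<a>` tags whose href matches, no `has_key` check or `KeyError` handling is needed: anchors without an href never reach the loop.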
