Fetching the last 10 minutes of codepad.org posts with Easy Html Parser (EHP) in Python

import requests
from ehp import Html


def catch_refs(data):
    html = Html()
    dom = html.feed(data)
    return [ind.attr['href'] for ind in dom.find('a')
            if 'view' in ind.text()]


def retrieve_source(refs, dir):
    """
    Get the source code of the posts then save in a dir.
    """
    pass


if __name__ == '__main__':
    req = requests.get('http://codepad.org/recent')
    refs = catch_refs(req.text)
    retrieve_source(refs, '/tmp/')
    print(refs)

1 Answer

User

#1 · Posted on 2024-10-02 00:37:57

Your retrieve_source(refs, dir) doesn't actually do anything: its body is just pass.

That is why you get no results.

Update based on your comment:

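import os

import requests
from ehp import Html


def get_code_snippet(page):
    dom = Html().feed(page)
    # Collect every <div class="highlight"> element.
    elements = [el for el in dom.find('div')
                if el.attr['class'] == 'highlight']
    # Index 1 is the second highlight block, which this answer takes to be the code.
    return elements[1].text()


def retrieve_source(refs, dir):
    # Download each post and save its snippet as <index>.html inside dir.
    for i, ref in enumerate(refs):
        with open(os.path.join(dir, str(i) + '.html'), 'w') as r:
            r.write(get_code_snippet(requests.get(ref).content))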
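
If you want to try the extraction and saving steps without hitting codepad.org, here is a rough Python 3 sketch. It does not use EHP: it swaps in the standard library's html.parser, and the names HighlightText, highlight_blocks and save_snippets are made up for illustration. It also assumes, like elements[1] above, that the second highlight block of a page is the one holding the code, and it simply skips pages that have fewer than two such blocks.

```python
import os
from html.parser import HTMLParser


class HighlightText(HTMLParser):
    """Collect the text of every <div class="highlight"> in a page."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # > 0 while we are inside a highlight div
        self.blocks = []  # one string per highlight div

    def handle_starttag(self, tag, attrs):
        if tag != 'div':
            return
        if self.depth:
            self.depth += 1  # a div nested inside a highlight div
        elif 'highlight' in (dict(attrs).get('class') or '').split():
            self.depth = 1
            self.blocks.append('')

    def handle_endtag(self, tag):
        if tag == 'div' and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.blocks[-1] += data


def highlight_blocks(page):
    """Return the text of each highlight div, in document order."""
    parser = HighlightText()
    parser.feed(page)
    parser.close()
    return parser.blocks


def save_snippets(pages, directory):
    """Write the second highlight block of each page to <index>.txt."""
    paths = []
    for i, page in enumerate(pages):
        blocks = highlight_blocks(page)
        if len(blocks) < 2:
            continue  # layout differs from the assumption, skip the page
        path = os.path.join(directory, str(i) + '.txt')
        with open(path, 'w', encoding='utf-8') as out:
            out.write(blocks[1])
        paths.append(path)
    return paths
```

To use it with the code above, you would pass a list of page bodies (for example requests.get(ref).text for each ref) as pages.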
