How to extract a specific string from HTML using Python

import re
import urllib

web = "http://pic.haibao.com/piclist/2271"
page = urllib.urlopen(web)
html = page.read()
pic_pat = r'src=\("http:\/\/.*?.jpg)'
impat = re.compile(keypat)
keylist = impat.findall(html)

function getList(screen_index) { var boxes = []; var screen2 = "<li class=\"piclistli\"><div class=\"pic200\"><a href=\"http:\/\/pic.haibao.com\/pic\/12027963.htm\"><img width=\"310\" height=\"465\" src=\"http:\/\/cdn2.hbimg.cn\/store\/tuku\/310_999\/piccommon\/1218\/12188\/D5259EFE8B9999E8FA968CBD38.jpg\" alt=\"\u200b1\u6708\u7684\u7ebd\u7ea6\u4f9d\u7136\u51b7\u51bd\uff0c\u4f46\u578b\u4eba\u4eec\u5e76\u6ca1\u6709\u5929\u6c14\u7684\u6076\u52a3\u800c\u968f\u4fbf\u5957\u4ef6\u8863\u670d\u5c31\u51fa\u95e8\u3002\u5373\u4fbf\u662f\u904d\u5730\u79ef\u96ea\uff0c\u8fd8\u662f\u8981\u7a7f\u4e0a\u6709\u578b\u7684\u5927\u8863\u548c\u9774\u5b50\uff1b\u5929\u6c14\u7070\u6697\u65f6\uff0c\u8fd8\u662f\u8981\u7a7f\u4e0a\u9753\u4e3d\u7684\u8272\u5f69\u6210\u4e3a\u8857\u5934\u660e\u4eae\u7684\u98ce\u666f\u3002\u62a5\u53cb\u4eec\u9a6c\u4e0a\u6765\u7ffb\u7ffb\u770b\u5427\uff01\" \/><\/a><\/div>
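The markup shown above sits inside a JavaScript string literal, so every quote and slash carries a backslash, and a pattern written for plain HTML will not match it. A minimal sketch of the regex approach under that assumption follows; the shortened `sample` string, the `ESCAPED_SRC` pattern, and the `extract_jpg_urls` helper are illustrative names, not part of the original post.

```python
import re

# Shortened copy of the escaped markup from the question. It is a JavaScript
# string literal embedded in the page, so quotes and slashes carry backslashes.
sample = r'<img width=\"310\" height=\"465\" src=\"http:\/\/cdn2.hbimg.cn\/store\/tuku\/310_999\/piccommon\/1218\/12188\/D5259EFE8B9999E8FA968CBD38.jpg\" alt=\"\" \/>'

# In the pattern, \\ matches one literal backslash. The group captures only the
# URL, and [^"]*? keeps the match inside a single attribute value.
ESCAPED_SRC = re.compile(r'src=\\"(http:[^"]*?\.jpg)\\"')


def extract_jpg_urls(text):
    """Return the .jpg URLs found in escaped markup, with escaped slashes restored."""
    return [url.replace('\\/', '/') for url in ESCAPED_SRC.findall(text)]


urls = extract_jpg_urls(sample)
print(urls)
```

If the page served plain, unescaped HTML instead, the two `\\` parts of the pattern would have to be dropped.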

2 answers

User

#1 · edited 2024-09-26 21:48:17

Try BeautifulSoup4.

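from bs4 import BeautifulSoup as bs

# `html` is the page source fetched in the question
html_doc = bs(html, 'html.parser')  # name the parser explicitly
img_list = html_doc.find_all('img')
for image in img_list:
    print(image.get('src'))  # None if the tag has no src attribute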
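If bs4 is not installed, the same idea (walk the tags, collect each `img` tag's `src`) can be sketched with the standard library's `html.parser`. This swaps in a different parser from the one the answer recommends; the `ImgSrcCollector` class and the unescaped `sample_html` string are assumptions made for illustration.

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; self-closing tags such as
        # <img ... /> are also routed through this method by default.
        if tag == 'img':
            for name, value in attrs:
                if name == 'src' and value:
                    self.sources.append(value)


# Unescaped version of the markup from the question, shortened for the example.
sample_html = (
    '<li class="piclistli"><div class="pic200">'
    '<a href="http://pic.haibao.com/pic/12027963.htm">'
    '<img width="310" height="465" '
    'src="http://cdn2.hbimg.cn/store/tuku/310_999/piccommon/1218/12188/D5259EFE8B9999E8FA968CBD38.jpg" '
    'alt="" /></a></div></li>'
)

collector = ImgSrcCollector()
collector.feed(sample_html)
print(collector.sources)
```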


User

#2 · edited 2024-09-26 21:48:17

Use urllib2 instead; it is a handy library for fetching data from web pages.

import urllib2  # Python 2 only; on Python 3 use urllib.request instead
from lxml import html

url = "Sample url"

html_code = urllib2.urlopen(url).read()      # read the response body
parsed_source = html.fromstring(html_code)   # parse it into an element tree that XPath can be applied to
link = parsed_source.xpath("//a/@href")      # list of href values; adjust this XPath to match the HTML you are targeting

This is only sample code showing how you might approach the problem; you will have to write your own XPath to extract the data you need.
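As a rough illustration of adapting the path expression to the markup, the sketch below uses the standard library's `xml.etree.ElementTree` in place of lxml. ElementTree supports only a limited XPath subset and needs well-formed input, so this assumes a tidy, unescaped snippet. Real pages are rarely that clean, which is why lxml's HTML parser is the more forgiving choice. The `snippet` string and variable names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Well-formed, unescaped version of the markup from the question (shortened).
snippet = (
    '<ul><li class="piclistli"><div class="pic200">'
    '<a href="http://pic.haibao.com/pic/12027963.htm">'
    '<img src="http://cdn2.hbimg.cn/store/tuku/310_999/piccommon/1218/12188/D5259EFE8B9999E8FA968CBD38.jpg" alt="" />'
    '</a></div></li></ul>'
)

root = ET.fromstring(snippet)

# Generic expression: every <a> element that has an href attribute.
links = [a.get('href') for a in root.findall('.//a[@href]')]

# Expression narrowed to this page's structure: images inside picture list items.
image_urls = [
    img.get('src')
    for img in root.findall(".//li[@class='piclistli']/div/a/img")
]

print(links)
print(image_urls)
```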
