Parsing an HTML page and extracting links

User

Reply 1 · edited 2024-05-17 06:34:29

Why not simply use enumerate():

import urllib2

site=urllib2.urlopen(r'http://www.rom.on.ca/en/join-us/jobs')

for i,j in enumerate(site):
     if "http://www.ontario.ca" in j: # j is the current line
         print i+1 # i counts from 0, but line numbers in the HTML source start at 1, so add 1

>>620
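
The same idea can be wrapped in a small helper. This is only a minimal sketch for Python 3 (the snippet above is Python 2); the function name `find_line_numbers` and the sample lines are hypothetical, and `enumerate(..., start=1)` replaces the manual `i+1`.

```python
def find_line_numbers(lines, needle):
    """Return the 1-based numbers of all lines that contain `needle`."""
    # start=1 makes enumerate count from 1, matching how editors number lines.
    return [number for number, line in enumerate(lines, start=1) if needle in line]


# Hypothetical sample standing in for the lines of a downloaded page.
sample_lines = [
    "<html>",
    '<a href="http://www.ontario.ca">Ontario</a>',
    "<p>no link here</p>",
    '<a href="http://www.ontario.ca/jobs">Jobs</a>',
]

matches = find_line_numbers(sample_lines, "http://www.ontario.ca")
print(matches)
```

Note that in Python 3 the lines read from `urllib.request.urlopen()` are bytes, so either decode each line or pass a bytes needle such as `b"http://www.ontario.ca"`.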

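User

Reply 2 · edited 2024-05-17 06:34:29

As for the problem in your code: this loop reads the data character by character. When you do not pass the amount of data to read, read() returns the entire content as a single string, and iterating over a string yields one character at a time: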

for line in data.read():

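Instead, you can iterate over the lines, for example by looping over the response object directly, which yields one line at a time.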
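
A minimal sketch of the difference between looping over `read()` and looping over lines, written for Python 3. It uses an in-memory `io.StringIO` object as a stand-in for the response, which is an assumption made so the example runs without network access; the sample text is hypothetical.

```python
import io

# Stand-in for the object returned by urlopen(); holds three lines of text.
data = io.StringIO("first line\nsecond line\nthird line\n")

# read() returns one string, so iterating over it yields single characters.
chars = [c for c in data.read()]

data.seek(0)  # rewind so the content can be read again

# Iterating over the file-like object itself yields whole lines.
lines = [line.rstrip("\n") for line in data]

print(chars[:5])
print(lines)
```

A real response object is also iterable line by line, so `for line in data:` works there too, without calling `read()` first.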

This part is not exactly an answer, but I suggest you use BeautifulSoup.

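import urllib2
from BeautifulSoup import BeautifulSoup

url = "http://www.my_url.com"
data = urllib2.urlopen(url).read()
# the class is imported directly, so call it without the module prefix
soup = BeautifulSoup(data)

# findAll returns every <a> tag; find would return only the first one
all_links = soup.findAll('a')
# you can also look for a specific link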
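
If installing BeautifulSoup is not an option, the standard library can collect links as well. The sketch below swaps in Python 3's `html.parser.HTMLParser` for BeautifulSoup, so it is not the approach shown above; the class name `LinkCollector` and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag's attributes.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical page content; a real page would come from urlopen(url).read().
html = """
<html><body>
  <a href="http://www.ontario.ca">Ontario</a>
  <a name="anchor-without-href">skip me</a>
  <a href="/en/join-us/jobs">Jobs</a>
</body></html>
"""

collector = LinkCollector()
collector.feed(html)
print(collector.links)
```

To look for a specific link, filter `collector.links`, for example by keeping only the entries that start with a given prefix.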

User

Reply 3 · edited 2024-05-17 06:34:29

In general, XPath is the right tool for this kind of task. Examples: http://www.w3schools.com/xpath/xpath_examples.asp

Python has a nice library for it, lxml: http://lxml.de/xpathxslt.html
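
To give a feel for an XPath-style query, here is a minimal sketch that uses the standard library's `xml.etree.ElementTree` instead of lxml, so that it runs without extra packages. ElementTree supports only a limited XPath subset and needs well-formed markup, so treat it as an illustration rather than a replacement for lxml; the sample markup is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup; ElementTree cannot parse sloppy real-world HTML.
markup = """
<html><body>
  <div id="nav"><a href="http://www.ontario.ca">Ontario</a></div>
  <p><a href="/en/join-us/jobs">Jobs</a> and <a name="top">no href</a></p>
</body></html>
"""

root = ET.fromstring(markup)

# ".//a[@href]" selects every <a> element at any depth that has an href attribute.
links = [a.get("href") for a in root.findall(".//a[@href]")]
print(links)
```

With lxml, the comparable full XPath expression would be `//a/@href`, evaluated through the `xpath()` method described in the lxml documentation linked above.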
