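I want to scrape several specific linked pages. For example, I want to be able to choose which link on the page to follow and how many times to repeat that step. The link scraped from the initial input should then replace (or be appended to) the user's input for the next round. Here is what I have: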
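import urllib
from bs4 import BeautifulSoup  # imports assumed; they are not shown in the original post

#url = raw_input('Enter - ')
url = 'http://www.columbia.edu/kermit/k95.html'
itr = raw_input('Enter iteration: ')
i = int(itr)
n = raw_input('Enter Number: ')
n = int(n)
# The page is downloaded and parsed only once, before the loop starts.
html = urllib.urlopen(url).read()
soup = BeautifulSoup(html)
tags = soup('a')
print 'Link:' , url
while i > 0:
    i = i - 1
    if i == 0:
        break
    for tag in tags:
        me = tag.get('href', None)
        #Just to make sure the link/content match print tag.contents[0]
    # tags still refers to the first page, so the same link comes back every time.
    link = tags[(n - 1)]
    #print link
    links = link.get('href', None)
    print 'Link:', links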
Enter - http://www.columbia.edu/~fdc/
Enter count: 4
Enter Position: 9
Link: http://www.columbia.edu/~fdc/
Link: http://www.columbia.edu/kermit/k95.html
Link: http://www.columbia.edu/kermit/k95.html (Should be k95faq.html)
Link: http://www.columbia.edu/kermit/k95.html (Should be ckfaq.html)
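I get the number of iterations I want and the specific link, but I need the first URL (the user input) to be replaced on each iteration by the link stored in the variable `links`.
For example, the user enters a URL such as http://www.columbia.edu/~fdc/ and follows the 9th link on the page, repeated 4 times. The first iteration returns http://www.columbia.edu/kermit/k95.html as `links`. I want the second iteration to give me the 9th link on that `links` page, which should be k95faq.html.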
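
The behaviour described above amounts to re-fetching and re-parsing the page inside the loop, using the link found in the previous round as the next URL. A minimal sketch of that idea follows, under several assumptions: it is written for Python 3 rather than the Python 2 of the post, it swaps BeautifulSoup for the standard library's `HTMLParser` so that it runs without extra packages, and the names `AnchorCollector`, `fetch_html` and `follow_links` are hypothetical helpers, not part of any library. As in the sample run, `count` is taken to be the total number of links reported, including the starting URL.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class AnchorCollector(HTMLParser):
    """Collect the href of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # Like tag.get('href', None): anchors without an href yield None.
            self.hrefs.append(dict(attrs).get("href"))


def fetch_html(url):
    """Download a page and decode it leniently (needs network access)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def follow_links(start_url, count, position, fetch=fetch_html):
    """Return the visited URLs: start_url first, then the link found at
    `position` (1-based) on each page in turn, `count` entries at most."""
    visited = [start_url]
    url = start_url
    for _ in range(count - 1):
        # Parse the *current* page on every round, not just the first one.
        parser = AnchorCollector()
        parser.feed(fetch(url))
        if position - 1 >= len(parser.hrefs):
            break  # the page has fewer links than requested
        href = parser.hrefs[position - 1]
        if href is None:
            break  # the anchor at that position has no href
        # Resolve relative links against the page they were found on.
        url = urljoin(url, href)
        visited.append(url)
    return visited
```

With network access, something like `for link in follow_links('http://www.columbia.edu/~fdc/', 4, 9): print('Link:', link)` would print one line per visited page. The `fetch` parameter exists only so the download step can be replaced, for instance by a dictionary lookup when trying the function offline.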