I have a URL and need to get all the links on that page. I am using BeautifulSoup.
from bottle import route, run
import urllib2
from mechanize import Browser
from BeautifulSoup import BeautifulSoup
from urlparse import urlparse
import json
import sys
import csv
import re

@route('/hello')
def hello():
    text = list()
    link = list()
    req = urllib2.Request("http://www.amazon.com",
                          headers={"Content-Type": "application/json"})
    html = urllib2.urlopen(req).read()
    soup = BeautifulSoup(html)
    last_page = soup.find('div', id="nav_subcats")
    for elm in last_page.findAll('a'):
        texts = elm.text
        links = elm.get('href')
        links = links.partition("&node=")[2]
        text.append(texts)
        link.append(links)
    alltext = []
    for i, j in zip(text, link):
        alltext.append({"name": i, "id": j})
    return alltext

run(host='localhost', port=8080, debug=True)
But when the route returns text, I get AAABBBCCCDDD, where AAA, BBB, CCC and DDD are different items. Why can't I get them back in brackets, like this?
["AAA", "BBB", "CCC", "DDD", "EEE", "FFF"]
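
A likely explanation, offered as a sketch rather than a confirmed diagnosis: Bottle treats a returned list of strings as chunks of the response body and joins them into one string, which would produce AAABBBCCCDDD. Serializing the list to JSON first keeps the brackets. The snippet below reproduces both results with the standard library only; `to_json_body` and `names` are hypothetical names used for illustration, and it is written for Python 3 even though the code above is Python 2.

```python
import json


def to_json_body(items):
    """Serialize a list into a JSON array string, keeping brackets and quotes."""
    return json.dumps(items)


# Sample data standing in for the scraped link texts (illustrative values).
names = ["AAA", "BBB", "CCC", "DDD"]

# What a plain concatenation of the list items looks like.
joined = "".join(names)

# What an explicit JSON serialization looks like.
body = to_json_body(names)

print(joined)
print(body)
```

Inside the route, this would mean ending with `return json.dumps(text)` instead of `return text`. Returning a dict such as `{"items": text}` is another option, since Bottle serializes dicts to JSON on its own.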