Encoding lost when splitting a string

from google.appengine.ext import webapp
from google.appengine.ext.webapp import util
from google.appengine.api import urlfetch
import BeautifulSoup

class MainHandler(webapp.RequestHandler):
    def get(self):
        url = 'http://ascodevida.com/ultimos'
        result = urlfetch.fetch(url=url)
        # ADVs on this page.
        res = BeautifulSoup.BeautifulSoup(result.content).findAll('div', {'class' : 'box story'})
        ADVList = []
        for i in res:
            story = i.find('a', {'class' : 'advlink'}).string
            link = i.find('a', {'class' : 'advlink'})['href']
            ADVData = { 'adv' : story, 'link' : link }
            ADVList.append(ADVData)
        self.response.headers['Content-Type'] = 'text/html; charset=UTF-8'
        self.response.out.write(ADVList)

2 Answers

User

#1 · edited 2024-10-03 00:18:13

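I think you are writing the list out directly. That calls repr on every item, and repr shows non-ASCII characters as hexadecimal escapes (such as \xe1).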

You can try this:

>>> s = u"Leer más"
>>> repr(s)
"u'Leer m\\xe1s'"

The print statement, on the other hand, writes the string itself rather than its repr, encoding it for the terminal:

>>> print s
Leer más

To get the correct result, avoid the list's default string conversion and format each item yourself.
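A minimal sketch of that idea follows. It is written for Python 3, while the question's App Engine code is Python 2, so treat it as an illustration rather than a drop-in fix. The helper name render_adv_list and the sample data are hypothetical; only the 'adv' and 'link' keys come from the question.

```python
from html import escape


def render_adv_list(adv_list):
    """Build an HTML fragment from a list of {'adv': ..., 'link': ...} dicts."""
    items = []
    for entry in adv_list:
        # Format each field explicitly instead of relying on the list's repr().
        items.append(u'<li><a href="{0}">{1}</a></li>'.format(
            escape(entry['link'], quote=True), escape(entry['adv'])))
    return u'<ul>\n' + u'\n'.join(items) + u'\n</ul>'


# Hypothetical sample data in the same shape as ADVList in the question.
sample = [{'adv': u'Leer más', 'link': u'http://example.com/1'}]
html_text = render_adv_list(sample)

# Encode once, at the output boundary, to match the declared charset=UTF-8.
body = html_text.encode('utf-8')
```

In the handler, the encoded body would be passed to self.response.out.write in place of the raw list, so the accented characters reach the browser as UTF-8 bytes instead of escape sequences.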

User

#2 · edited 2024-10-03 00:18:13

I am a Java developer and use jsoup for HTML parsing. I found a similar library for Python, which may save you some time.

http://www.crummy.com/software/BeautifulSoup/

Food for thought: Python regular expression for HTML parsing (BeautifulSoup)
