Getting Python to understand accented characters such as ^, ç, º

Published 2024-06-26 11:03:33


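I'm writing a Python script and I'm stuck on this part, which only needs to get the post titles from a web page. Python doesn't understand accented characters, and I've tried everything I know:
1 - putting # -*- coding: utf-8 -*- on the first line
2 - calling .encode("utf-8") on each title

Code: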

# -*- coding: utf-8 -*- 
import re
import requests

def opena(url):
    headers = {'user-agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64)AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36'}
    lexdan1 = requests.get(url,headers=headers)
    lexdan2 = lexdan1.text
    lexdan1.close()
    return lexdan2
dan = []
a = opena('http://www.megafilmesonlinehd.com/filmes-lancamentos')
d = re.compile('<strong class="tt-filme">(.+?)</strong>').findall(a)
for name in d:
    name =  name.encode("utf-8")
    dan.append(name)
print dan

This is what I get:

['Porta dos Fundos: Contrato Vital\xc3\xadcio HD 720p', 'Os 28 Homens de Panfilov Legendado HD', 'Estrelas Al\xc3\xa9m do Tempo Dublado', 'A Volta do Ju\xc3\xadzo Final Dublado Full HD 1080p', 'The Love Witch Legendado HD', 'Manchester \xc3\x80 Beira-Mar Legendado', 'Semana do P\xc3\xa2nico Dublado HD 720p', 'At\xc3\xa9 o \xc3\x9altimo Homem Legendado HD 720p', 'Arbor Demon Legendado HD 720p', 'Esquadr\xc3\xa3o de Elite Dublado Full HD 1080p', 'Ouija Origem do Mal Dublado Full HD 1080p', 'As Muitas Mulheres da Minha Vida Dublado HD 720p', 'Um Novo Desafio para Callan e sua Equipe Dublado Full HD 1080p', 'Terror Herdado Dublado DVDrip', 'Officer Downe Legendado HD', 'N\xc3\xa3o Bata Duas Vezes Legendado HD', 'Eu, Daniel Blake Legendado HD', 'Sangue Pela Gl\xc3\xb3ria Legendado', 'Quase 18 Legendado HD 720p', 'As Aventuras de Robinson Cruso\xc3\xa9 Dublado Full HD 1080p', 'Indigna\xc3\xa7\xc3\xa3o Dublado HD 720p']

2 answers

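When you print a list, each item in it is shown through its representation (its __repr__ method is called) rather than printed as text (which would call its __str__ method):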

class test():
    def __repr__(self):
        print '__repr__'
        return ''
    def __str__(self):
        print '__str__'
        return ''

This gives you:

>>> a = [test()]
>>> a
[__repr__
]
>>> print a
[__repr__
]
>>> print a[0]
__str__

A string's __repr__ method does not convert special characters (not even \t or \n); it shows them as escape sequences.
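A quick way to see this is to compare repr() and str() on a string that contains a tab and a newline. This is only a minimal sketch, written in Python 3 syntax (the code in this thread is Python 2), and the variable names are made up for illustration.

```python
# Illustrative only: repr() keeps escape sequences visible,
# while str() returns the text unchanged.
s = "a\tb\n"

shown = repr(s)   # the form a string takes when it sits inside a printed list
plain = str(s)    # the form used when the string itself is printed

print(shown)      # the tab and newline appear as backslash escapes
print(plain)      # the tab and newline are actually emitted
```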

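Because you told the interpreter to print a list, the interpreter calls the __str__ method of the list class. A container's __str__ method uses the __repr__ method of each object it contains (in this case, objects of type str). The __repr__ method of the str type does not render non-ASCII characters and shows them as \xNN escapes instead, whereas its __str__ method (the one called when you print a single str object) does render them.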
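The container behaviour can be reproduced without any web scraping. The sketch below uses Python 3 syntax and a hypothetical Title class (not part of the question's code) whose __repr__ and __str__ return different text, so it is easy to see which one a list uses.

```python
# Hypothetical class for illustration: __repr__ and __str__ differ on purpose.
class Title:
    def __init__(self, text):
        self.text = text

    def __repr__(self):
        # Used when the object is shown inside a container.
        return "Title(%r)" % self.text

    def __str__(self):
        # Used when the object itself is printed.
        return self.text


titles = [Title("At\u00e9 o \u00daltimo Homem")]

as_list = str(titles)      # the list formats each element with repr()
as_item = str(titles[0])   # a single element is formatted with str()

print(as_list)
print(as_item)
```

Printing the list goes through Title.__repr__, while printing titles[0] goes through Title.__str__, which is the same split that produces the escaped output in the question.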

There is a good question that explains the difference between the two: Difference between __str__ and __repr__ in Python

If you print each string individually, you should get the desired result.

import re
import requests

def opena(url):
    headers = {'user-agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64)AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.99 Safari/537.36'}
    lexdan1 = requests.get(url,headers=headers)
    lexdan2 = lexdan1.text
    lexdan1.close()
    return lexdan2

dan = []
a = opena('http://www.megafilmesonlinehd.com/filmes-lancamentos')
d = re.compile('<strong class="tt-filme">(.+?)</strong>').findall(a)
for name in d:
    dan.append(name)
for item in dan:
    print item
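If the titles have already been encoded to UTF-8 byte strings, as in the question's output, another option is to decode them and print them as one newline-joined string. This is a sketch in Python 3 syntax, assuming the input is a list of UTF-8 bytes; the two sample values are copied from the output shown in the question, and the variable names are illustrative.

```python
# Two byte strings copied from the question's output (assumed to be UTF-8).
raw = [
    b'Porta dos Fundos: Contrato Vital\xc3\xadcio HD 720p',
    b'Estrelas Al\xc3\xa9m do Tempo Dublado',
]

# Showing the list itself uses repr() on each element, so the escapes stay visible.
as_list = str(raw)

# Decoding turns the bytes back into text with the accented letters.
titles = [item.decode("utf-8") for item in raw]

# Joining and printing a single string avoids the list's repr() entirely.
joined = "\n".join(titles)
print(joined)
```

The \xc3\xad and \xc3\xa9 sequences are simply the UTF-8 bytes for í and é, so nothing was lost in the original output; it was only displayed in its escaped form.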
