将messedup编码类型的文件转换为usab

http://www.rechercheisidore.fr/sparql/query?query=PREFIX+dcterms%3A+%3Chttp%3A%2F%2Fpurl.org%2Fdc%2Fterms%2F%3E+PREFIX+foaf%3A+%3Chttp%3A%2F%2Fxmlns.com%2Ffoaf%2F0.1%2F%3E+SELECT+%3Furicollection+%3Ftitrecollection+%3Fdescription+%3Fadresseweb+WHERE+{+%3Furicollection+%3Fpredicat+%3Chttp%3A%2F%2Fwww.rechercheisidore.fr%2Fclass%2FCollection%3E.+%3Furicollection+dcterms%3Atitle+%3Ftitrecollection.+%3Furicollection+dcterms%3Adescription+%3Fdescription.+%3Furicollection+foaf%3Ahomepage+%3Fadresseweb.+}+ORDER+BY+ASC%28%3Ftitrecollection%29+LIMIT+300&format=application%2Frdf%2Bxml

2条回答

网友

1楼 · 编辑于 2024-09-30 18:21:57

页面的问题似乎是服务器将文本编码为UTF-8，然后将UTF-8作为拉丁语1处理，并再次用UTF-8编码。要扭转这种情况，请以UTF-8形式读入文件，将其编码为拉丁1字节字符串，然后将字节解码为UTF-8。在

网友

2楼 · 编辑于 2024-09-30 18:21:57

jwodder解决方案的佐证：

import lxml.etree as ET
import urllib2

url = "http://www.rechercheisidore.fr/sparql/query?query=PREFIX+dcterms:+<http://purl.org/dc/terms/>+PREFIX+foaf:+<http://xmlns.com/foaf/0.1/>+SELECT+?uricollection+?titrecollection+?description+?adresseweb+WHERE+{+?uricollection+?predicat+<http://www.rechercheisidore.fr/class/Collection>.+?uricollection+dcterms:title+?titrecollection.+?uricollection+dcterms:description+?description.+?uricollection+foaf:homepage+?adresseweb.+}+ORDER+BY+ASC(?titrecollection)+LIMIT+300&format=application/rdf+xml"
doc = ET.parse(urllib2.urlopen(url))

namespaces = { 'ns':'http://www.w3.org/2005/sparql-results#', }

for elt in doc.xpath('//ns:binding[@name="description"]/ns:literal',
                     namespaces=namespaces):
    text = elt.text
    if text is not None:
        text = text.encode('latin-1').decode('utf_8')
        print(text)
    break

收益率

^{pr2}$

相关问题更多 >

编程相关推荐

热门问题

热门文章