Problem parsing an XHTML page with Python

Posted 2024-10-01 09:39:22


Hello, I am trying to parse an XHTML page with Python, but I get the following error:

**xml.parsers.expat.ExpatError: unbound prefix: line 6, column 0**

[Fri Mar 25 09:58:21 2011] [error] [client 127.0.0.1] mod_wsgi (pid=9156): Exception occurred processing WSGI script '/home/hidura/webapps/karinapp/Suite/Gate.py'.
[Fri Mar 25 09:58:21 2011] [error] [client 127.0.0.1] Traceback (most recent call last):
[Fri Mar 25 09:58:21 2011] [error] [client 127.0.0.1]   File "/home/hidura/webapps/karinapp/Suite/Gate.py", line 32, in application
[Fri Mar 25 09:58:21 2011] [error] [client 127.0.0.1]     response = assistant(buildReq.extrctEnv(environ, location))#Here the assistant takes the parameters and begins the work
[Fri Mar 25 09:58:21 2011] [error] [client 127.0.0.1]   File "/home/hidura/webapps/karinapp/Suite/wsgi/Utilities/Assistant/Assistant.py", line 114, in __init__
[Fri Mar 25 09:58:21 2011] [error] [client 127.0.0.1]     self.websearch()#Finding the web.
[Fri Mar 25 09:58:21 2011] [error] [client 127.0.0.1]   File "/home/hidura/webapps/karinapp/Suite/wsgi/Utilities/Assistant/Assistant.py", line 364, in websearch
[Fri Mar 25 09:58:21 2011] [error] [client 127.0.0.1]     websource = self.manage.string2parse(result[0][1])#Transforming the web page into a tree.
[Fri Mar 25 09:58:21 2011] [error] [client 127.0.0.1]   File "/home/hidura/webapps/karinapp/Suite/wsgi/Writer/tagsmanip.py", line 56, in string2parse
[Fri Mar 25 09:58:21 2011] [error] [client 127.0.0.1]     self.doc = parseString(newData)
[Fri Mar 25 09:58:21 2011] [error] [client 127.0.0.1]   File "/usr/local/lib/python3.1/xml/dom/minidom.py", line 1937, in parseString
[Fri Mar 25 09:58:21 2011] [error] [client 127.0.0.1]     return expatbuilder.parseString(string)
[Fri Mar 25 09:58:21 2011] [error] [client 127.0.0.1]   File "/usr/local/lib/python3.1/xml/dom/expatbuilder.py", line 940, in parseString
[Fri Mar 25 09:58:21 2011] [error] [client 127.0.0.1]     return builder.parseString(string)
[Fri Mar 25 09:58:21 2011] [error] [client 127.0.0.1]   File "/usr/local/lib/python3.1/xml/dom/expatbuilder.py", line 223, in parseString
[Fri Mar 25 09:58:21 2011] [error] [client 127.0.0.1]     parser.Parse(string, True)
[Fri Mar 25 09:58:21 2011] [error] [client 127.0.0.1] xml.parsers.expat.ExpatError: unbound prefix: line 6, column 0

This is the code of the page:

^{pr2}$

Thanks in advance!


2 Answers

I think the problem is that http://www.facebook.com/2008/fbml is a page that cannot be found.

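The problem is that expat takes fb as the namespace prefix, but the tag is written FB:LOGIN-BUTTON, so expat sees FB as an unbound prefix. XML is case-sensitive, and the XHTML specification states that all HTML element and attribute names must be lowercase.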
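The case sensitivity described above can be reproduced with a minimal sketch. The markup here is a hypothetical, stripped-down stand-in (the asker's actual page is not reproduced), and `try_parse` is a helper name made up for illustration; only `xml.dom.minidom.parseString` and `ExpatError` come from the traceback in the question.

```python
from xml.dom.minidom import parseString
from xml.parsers.expat import ExpatError

# Hypothetical stand-in document: the prefix is declared in lowercase (xmlns:fb).
GOOD = (
    '<html xmlns="http://www.w3.org/1999/xhtml" '
    'xmlns:fb="http://www.facebook.com/2008/fbml">'
    '<body><fb:login-button/></body></html>'
)
# Same document, but the tag uses an uppercase prefix that was never declared.
BAD = GOOD.replace('fb:login-button', 'FB:LOGIN-BUTTON')


def try_parse(markup):
    """Return 'ok' if minidom parses the markup, else the expat error text."""
    try:
        parseString(markup)
        return 'ok'
    except ExpatError as exc:
        return str(exc)


good_result = try_parse(GOOD)
bad_result = try_parse(BAD)
print(good_result)
print(bad_result)
```

The lowercase version should parse, while the uppercase one should fail with an "unbound prefix" message like the one in the log, because `FB` and `fb` are different prefixes as far as XML is concerned.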

I tried your document with the lxml XML parser, which automatically lowercased the prefix. Perhaps you could switch to another parser:

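import lxml.etree

# Read the file as bytes so lxml can honor any encoding declaration.
with open('fb.xhtml', 'rb') as f:
    data = f.read()
tree = lxml.etree.fromstring(data)
# Bind the fb prefix to the FBML namespace for the XPath query.
ns_map = {'fb': 'http://www.facebook.com/2008/fbml'}
print(tree.xpath('.//fb:LOGIN-BUTTON', namespaces=ns_map))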

Output:

^{pr2}$
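For comparison, the same kind of namespace-qualified lookup can be sketched with the standard library's `xml.etree.ElementTree`. This assumes the tag prefix in the page has already been corrected to lowercase so that it matches the `xmlns:fb` declaration (ElementTree is also built on expat and would reject the uppercase prefix). The sample markup is hypothetical, not the asker's page.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample with a correctly cased, declared prefix.
SAMPLE = (
    '<html xmlns="http://www.w3.org/1999/xhtml" '
    'xmlns:fb="http://www.facebook.com/2008/fbml">'
    '<body><fb:login-button/></body></html>'
)

# Map the prefix used in the search path to the FBML namespace URI.
ns_map = {'fb': 'http://www.facebook.com/2008/fbml'}

root = ET.fromstring(SAMPLE)
matches = root.findall('.//fb:login-button', ns_map)
# ElementTree stores tags in {namespace-uri}localname form.
tags = [el.tag for el in matches]
print(tags)
```

The prefix in `ns_map` only has to match the one used in the search path; what identifies the element is the namespace URI, which the parser treats as a plain string and never fetches.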
