BeautifulSoup: malformed start tag?

Posted 2024-09-29 17:13:37


I'm trying to migrate a WordPress XML export to Octopress, using BeautifulSoup for the conversion.

When I run exitwp, I get the following output:

writing......................................................Traceback (most recent call last):
  File "exitwp.py", line 293, in <module>
    write_jekyll(data, target_format)
  File "exitwp.py", line 284, in write_jekyll
    out.write(html2fmt(i['body'], target_format))
  File "exitwp.py", line 45, in html2fmt
    return html2text(html, '')
  File "/Users/kevinquillen/Documents/workspace/exitwp2/html2text.py", line 700, in html2text
    return optwrap(html2text_file(html, None, baseurl))
  File "/Users/kevinquillen/Documents/workspace/exitwp2/html2text.py", line 695, in html2text_file
    h.feed(html)
  File "/Users/kevinquillen/Documents/workspace/exitwp2/html2text.py", line 285, in feed
    HTMLParser.HTMLParser.feed(self, data)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/HTMLParser.py", line 108, in feed
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/HTMLParser.py", line 148, in goahead
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/HTMLParser.py", line 229, in parse_starttag
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/HTMLParser.py", line 304, in check_for_whole_start_tag
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/HTMLParser.py", line 115, in error
HTMLParser.HTMLParseError: malformed start tag, at line 1, column 64

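I've tried BeautifulSoup 3.2.0 and 3.0.7a, without much luck.

I also tried exporting different date ranges of posts, but I still get the same error at line 1; only the column number changes.

The only thing I can think of is that some older posts contain AdSense code. Beyond that, how can I easily track down where in the post content it is choking?

Python 2.7 on OS X 10.7

Edit: this also happens with a dump of pages only (just 2 pages) that contain no bad markup.

Update: it seems not to like anchor tags. The tags look like the following, very basic links. With them removed, it compiles without errors. Why doesn't it like this HTML?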

^{pr2}$


1 answer

#1 · Posted 2024-09-29 17:13:37

Modify your code like this (in html2text.py):

try:
    HTMLParser.HTMLParser.feed(self, data)
except HTMLParser.HTMLParseError:
    # Show the exact input that broke the parser, then re-raise the error.
    print('malformed data: %r' % data)
    raise

I think you'll see that "data" contains something odd. If not, please add the data to your question.
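Once the offending data has been printed, the line and column from the error message can narrow things down further. Here is a minimal sketch of that idea. The helper name context_at, its width parameter, and the sample string are made up for illustration and are not part of exitwp or html2text. It assumes the reported line and column are 1-based, as in "at line 1, column 64".

```python
def context_at(text, line, column, width=30):
    """Return the text surrounding a reported parser position.

    `line` and `column` are assumed to be 1-based, as they appear in a
    message such as "at line 1, column 64". `width` is the number of
    characters to keep on each side of that position.
    """
    lines = text.splitlines()
    if not 1 <= line <= len(lines):
        raise ValueError("line %d is outside the input" % line)
    row = lines[line - 1]
    # Convert the 1-based column to a 0-based index, clamped to the row.
    index = min(max(column - 1, 0), len(row))
    start = max(index - width, 0)
    return row[start:index + width]


# Hypothetical post body; the link markup here is invented for the example.
sample = 'x' * 60 + '<a href="http://example.com">link</a>' + 'y' * 60
snippet = context_at(sample, line=1, column=64, width=10)
print(repr(snippet))
```

Calling it with the data printed by the except block and the position from the traceback should show the markup around the point where the parser gave up, which may be easier to read than the whole post body.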
