Parsing an unbalanced HTML file with Beautiful Soup 4

Example Domain</title>  <meta charset="utf-8" /> <meta http-equiv="Content-type" content="text/html; charset=utf-8" /> <meta name="viewport" content="width=device-width, initial-scale=1" /> <style type="text/css"> body { background-color: #f0f0f2; margin: 0; padding: 0; font-family: "Open Sans", "Helvetica Neue", Helvetica, Arial, sans-serif; } div { width: 600px; margin: 5em auto; padding: 50px; background-color: #fff; border-radius: 1em; } a:link, a:visited { color: #38488f; text-decoration: none; } @media (max-width: 700px) { body { background-color: #fff; } div { width: auto; margin: 0 auto; border-radius: 0; padding: 1em; } } </style>

1 answer

User

#1 · Posted 2024-05-07 05:12:39

Use one of the more capable parsers (html5lib is more robust, but slower). The results differ depending on the parser:

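from bs4 import BeautifulSoup

# lxml wraps the stray leading text in <html><body><p>
with open('foo.html') as f:
    soup = BeautifulSoup(f, 'lxml')
#<html><body><p>Example Domain   <!  <====missing tag in this line  >
#<meta charset="utf-8"/>

# html5lib adds an empty <head> and puts the text directly in <body>
with open('foo.html') as f:
    soup = BeautifulSoup(f, 'html5lib')
#<html><head></head><body>Example Domain   <!  <====missing tag in this line  >
#
#<meta charset="utf-8"/>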
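
To compare the parsers side by side without needing foo.html, something like the following might help. It is only a sketch: BROKEN_HTML is a made-up stand-in for the file in the question (the opening <title> tag is left out on purpose), and parse_with_each is a hypothetical helper name, not part of Beautiful Soup. Parsers that are not installed are skipped.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Hypothetical sample input: the opening <title> tag is missing on purpose.
BROKEN_HTML = 'Example Domain</title>\n<meta charset="utf-8" />\n<p>Hello</p>'


def parse_with_each(markup, parsers=("html.parser", "lxml", "html5lib")):
    """Parse the same markup with each parser and return the repaired HTML.

    Parsers that are not installed raise FeatureNotFound and are skipped.
    """
    results = {}
    for name in parsers:
        try:
            soup = BeautifulSoup(markup, name)
        except FeatureNotFound:
            continue  # this parser library is not available
        results[name] = str(soup)
    return results


results = parse_with_each(BROKEN_HTML)
for name, repaired in results.items():
    print(f"--- {name} ---")
    print(repaired)
```

Printing the repaired markup for each parser makes it easy to see which tree shape suits the code that follows, since tag lookups such as soup.title or soup.p can behave differently depending on how the stray text was wrapped.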
