Missing code when decoding bytes (HTML) in Python (requests, BeautifulSoup, urllib)

Posted 2024-10-01 02:33:01


I'm very new to Python, and I'm trying to get the source code of a web page so I can work with its HTML elements.

However, when I decode the bytes as UTF-8, some of the HTML code disappears. Here is my code:

import urllib.request

req = urllib.request.Request('http://avast.softonic.com/')
response = urllib.request.urlopen(req)
the_page = response.read()

For example, in the_page the DIV with ID "review_data" contains the full review text.

But when I try any of the following:

import urllib.request

req = urllib.request.Request('http://avast.softonic.com/')
response = urllib.request.urlopen(req)
the_page = response.read()
html_missing_elements = the_page.decode('utf-8')

Or:

import requests

r = requests.get('http://avast.softonic.com/')
html_missing_elements = r.text

Or:

import urllib.request
from bs4 import BeautifulSoup

req = urllib.request.Request('http://avast.softonic.com/')
response = urllib.request.urlopen(req)
the_page = response.read()
html_missing_elements = BeautifulSoup(the_page)

In each of these cases, the DIV with ID "review_data" contains only:

<div id="review_data" class="track_links"><br /><!--[/conclusion]--></p></div>

I can't get the complete raw HTML of the page; some of the code is missing, and I'd like to know why.

Thanks.


1 answer

Answer #1

The HTML has carriage returns (\r) embedded in it, for example:

\r<br /><!--[/lead]--></p>\r
-->\r<p>A big plus point for Avast Free Antivirus 2016

There are more of them elsewhere in the page.
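To see where these carriage returns sit in your own copy of the page, a small stdlib-only helper can list them. This is a sketch: the function name find_carriage_returns and the sample string are made up for illustration; in practice you would pass it the decoded page (for example r.text or the_page.decode('utf-8')).

```python
def find_carriage_returns(html: str) -> list[tuple[int, bool]]:
    """Return (index, is_crlf) for every carriage return in html.

    is_crlf is True when the carriage return is immediately followed by a line
    feed (a Windows line ending) and False for a lone carriage return, which is
    the kind that makes a console overwrite text on the current line.
    """
    found = []
    index = html.find("\r")
    while index != -1:
        is_crlf = html[index + 1:index + 2] == "\n"
        found.append((index, is_crlf))
        index = html.find("\r", index + 1)
    return found


# Hypothetical snippet modelled on the page: one lone "\r" and one "\r\n".
html = '<br /><!--[/lead]--></p>\r<p>A big plus point</p>\r\n'
print(find_carriage_returns(html))  # [(24, False), (48, True)]
```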

Once they are removed, everything works fine in the IDE, and printing shows the tag's contents.
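A minimal sketch of removing them, assuming you simply want Unix-style newlines before parsing: replace CRLF first, then any lone CR. The helper name normalize_newlines is hypothetical.

```python
def normalize_newlines(html: str) -> str:
    """Convert Windows (CRLF) and lone-CR line endings to LF."""
    # Replace CRLF first so each Windows line ending becomes a single "\n", not two.
    return html.replace("\r\n", "\n").replace("\r", "\n")


# Hypothetical stand-in for r.text or the_page.decode("utf-8").
raw = '<div id="review_data">\r<p>Lead</p>\r\n</div>'
clean = normalize_newlines(raw)
print(repr(clean))  # '<div id="review_data">\n<p>Lead</p>\n</div>'
```

The cleaned string can then be passed to BeautifulSoup as in the question, e.g. BeautifulSoup(clean, "lxml"). Replacing \r with \n rather than deleting it keeps the words on either side from running together.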

The data is actually there; your IDE just doesn't display it because of the carriage returns:

 soup = BeautifulSoup(r.content,"lxml")
 print(soup.select_one("#review_data"))

In PyCharm this outputs:

<div class="track_links" id="review_data">
<br/><!--[/conclusion]--></p>
</div>

But using:

 print(soup.select_one("#review_data").text)

outputs:

\nConnoisseurs of free antivirus solutions will already know of Avast Free Antivirus 2016 and have probably installed it at some point or another. This software is one of the leaders in its field, providing a robust suite of defences against viruses and malware, as well as some other useful tools that you might not expect. Better still, Avast is one of the less intrusive antivirus programs- perhaps less so in recent years, but still a lot less system-hogging than the big two.\r Brimming with features A big plus point for Avast Free Antivirus 2016 is its suite of features. Although these features have caused its install size to increase (up to 2GB hard drive space is recommended!), it shouldn’t prove an issue for most modern hard drives and you do get a lot of tools for free. Aside from the standard antivirus scanning, which is kept sharp with constant updates, the latest version of Avast has home network security which detects vulnerabilities for all devices connected to your network. The latest version, the ‘Nitro’ update, also adds a dedicated Avast browser called SafeZone. Heralded as the world’s safest browser, this could equally be argued as bloatware and a great free feature. For those who are security conscious, especially regarding banking, it should be seen as beneficial. The in-built ad blocker can be a godsend when visiting certain sites. Another new feature is CyberCapture, which quarantines any suspicious incoming files. Victims of viruses will know the importance of this buffer.\r A simple and effective interface Avast has changed a few times over the years and the Nitro update is no different, but thankfully their design approach seems to have remained constant. The program is simple and straightforward to use, with bold buttons and clear text in friendly colours. 
Avast Free Antivirus 2016 will sit in the system tray until needed, like most antivirus software, then expand when opened into a small borderless window that looks sleek matching the Windows 10 design scheme. Most sections of this are easy enough to follow, with a large set of buttons for the tools and standard icons like a cog for accessing settings. Of course, you’re also never far away from a premium upgrade button, encouraging you to download and pay for Avast Premier. However, this is not forced upon you. Each of the main features of Avast has its own section, such as internet security, the SafeZone browser and Smart Scan, so you really can’t go wrong.\r The best things in life are free For a free program, Avast is pretty impressive. Yes, it has lost some of its independent feel as the years have gone by, but that’s a small price for a great bit of free software. Avast Free Antivirus 2016 will interfere with your everyday browsing less than the bigger names in software. It’s very simple to use, therefore remains one of the top free solutions.\r\n'
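Rather than trusting what the console draws, you can inspect the string itself. In this sketch review_text is a hypothetical, shortened stand-in for soup.select_one("#review_data").text; membership tests and repr() are not affected by how a console renders \r.

```python
# Hypothetical, shortened stand-in for soup.select_one("#review_data").text;
# the embedded carriage returns mimic the ones in the real page.
review_text = (
    "\nLead paragraph.\r Brimming with features: more text.\r"
    " The best things in life are free: more text.\r\n"
)

# Membership tests look at the string itself, not at what the console draws.
for heading in ("Brimming with features", "The best things in life are free"):
    print(heading, "->", heading in review_text)

# repr() escapes control characters, so each carriage return is shown as a
# backslash followed by "r" instead of being acted on by the console.
print(repr(review_text))
```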

If you run the same code in IPython, simply using soup = BeautifulSoup(r.content, "lxml") is enough to see the correct output:

In [5]: soup = BeautifulSoup(r.content,"lxml")

In [6]: soup.select_one("#review_data")
Out[6]: 
<div class="track_links" id="review_data">
<p><!--[lead]-->Connoisseurs of free antivirus solutions will already know of Avast Free Antivirus 2016 and have probably installed it at some point or another. This software is one of the leaders in its field, providing a <strong>robust suite of defences against viruses and malware</strong>, as well as some other useful tools that you might not expect. Better still, Avast is one of the less intrusive antivirus
<br/><!--[/lead]--></p> <p><!--[features]--><!--[subfeatures]--></p><h3>Brimming with features</h3><!--[/subfeatures]--> <p>A big plus point for Avast Free Antivirus 2016 is its suite of features. Although these features have caused its install size to increase (up to 2GB hard drive space is recommended!), it shouldn’t prove an issue for most modern hard drives and you do get a lot of tools for free.</p> <p>Aside from the standard antivirus scanning, which is kept sharp with constant updates, the latest version of Avast has <strong>home network security</strong> which detects vulnerabilities for all devices connected to your network.</p> <p>The latest version, the ‘Nitro’ update, also adds a dedicated Avast browser called <strong>SafeZone</strong>. Heralded as the world’s safest browser, this could equally be argued as bloatware and a great free feature. For those who are security conscious, especially regarding banking, it should be seen as beneficial. The in-built ad blocker can be a godsend when visiting certain sites. Another new feature is <strong>CyberCapture</strong>, which quarantines any suspicious incoming files. Victims of viruses will know the importance of this buffer.
<br/><!--[/features]--></p> <p><!--[usability]--><!--[subusability]--></p><h3>A simple and effective interface</h3><!--[/subusability]--> <p>Avast has changed a few times over the years and the <strong>Nitro update</strong> is no different, but thankfully their design approach seems to have remained constant. The program is <strong>simple and straightforward</strong> to use, with bold buttons and clear text in friendly colours.</p> <p>Avast Free Antivirus 2016 will sit in the system tray until needed, like most antivirus software, then expand when opened into a small borderless window that looks sleek matching the Windows 10 design scheme. Most sections of this are easy enough to follow, with a large set of buttons for the tools and standard icons like a cog for accessing settings.</p> <p>Of course, you’re also never far away from a premium upgrade button, encouraging you to download and pay for <a href="http://avast-premier-antivirus.en.softonic.com" title="Avast Premier">Avast Premier</a>. However, this is not forced upon you.</p> <p>Each of the main features of Avast has its own section, such as <strong>internet security</strong>, the SafeZone browser and <strong>Smart Scan</strong>, so you really can’t go wrong.
<br/><!--[/usability]--></p> <p><!--[conclusion]--><!--[subconclusion]--></p><h3>The best things in life are free</h3><!--[/subconclusion]--> <p>For a free program, Avast is pretty impressive. Yes, it has lost some of its independent feel as the years have gone by, but that’s a small price for a great bit of free software. Avast Free Antivirus 2016 will interfere with your everyday browsing less than the bigger names in software. It’s very simple to use, therefore remains <strong>one of the top free solutions</strong>.
<br/><!--[/conclusion]--></p>
</div>

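It has nothing to do with the encoding; the carriage returns simply interfere with the output in whatever environment you run the code. A simple example shows how the output is affected: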

In [14]: s = "foo\rbar"

In [15]: print(s)
bar
