How do I print a web page line by line in Python 3.x?

import urllib.request

page = urllib.request.urlopen('http://www.york.ac.uk/teaching/cws/wws/webpage1.html', data=None)
pageText = page.read()
pageLines = page.readlines()
print(pageLines)
print(pageText)

3 Answers

User

Answer 1 · edited 2024-06-28 06:17:51

One way is to use Python's requests module. You can get it by running pip install requests (you may need sudo if you are not using a virtualenv).

import requests

res = requests.get('http://www.york.ac.uk/teaching/cws/wws/webpage1.html')
if res.status_code == 200:  # check that the request went through
    # res.text is the decoded body (a str), so its newlines print as real
    # line breaks; res.content is raw bytes and would print as b'...'
    print(res.text)

    # to split the HTML into a list of lines, use splitlines():
    # lines = res.text.splitlines()
    # print(lines)
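
For comparison, here is a minimal sketch that needs no third-party package. It relies on the fact that the object returned by urllib.request.urlopen is a binary file-like object, so iterating over it yields one bytes line at a time. The helper name iter_decoded_lines and the UTF-8 default are my own assumptions, and decoding line by line like this is only safe for ASCII-compatible encodings such as UTF-8 or Latin-1. An io.BytesIO object stands in for the response so the sketch runs offline.

```python
import io


def iter_decoded_lines(stream, encoding="utf-8"):
    """Yield each line of a binary file-like object as text, without line endings.

    `stream` can be any binary file-like object, e.g. the response returned
    by urllib.request.urlopen. The encoding default is an assumption.
    """
    for raw_line in stream:  # binary streams yield one bytes object per line
        yield raw_line.decode(encoding, errors="replace").rstrip("\r\n")


# Offline stand-in for a real HTTP response.
fake_response = io.BytesIO(b"<HTML>\n<HEAD>\n<TITLE>webpage1</TITLE>\n")
for line in iter_decoded_lines(fake_response):
    print(line)

# With a real page (needs network access):
# import urllib.request
# with urllib.request.urlopen("http://www.york.ac.uk/teaching/cws/wws/webpage1.html") as page:
#     for line in iter_decoded_lines(page):
#         print(line)
```

Note that a response body can only be consumed once: calling read() first and readlines() afterwards, as in the question, leaves nothing for the second call.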

User

Answer 2 · edited 2024-06-28 06:17:51

It looks like you have hard-coded \n in your byte string.

For example, you can't split the initial value directly.

In [1]: s = b'<HMTL>\n<HEAD>\n'

In [2]: s.split('\n')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-2-e85dffa8b351> in <module>()
----> 1 s.split('\n')

TypeError: a bytes-like object is required, not 'str'

So you call str() on it, but that doesn't seem to work either.

In [3]: str(s).split('\n')
Out[3]: ["b'<HMTL>\\n<HEAD>\\n'"]

It does work if you escape the newline, though.

In [4]: str(s).split('\\n')
Out[4]: ["b'<HMTL>", '<HEAD>', "'"]

You can also split on a raw string:

In [5]: for line in str(s).split(r'\n'):
   ...:     print(line)
   ...:
b'<HMTL>
<HEAD>
'

Or, if you don't want the leading b, you can decode the byte string into a str object and then split it.

In [9]: for line in s.decode("UTF-8").split('\n'):
   ...:     print(line)
   ...:
<HMTL>
<HEAD>
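
A minimal sketch of that last approach wrapped in a helper, in case it is useful. The name bytes_to_lines and the UTF-8 default are assumptions of mine, not something from the question. It uses str.splitlines() instead of split('\n'), which also handles \r\n endings and does not leave an empty string after the final newline (it does treat a few other characters, such as form feed, as line boundaries too).

```python
def bytes_to_lines(raw, encoding="utf-8"):
    """Decode raw bytes to text, then split the text on line boundaries."""
    return raw.decode(encoding).splitlines()


s = b'<HMTL>\n<HEAD>\n'

# split('\n') keeps an empty string after the trailing newline...
print(s.decode("utf-8").split('\n'))

# ...while splitlines() does not.
for line in bytes_to_lines(s):
    print(line)
```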

User

Answer 3 · edited 2024-06-28 06:17:51

What you got is bytes, not text. If you want text, decode it.

b = b'<HMTL>\n<HEAD>\n<TITLE>webpage1</TITLE>\n</HEAD>\n<BODY BGCOLOR="FFFFFf" LINK="006666" ALINK="8B4513" VLINK="006666">\n'
s = b.decode()  # might need to specify an encoding
print(s)

Output:

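<HMTL>
<HEAD>
<TITLE>webpage1</TITLE>
</HEAD>
<BODY BGCOLOR="FFFFFf" LINK="006666" ALINK="8B4513" VLINK="006666">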
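
On the "might need to specify an encoding" point, here is a rough sketch of one way to pick the encoding, assuming the server declares a charset in its Content-Type header. The helper name decode_body and the UTF-8 fallback are my own choices. The headers attribute of a urllib response is an email.message.Message subclass, so a plain Message stands in for it here to keep the sketch runnable offline.

```python
from email.message import Message


def decode_body(raw, headers, default="utf-8"):
    """Decode raw bytes using the charset from Content-Type, else a default.

    `headers` is expected to behave like email.message.Message, which is
    what `page.headers` is for a response from urllib.request.urlopen.
    """
    charset = headers.get_content_charset() or default
    return raw.decode(charset, errors="replace")


# Offline stand-in for the headers of a real response.
headers = Message()
headers["Content-Type"] = "text/html; charset=ISO-8859-1"
print(decode_body(b"caf\xe9", headers))

# With a real page (needs network access):
# import urllib.request
# with urllib.request.urlopen("http://www.york.ac.uk/teaching/cws/wws/webpage1.html") as page:
#     print(decode_body(page.read(), page.headers))
```

This does not handle a charset name that Python does not recognise, and it ignores any encoding declared inside the HTML itself.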
