I need to create a new string containing the title of a website

Posted 2024-10-02 16:29:01


I thought something like (1) would work, but it throws an error. Any ideas or suggestions?

(1)

versionPreCheck = lxml.html.parse("URL")
versionCheck = versionPreCheck.find(".//title").text

LatestVersion = (versionCheck.read())

Error:

Traceback (most recent call last):
  File "python", line 132, in <module>
  File "src/lxml/etree.pyx", line 3426, in lxml.etree.parse
  File "src/lxml/parser.pxi", line 1839, in lxml.etree._parseDocument
  File "src/lxml/parser.pxi", line 1865, in lxml.etree._parseDocumentFromURL
  File "src/lxml/parser.pxi", line 1769, in lxml.etree._parseDocFromFile
  File "src/lxml/parser.pxi", line 1162, in lxml.etree._BaseParser._parseDocFromFile
  File "src/lxml/parser.pxi", line 600, in lxml.etree._ParserContext._handleParseResultDoc
  File "src/lxml/parser.pxi", line 710, in lxml.etree._handleParseResult
  File "src/lxml/parser.pxi", line 637, in lxml.etree._raiseParseError
OSError: Error reading file 'bazorkversion--grify.repl.co': failed to load external entity "bazorkversion--grify.repl.co"

The title in question:

The title of https://bazorkversion--grify.repl.co/ is the string "PreAlpha 3" (it appears at the top of the browser tab, next to the site's favicon).


1 answer
User
#1 · Posted 2024-10-02 16:29:01

You aren't the only one receiving this error; it may be a glitch in lxml.

Instead, you could try another web scraping module such as BeautifulSoup, together with the requests module to fetch the page from the URL:

>>> import requests
>>> from bs4 import BeautifulSoup as BS
>>> r = requests.get('https://bazorkversion--grify.repl.co/')
>>> soup = BS(r.text, 'lxml')
>>> soup.title.text
'PreAlpha 3'
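
If installing third-party packages is not an option, the same idea can be sketched with only the standard library. This is a minimal sketch, not a definitive implementation: the names TitleParser, extract_title and fetch_title are made up for this illustration, and it assumes the page is ordinary HTML with a single <title> element. Note also that the traceback in the question shows the host name without "https://", which suggests the address was treated as a local file name, so the sketch expects a full URL including the scheme.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TitleParser(HTMLParser):
    """Hypothetical helper: collects the text of the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self._parts = []

    def handle_starttag(self, tag, attrs):
        # Only the first <title> is of interest.
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        # Text may arrive in several chunks, so collect all of them.
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self):
        return "".join(self._parts).strip()


def extract_title(html_text):
    """Return the page title from an HTML string, or '' if there is none."""
    parser = TitleParser()
    parser.feed(html_text)
    parser.close()
    return parser.title


def fetch_title(url, timeout=10):
    """Download a page and return its title (url must include the scheme)."""
    with urlopen(url, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        html_text = response.read().decode(charset, errors="replace")
    return extract_title(html_text)
```

With this in place, the string from the question could be built with something like LatestVersion = fetch_title("https://bazorkversion--grify.repl.co/"), assuming the site is reachable. The result is already a plain str, so no .read() call is needed on it.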
