Error when parsing a web page with BeautifulSoup4

Posted 2024-10-01 00:31:18


I am trying to parse a web page and print the item links (href). Can you help me figure out what is going wrong?

import requests
from bs4 import BeautifulSoup

link = "https://www.amazon.in/Power-Banks/b/ref=nav_shopall_sbc_mobcomp_powerbank?ie=UTF8&node=6612025031"

def amazon(url):
    sourcecode = requests.get(url)
    sourcecode_text = sourcecode.text
    soup = BeautifulSoup(sourcecode_text)

    for link in soup.findALL('a', {'class': 'a-link-normal aok-block a-text-normal'}):
        href = link.get('href')
        print(href)

amazon(link)

Output:

C:\Users\TIMAH\AppData\Local\Programs\Python\Python37\python.exe "C:/Users/TIMAH/OneDrive/study materials/Python_Test_Scripts/Self Basic/Class_Test.py"
Traceback (most recent call last):
  File "C:/Users/TIMAH/OneDrive/study materials/Python_Test_Scripts/Self Basic/Class_Test.py", line 15, in <module>
    amazon(link)
  File "C:/Users/TIMAH/OneDrive/study materials/Python_Test_Scripts/Self Basic/Class_Test.py", line 9, in amazon
    soup = BeautifulSoup(sourcecode_text, 'features="html.parser"')
  File "C:\Users\TIMAH\AppData\Local\Programs\Python\Python37\lib\site-packages\bs4\__init__.py", line 196, in __init__
    % ",".join(features))
bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: features="html.parser". Do you need to install a parser library?

Process finished with exit code 1


2 answers

You can add a User-Agent header to the request. When you then call find_all('a', href=True), every tag it returns has an href you can read:

import requests
from bs4 import BeautifulSoup

link = "https://www.amazon.in/Power-Banks/b/ref=nav_shopall_sbc_mobcomp_powerbank?ie=UTF8&node=6612025031"

def amazon(url):
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36'}

    sourcecode = requests.get(url, headers=headers)
    sourcecode_text = sourcecode.text
    soup = BeautifulSoup(sourcecode_text, 'html.parser')

    for link in soup.find_all('a', href=True):
        href = link.get('href')
        print(href)

amazon(link)
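To see what href=True does without hitting the network, here is a minimal offline sketch. The HTML string is a made-up stand-in for a downloaded page (not Amazon's real markup), and the variable names are only illustrative.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a downloaded page.
html = """
<a href="/power-bank-1">Power bank 1</a>
<a name="top">Anchor without an href</a>
<a href="/power-bank-2">Power bank 2</a>
"""

soup = BeautifulSoup(html, "html.parser")

# href=True keeps only <a> tags that actually carry an href attribute,
# so the second anchor above is skipped.
hrefs = [a["href"] for a in soup.find_all("a", href=True)]
print(hrefs)
```

Without href=True the second anchor would also be returned, and link.get('href') would give None for it.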

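The problem in your code is the method name findALL. The soup object has no findALL method; BeautifulSoup treats the unknown attribute as a search for a tag named findALL, which evaluates to None, so calling it fails. Use find_all in new code; the older spelling findAll (capital A, two lowercase l's) should also work. Hope this clears things up.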

import requests
from bs4 import BeautifulSoup

link = "https://www.amazon.in/Power-Banks/b/ref=nav_shopall_sbc_mobcomp_powerbank?ie=UTF8&node=6612025031"


def amazon(url):
    sourcecode = requests.get(url)
    sourcecode_text = sourcecode.text
    soup = BeautifulSoup(sourcecode_text, "html.parser")
    # Pass "html.parser" as the second argument so you do not get a parser warning.
    # Prefer soup.find_all in new code; soup.findAll should also work.
    for link in soup.find_all('a', {'class': 'a-link-normal aok-block a-text-normal'}):
        href = link.get('href')
        print(href)

amazon(link)
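Both failure modes can be reproduced offline. The sketch below is only an illustration: the one-line HTML string is invented for the example, and the second half recreates the parser argument shown in the question's traceback.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Hypothetical one-link page reusing the class string from the question.
html = '<a class="a-link-normal aok-block a-text-normal" href="/item-1">Item 1</a>'
soup = BeautifulSoup(html, "html.parser")

# findALL is not a method. bs4 treats the unknown attribute as a tag search,
# finds no such tag, and gives back None.
wrong_name = soup.findALL
print(wrong_name)

# find_all is the supported spelling; the full class string matches the tag.
links = soup.find_all("a", {"class": "a-link-normal aok-block a-text-normal"})
link_hrefs = [a.get("href") for a in links]
print(link_hrefs)

# Passing the literal text 'features="html.parser"' as the parser name
# reproduces the FeatureNotFound error from the traceback.
try:
    BeautifulSoup(html, 'features="html.parser"')
    parser_error = None
except FeatureNotFound as exc:
    parser_error = exc
print(type(parser_error).__name__)
```

The parser name goes in either as the plain string "html.parser" or as the keyword argument features="html.parser", not as a quoted string containing features=.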
