BeautifulSoup error handling for findAll()

Published 2024-07-02 13:03:22


I am extracting data from this link. In the code below, is there any way to add error handling around findAll()?

Right now the table has nothing in it:

[screenshot of the empty table]

My code:

import requests
from bs4 import BeautifulSoup

header = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/87.0.4280.88 Safari/537.36 '
}

def testingCode(stockIdNumber):
    MalaysiaStockBizURL = requests.get('https://www.malaysiastock.biz/Corporate-Infomation.aspx?securityCode=' + str(stockIdNumber), headers=header)
    MalaysiaStockBizParser = BeautifulSoup(MalaysiaStockBizURL.text, 'html.parser')
    
    try:
        shareholdingChangesTable = MalaysiaStockBizParser.find('table', { 'id': 'ctl19_gvShareholdingChange'}).findAll('tr', limit=11)
        for testShare in shareholdingChangesTable:
            titleelem = testShare.find('td')
            if titleelem:
                print(titleelem)
                
            else:
                print("Error")
    
    except IndexError:
        print("Malaysia Stockbiz - Error on Shareholding Changes.")
    
testingCode('5090')   # This one raises "'NoneType' object has no attribute 'findAll'", so it needs error handling
#testingCode('0105')  # This one returns some results

If the code were just MalaysiaStockBizParser.find('table', { 'id': 'ctl19_gvShareholdingChange'}), I could handle the error, because that call returns None. But with MalaysiaStockBizParser.find('table', { 'id': 'ctl19_gvShareholdingChange'}).findAll('tr', limit=11) I can't: it drops me straight into the error "'NoneType' object has no attribute 'findAll'". Do you know how I can handle this?
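For example, if I split the call I can check for None before calling findAll(); a rough sketch of what I mean (the helper name is just for illustration):

def getShareholdingRows(MalaysiaStockBizParser):
    # find() returns None when the table is missing, so check before chaining findAll()
    shareholdingTable = MalaysiaStockBizParser.find('table', {'id': 'ctl19_gvShareholdingChange'})
    if shareholdingTable is None:
        print("Malaysia Stockbiz - Error on Shareholding Changes.")
        return []
    return shareholdingTable.findAll('tr', limit=11)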

You can use this link as an example of how the table is displayed: [screenshot of a populated table]


3 Answers

You are trying to catch the wrong error. As the error message shows, you are getting an AttributeError, so that is what you need to catch. Changing the except statement to this will solve the problem:

try:
    # do something
except AttributeError:
    # do another thing
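Applied to the function from the question, the fix could look like this (apart from the except clause and a comment, it is the question's own code):

def testingCode(stockIdNumber):
    MalaysiaStockBizURL = requests.get('https://www.malaysiastock.biz/Corporate-Infomation.aspx?securityCode=' + str(stockIdNumber), headers=header)
    MalaysiaStockBizParser = BeautifulSoup(MalaysiaStockBizURL.text, 'html.parser')

    try:
        shareholdingChangesTable = MalaysiaStockBizParser.find('table', {'id': 'ctl19_gvShareholdingChange'}).findAll('tr', limit=11)
        for testShare in shareholdingChangesTable:
            titleelem = testShare.find('td')
            if titleelem:
                print(titleelem)
            else:
                print("Error")
    except AttributeError:
        # raised when find() returns None and .findAll() is called on it
        print("Malaysia Stockbiz - Error on Shareholding Changes.")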
Another approach is to let pandas.read_html extract the table and catch the ValueError it raises when the table is missing; the snippet below also fetches several codes concurrently with httpx and trio:

import httpx
import trio
import pandas as pd

codes = ["5090", "0105"]

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:90.0) Gecko/20100101 Firefox/90.0"
}


async def worker(channel):
    async with channel:
        async for client, code in channel:
            params = {
                "securityCode": code
            }
            r = await client.get('https://www.malaysiastock.biz/Corporate-Infomation.aspx', params=params)

            try:
                # pd.read_html raises ValueError when no table matches the given attrs
                df = pd.read_html(
                    r.text, attrs={"id": "ctl19_gvShareholdingChange"})[0]

            except ValueError:
                df = "N/A"

            print("{}\nCode: {}\n{}".format("*" * 70, code, df))


async def main():
    async with httpx.AsyncClient(timeout=None) as client, trio.open_nursery() as nurse:
        client.headers.update(headers)
        sender, receiver = trio.open_memory_channel(0)

        async with receiver:
            for _ in range(2):
                nurse.start_soon(worker, receiver.clone())

            async with sender:
                for code in codes:
                    await sender.send([client, code])

if __name__ == "__main__":
    trio.run(main)

Output:

**********************************************************************
Code: 0105
  Date of change              Shares Director/Substantial Shareholder
0    23 Jun 2021    Acquired 200,000                 MR LIM TECK SENG
1    22 Jun 2021    Acquired 200,000                 MR LIM TECK SENG
2    23 Apr 2021    Disposed 750,000                 MR LIM TECK SENG
3    19 Apr 2021  Acquired 2,040,000             DATO' YEO BOON LEONG
4    19 Apr 2021  Acquired 2,040,000             DATO' YEO BOON LEONG
5    16 Apr 2021  Acquired 1,080,000             DATO' YEO BOON LEONG
6    16 Apr 2021    Acquired 685,000             DATO' YEO BOON LEONG
7    16 Apr 2021  Acquired 1,080,000             DATO' YEO BOON LEONG
8    16 Apr 2021    Acquired 685,000             DATO' YEO BOON LEONG
9    14 Apr 2021     Acquired 50,000                 MR LIM TECK SENG
**********************************************************************
Code: 5090
N/A
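The same pandas.read_html / ValueError idea also works without the async machinery if one code at a time is enough. A minimal synchronous sketch (the function name is made up for illustration; any desktop User-Agent string should do):

import io

import pandas as pd
import requests

header = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}

def shareholding_changes(stock_id):
    r = requests.get('https://www.malaysiastock.biz/Corporate-Infomation.aspx',
                     params={'securityCode': stock_id}, headers=header)
    try:
        # read_html raises ValueError when no table matches the given attrs
        return pd.read_html(io.StringIO(r.text),
                            attrs={'id': 'ctl19_gvShareholdingChange'})[0]
    except ValueError:
        return None

print(shareholding_changes('0105'))  # DataFrame with the shareholding changes
print(shareholding_changes('5090'))  # None - the table is missing for this code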

You can try something like this:

import requests
from bs4 import BeautifulSoup

header = {
 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36'
}

stock_id_number = '0105'

biz_url = requests.get(
    'https://www.malaysiastock.biz/Corporate-Infomation.aspx?securityCode={}'.format(
        stock_id_number
    ), headers=header
)
biz_soup = BeautifulSoup(biz_url.text, 'html.parser')

soup_table = biz_soup.find('table', {'id': 'ctl19_gvShareholdingChange'})
rows = soup_table.find_all('tr')

for row in rows:
    cols = row.find_all('td')
    for col in cols:
        print(col.text)
    print('====')
Output:

====
23 Jun 2021
Acquired 200,000
MR LIM TECK SENG
====
22 Jun 2021
Acquired 200,000
MR LIM TECK SENG
====
23 Apr 2021
Disposed 750,000
MR LIM TECK SENG
====
19 Apr 2021
Acquired 2,040,000
DATO' YEO BOON LEONG
====
19 Apr 2021
Acquired 2,040,000
DATO' YEO BOON LEONG
====
16 Apr 2021
Acquired 1,080,000
DATO' YEO BOON LEONG
====
16 Apr 2021
Acquired 685,000
DATO' YEO BOON LEONG
====
16 Apr 2021
Acquired 1,080,000
DATO' YEO BOON LEONG
====
16 Apr 2021
Acquired 685,000
DATO' YEO BOON LEONG
====
14 Apr 2021
Acquired 50,000
MR LIM TECK SENG
====
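Note that this snippet will hit the same "'NoneType' object has no attribute 'find_all'" error as the question when the table is absent (e.g. for code 5090), because soup_table will be None. A small guard on top of the same variables avoids that:

soup_table = biz_soup.find('table', {'id': 'ctl19_gvShareholdingChange'})
if soup_table is None:
    print('Malaysia Stockbiz - Error on Shareholding Changes.')
else:
    for row in soup_table.find_all('tr'):
        cols = row.find_all('td')
        for col in cols:
            print(col.text)
        print('====')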
