BeautifulSoup does not show the content

Posted 2024-09-30 20:18:30


I want to scrape spot price data from the MCX India website. The HTML visible when inspecting the element looks like this:

<div class="contents spotmarketprice"> <div id="cont-1" style="display: block;"> <table class="mcx-table mrB20" width="100%" cellspacing="8" id="tblSMP"> <thead> <tr> <th class="symbol-head"> Commodity </th> <th> Unit </th> <th class="left1"> Location </th> <th class="right1"> Spot Price (Rs.) </th> <th> Up/Down </th> </tr> </thead> <tbody> <tr> <td class="symbol" style="width:30%;">ALMOND</td> <td style="width:17%;">1 KGS</td> <td align="left" style="width:17%;">DELHI</td> <td align="right" style="width:17%;">558.00</td> <td align="right" class="padR20" style="width:19%;">=</td> </tr>

The code I wrote is:

#import the required libraries    
from bs4 import BeautifulSoup
import requests

#Getting data from website
source= requests.get('http://www.mcxindia.com/market-data/spot-market-price').text

#Getting the html code of the website
soup = BeautifulSoup(source, 'lxml')

#Navigating to the blocks where required content is present
division_1= soup.find('div', class_="contents spotmarketprice").div.table

#Displaying the results
print(division_1.tbody)

Output:

<tbody>
   </tbody>

On the website, the content I want is available under ... However, nothing is displayed here. Please suggest a solution.


2 Answers

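The data in the table appears to be loaded by JavaScript.

That is why, if you fetch this page with the requests library, the response does not contain the table data: requests simply does not execute JS. So the problem here is not with BeautifulSoup.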
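To make the empty <tbody> concrete, here is a minimal sketch that parses a hand-written HTML string resembling what a server might send before any JavaScript runs. The static_html string is an assumption made for illustration, not the actual response from the site.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the raw server response: the table skeleton is
# present, but the rows would only be filled in later by JavaScript in a browser.
static_html = """
<div class="contents spotmarketprice">
  <div id="cont-1">
    <table id="tblSMP">
      <thead><tr><th>Commodity</th><th>Spot Price (Rs.)</th></tr></thead>
      <tbody>
      </tbody>
    </table>
  </div>
</div>
"""

soup = BeautifulSoup(static_html, "html.parser")
table = soup.find("table", id="tblSMP")

# BeautifulSoup finds the table and its <tbody>, but there are no <tr> rows
# inside it, because no script was ever executed on this markup.
rows = table.tbody.find_all("tr")
print(len(rows))
```

BeautifulSoup parses exactly the markup it is given, so an empty <tbody> in the downloaded HTML stays empty in the parsed tree.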

To scrape JS-driven data, consider using selenium and chromedriver. The solution in this case looks like this:

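# import libraries
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

# create a webdriver (Selenium 4 takes the driver path through a Service object)
chromedriver_path = 'C:\\path\\to\\chromedriver.exe'
driver = webdriver.Chrome(service=Service(chromedriver_path))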

# go to the page and get its source
driver.get('http://www.mcxindia.com/market-data/spot-market-price')
soup = BeautifulSoup(driver.page_source, 'html.parser')

# fetch mentioned data
table = soup.find('table', {'id': 'tblSMP'})
for tr in table.tbody.find_all('tr'):
    row = [td.text for td in tr.find_all('td')]
    print(row)

# close the webdriver
driver.quit()

The output of the above script is:

['ALMOND', '1 KGS', 'DELHI', '558.00', '=']
['ALUMINIUM', '1 KGS', 'THANE', '137.60', '=']
['CARDAMOM', '1 KGS', 'VANDANMEDU', '2,525.00', '=']
['CASTORSEED', '100 KGS', 'DEESA', '3,626.00', '▼']
['CHANA', '100 KGS', 'DELHI', '4,163.00', '▲']
['COPPER', '1 KGS', 'THANE', '388.30', '=']
['COTTON', '1 BALES', 'RAJKOT', '15,790.00', '▲']
['CPO', '10 KGS', 'KANDLA', '630.10', '▼']
['CRUDEOIL', '1 BBL', 'MUMBAI', '2,418.00', '▲']
['GOLD', '10 GRMS', 'AHMEDABAD', '40,989.00', '=']
['GOLDGUINEA', '8 GRMS', 'AHMEDABAD', '32,923.00', '=']
['GOLDM', '10 GRMS', 'AHMEDABAD', '40,989.00', '=']
['GOLDPETAL', '1 GRMS', 'MUMBAI', '4,129.00', '=']
['GUARGUM', '100 KGS', 'JODHPUR', '5,880.00', '=']
['GUARSEED', '100 KGS', 'JODHPUR', '3,660.00', '=']
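The rows printed above are lists of strings, and the prices contain thousands separators such as '2,525.00'. If numeric prices are needed, a conversion along these lines might help. This is only a sketch: parse_row and its dictionary keys are hypothetical names, and it assumes every row has exactly the five columns shown.

```python
def parse_row(row):
    """Turn one scraped row (a list of five strings) into a dict with a float price."""
    commodity, unit, location, price, direction = row
    return {
        "commodity": commodity,
        "unit": unit,
        "location": location,
        # Remove the thousands separator before converting to a number.
        "price": float(price.replace(",", "")),
        "direction": direction,
    }


# Sample row copied from the output above.
sample = ['CARDAMOM', '1 KGS', 'VANDANMEDU', '2,525.00', '=']
record = parse_row(sample)
print(record["price"])
```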

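UPD: I should clarify that the code above answers the question for this particular table. However, websites sometimes store data in "application/json" or similar tags, which can be read with the "requests" library (since they do not require JS).

As αԋɱҽԃ αмєяιcαη discovered, the current website contains such a tag. Please check his answer. In this case, using requests is indeed better than using selenium.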
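The idea of reading data embedded in the page source can be tried offline with a small sketch. The page_text string below is a hypothetical fragment; the variable name vSMP and the exact layout are assumptions, and only the field names are taken from this thread.

```python
import json
import re

# Hypothetical page fragment: a script block carrying the table data as JSON.
page_text = '''
<script>
var vSMP = {"Summary": null, "Data":[{"EnSymbol": "ALMOND", "Unit": "1 KGS",
"Location": "DELHI", "TodaysSpotPrice": 558.0, "Change": 0.0},
{"EnSymbol": "CHANA", "Unit": "100 KGS", "Location": "DELHI",
"TodaysSpotPrice": 4163.0, "Change": 12.0}]};
</script>
'''

# Capture the JSON array that follows "Data": (re.DOTALL lets '.' span newlines).
match = re.search(r'"Data":(\[.*?\])', page_text, re.DOTALL)

# Fall back to an empty list if the page does not contain the expected block.
records = json.loads(match.group(1)) if match else []

for rec in records:
    print(rec["EnSymbol"], rec["TodaysSpotPrice"])
```

The non-greedy pattern stops at the first closing bracket, so it only works while the records themselves contain no nested arrays.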

import requests
import re
import json
import pandas as pd


goal = ['EnSymbol', 'Unit', 'Location', 'TodaysSpotPrice']

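def main(url):
    r = requests.get(url)
    # The table data is embedded in the page source as a JSON array after "Data":
    match = json.loads(re.search(r'"Data":(\[.*?\])', r.text).group(1))
    # Keep only the fields listed in goal, in that order
    allin = []
    for item in match:
        allin.append([item[x] for x in goal])
    df = pd.DataFrame(allin, columns=goal)
    print(df)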


main("https://www.mcxindia.com/market-data/spot-market-price")

Output:

         EnSymbol     Unit    Location  TodaysSpotPrice
0          ALMOND    1 KGS       DELHI           558.00
1       ALUMINIUM    1 KGS       THANE           137.60
2        CARDAMOM    1 KGS  VANDANMEDU          2525.00
3      CASTORSEED  100 KGS       DEESA          3626.00
4           CHANA  100 KGS       DELHI          4163.00
5          COPPER    1 KGS       THANE           388.30
6          COTTON  1 BALES      RAJKOT         15880.00
7             CPO   10 KGS      KANDLA           635.90
8        CRUDEOIL    1 BBL      MUMBAI          2418.00
9            GOLD  10 GRMS   AHMEDABAD         40989.00
10     GOLDGUINEA   8 GRMS   AHMEDABAD         32923.00
11          GOLDM  10 GRMS   AHMEDABAD         40989.00
12      GOLDPETAL   1 GRMS      MUMBAI          4129.00
13        GUARGUM  100 KGS     JODHPUR          5880.00
14       GUARSEED  100 KGS     JODHPUR          3660.00
15          KAPAS   20 KGS      RAJKOT           927.50
16           LEAD    1 KGS     CHENNAI           141.60
17      MENTHAOIL    1 KGS   CHANDAUSI          1295.10
18     NATURALGAS  1 mmBtu      HAZIRA           138.50
19         NICKEL    1 KGS       THANE           892.00
20         PEPPER  100 KGS       KOCHI         32700.00
21       RAW JUTE  100 KGS     KOLKATA          4999.00
22  RBD PALMOLEIN   10 KGS      KANDLA           700.40
23      REFSOYOIL   10 KGS      INDORE           845.25
24         SILVER    1 KGS   AHMEDABAD         36871.00
25        SILVERM    1 KGS   AHMEDABAD         36871.00
26      SILVERMIC    1 KGS   AHMEDABAD         36871.00
27      SUGARMDEL  100 KGS       DELHI          3380.00
28      SUGARMKOL  100 KGS    KOLHAPUR          3334.00
29      SUGARSKLP  100 KGS    KOLHAPUR          3275.00
30            TIN    1 KGS      MUMBAI          1160.50
31          WHEAT  100 KGS       DELHI          1977.50
32           ZINC    1 KGS       THANE           155.15

If you also want the change symbol, here is a version that includes it:

import requests
import re
import json
import pandas as pd


goal = ['EnSymbol', 'Unit', 'Location', 'TodaysSpotPrice', 'Change']


def main(url):
    r = requests.get(url)
    match = json.loads(re.search(r'"Data":(\[.*?\])', r.text).group(1))
    allin = []
    for item in match:
        item = [item[x] for x in goal]
        # Replace the numeric change (last field) with a direction symbol
        item[-1] = '▲' if item[-1] > 0 else '▼' if item[-1] < 0 else "="
        allin.append(item)
    df = pd.DataFrame(allin, columns=goal)
    print(df)


main("https://www.mcxindia.com/market-data/spot-market-price")

Output:

         EnSymbol     Unit    Location  TodaysSpotPrice Change
0          ALMOND    1 KGS       DELHI           558.00      =
1       ALUMINIUM    1 KGS       THANE           137.60      =
2        CARDAMOM    1 KGS  VANDANMEDU          2525.00      =
3      CASTORSEED  100 KGS       DEESA          3626.00      =
4           CHANA  100 KGS       DELHI          4163.00      =
5          COPPER    1 KGS       THANE           388.30      =
6          COTTON  1 BALES      RAJKOT         15880.00      ▲
7             CPO   10 KGS      KANDLA           635.90      ▲
8        CRUDEOIL    1 BBL      MUMBAI          2418.00      ▲
9            GOLD  10 GRMS   AHMEDABAD         40989.00      =
10     GOLDGUINEA   8 GRMS   AHMEDABAD         32923.00      =
11          GOLDM  10 GRMS   AHMEDABAD         40989.00      =
12      GOLDPETAL   1 GRMS      MUMBAI          4129.00      =
13        GUARGUM  100 KGS     JODHPUR          5880.00      =
14       GUARSEED  100 KGS     JODHPUR          3660.00      =
15          KAPAS   20 KGS      RAJKOT           927.50      ▲
16           LEAD    1 KGS     CHENNAI           141.60      =
17      MENTHAOIL    1 KGS   CHANDAUSI          1295.10      =
18     NATURALGAS  1 mmBtu      HAZIRA           138.50      ▲
19         NICKEL    1 KGS       THANE           892.00      =
20         PEPPER  100 KGS       KOCHI         32600.00      ▼
21       RAW JUTE  100 KGS     KOLKATA          4999.00      =
22  RBD PALMOLEIN   10 KGS      KANDLA           700.40      ▼
23      REFSOYOIL   10 KGS      INDORE           845.25      =
24         SILVER    1 KGS   AHMEDABAD         36871.00      =
25        SILVERM    1 KGS   AHMEDABAD         36871.00      =
26      SILVERMIC    1 KGS   AHMEDABAD         36871.00      =
27      SUGARMDEL  100 KGS       DELHI          3380.00      ▼
28      SUGARMKOL  100 KGS    KOLHAPUR          3334.00      ▲
29      SUGARSKLP  100 KGS    KOLHAPUR          3275.00      ▼
30            TIN    1 KGS      MUMBAI          1160.50      ▼
31          WHEAT  100 KGS       DELHI          1977.50      ▲
32           ZINC    1 KGS       THANE           155.15      =
