Web scraping a JavaScript table (with grid and list views) using Python Beautiful Soup

from pandas.io.html import read_html
from selenium import webdriver
import json
import requests
import os
import sys
from bs4 import BeautifulSoup

# Raw string so the backslashes in the Windows path are taken literally
driver = webdriver.Firefox(executable_path=r'C:\Drivers\geckodriver.exe')
driver.get('https://boxes.mysubscriptionaddiction.com/subscription_boxes_for/food')
table = driver.find_element_by_xpath('/html/body/div[3]/div/span/div[2]/div/div[1]/div[3]/div[3]/table')
table_html = table.get_attribute('innerHTML')
bs = BeautifulSoup(table_html, 'html.parser')
rows = bs.select('tbody tr')
print(bs)

1 Answer

User

#1 · Posted on 2024-10-01 13:43:54

Here is how to get the data you are looking for (data is a dict containing the information):

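import requests
from bs4 import BeautifulSoup
import json

scrape_url = 'https://boxes.mysubscriptionaddiction.com/subscription_boxes_for/food'

# Fetch the raw HTML; no browser is needed because the data is embedded in the page source
r1 = requests.get(scrape_url)
page = r1.content
soup = BeautifulSoup(page, 'html.parser')
scripts = soup.find_all('script')

# Index 11 is the <script> tag holding the JSON data; this depends on the page's layout
data_str = scripts[11].contents[0].strip()
# strict=False allows control characters (e.g. raw newlines) inside JSON strings
data = json.loads(data_str, strict=False)
print(data['itemListElement'])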
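
The hard-coded index `scripts[11]` will break if the site adds or removes a `<script>` tag. As a rough sketch (not part of the original answer), you could instead try each script's text and keep the first one that parses as JSON and contains `itemListElement`. The helper name `find_item_list` and the sample strings below are hypothetical and only there so the snippet runs offline:

```python
import json


def find_item_list(script_texts):
    """Return the first JSON object among script_texts that has 'itemListElement'.

    script_texts: iterable of strings (the text of each <script> tag), may contain None.
    Returns None when no entry parses as a JSON object with that key.
    """
    for text in script_texts:
        if not text or not text.strip():
            continue  # empty tag or external script with no inline text
        try:
            # strict=False tolerates raw control characters inside strings
            candidate = json.loads(text.strip(), strict=False)
        except json.JSONDecodeError:
            continue  # ordinary JavaScript, not JSON
        if isinstance(candidate, dict) and 'itemListElement' in candidate:
            return candidate
    return None


# Hypothetical sample standing in for the page's script contents
sample_scripts = [
    'var x = 1;',
    '',
    '{"@type": "ItemList", "itemListElement": [{"position": 1, "name": "Example Box"}]}',
]
data = find_item_list(sample_scripts)
print(data['itemListElement'])
```

With the real page you would pass in something like `[s.string for s in soup.find_all('script')]` in place of `sample_scripts`, and check for `None` before indexing the result.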
