Dryscrape: using xpath to get child-node data from a list of parent nodes

Published 2024-05-19 15:38:55


For learning purposes, I am trying to scrape http://quotes.toscrape.com/ with dryscrape and Python. I can get all of the quote divs with session.xpath("//div[@class='quote']"), but how do I extract the author and the quote text from each of those divs?

import dryscrape
from bs4 import BeautifulSoup

session = dryscrape.Session()
url = 'http://quotes.toscrape.com/'
print('Visiting the URL...')
session.visit(url)
print('Status: ', session.status_code())
for div in session.xpath("//div[@class='quote']"):
    # please help me scrape the author and the quote from each div element
    pass

2 Answers
import requests
from bs4 import BeautifulSoup

url = 'http://quotes.toscrape.com/'
r = requests.get(url)
soup = BeautifulSoup(r.text, 'html.parser')

for div in soup.find_all("div", {"class": "quote"}):
    print('Quote : ' + div.find('span').get_text())
    print('Author : ' + div.find('small').get_text())
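
If you also want the tags attached to each quote, the same BeautifulSoup approach extends naturally. The sketch below uses CSS selectors instead of find_all and assumes the demo page marks up each quote with span.text, small.author and a.tag elements; adjust the selectors if the markup differs.

import requests
from bs4 import BeautifulSoup

url = 'http://quotes.toscrape.com/'
soup = BeautifulSoup(requests.get(url).text, 'html.parser')

for div in soup.select("div.quote"):
    quote = div.select_one("span.text").get_text()      # the quote text
    author = div.select_one("small.author").get_text()  # the author name
    tags = [a.get_text() for a in div.select("a.tag")]  # assumed tag markup
    print(quote, '-', author, tags)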

We can loop over each element returned by xpath(); each one is an object wrapping a single matched node, and every such object has methods for extracting its data.

import dryscrape

session = dryscrape.Session()
url = 'http://quotes.toscrape.com/'
print('Visiting the URL...')
session.visit(url)
print('Status: ', session.status_code())

for div in session.xpath("//div[@class='quote']"):
    print("Quote: ", div.at_xpath(".//span").text())
    print("Author: ", div.at_xpath(".//small").text())
