How do I scrape hyperlink titles with BeautifulSoup?

Posted 2024-09-30 10:33:11

I want to scrape this website: https://viewyourdeal-gabrielsimone.com

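The product name and price sit under each div with class="info wrapper". I can extract the price without any problem, but when I try to extract the product title, I cannot convert it to text because it is an a href link. Each product name is inside a div class under the href. So my question is: how do I scrape the product name?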

import json
from bs4 import BeautifulSoup
import requests 
import csv
from datetime import datetime

url = 'https://viewyourdeal-gabrielsimone.com'

gmaInfo=[]
response = requests.get(url, timeout=5)
content = BeautifulSoup(response.content, "html.parser")
for info in content.findAll('div', attrs={"class" : "wrapper ease-animation"}):
    gridObject = {
            "title" : info.find('div', attrs={"class" : "title animation allgrey"}),
            "price" : info.find('span', attrs={"class":"red-price"}).text
            }
    print(gridObject)
    with open('index.csv', 'w') as csv_file:
        writer = csv.writer(csv_file)
        writer.writerow([gridObject])

2 answers

I was being too specific with my div class. I changed the class to just title, and it works fine.
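
A minimal sketch of why the shorter class name can help. The markup below is hypothetical (the real page was not available and may differ), and it assumes the title div's actual class attribute is not exactly "title animation allgrey". When BeautifulSoup is given a class string containing spaces, it compares that string against the element's full class attribute, whereas a single class name matches any element that carries it.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for one product tile; the real page may differ.
SAMPLE_HTML = """
<div class="wrapper ease-animation">
  <div class="title ease-animation"><a href="/products/blue-mug">Blue Mug</a></div>
  <span class="red-price">$12.00</span>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
tile = soup.find("div", class_="wrapper")

# A multi-class string must equal the element's full class attribute,
# so this lookup finds nothing in the sample markup.
too_specific = tile.find("div", attrs={"class": "title animation allgrey"})

# A single class name matches any element that has that class.
title_div = tile.find("div", class_="title")
title_text = title_div.get_text(strip=True)

# The product name is the link text; the link target is in the href attribute.
link = title_div.find("a")
href = link["href"] if link is not None else None

print(too_specific)
print(title_text, href)
```

Calling get_text() on the div also returns the text of the nested a tag, so the title can be read even though it is wrapped in a link.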

With the code below, a few items come back as None. Just add an if condition so the text is only read when the element exists.

import csv

import requests
from bs4 import BeautifulSoup

url = 'https://viewyourdeal-gabrielsimone.com'

response = requests.get(url, timeout=5)
content = BeautifulSoup(response.content, "html.parser")

# Open the file once; opening it with 'w' inside the loop overwrites it on every item.
with open('index.csv', 'w', newline='') as csv_file:
    writer = csv.writer(csv_file)
    for info in content.find_all('div', attrs={"class": "wrapper ease-animation"}):
        title = info.find('div', attrs={"class": "title animation allgrey"})
        price = info.find('span', attrs={"class": "red-price"})
        # Some items have no title or price element; skip them instead of calling .text on None.
        if title is None or price is None:
            continue
        gridObject = {"title": title.text.strip(), "price": price.text.strip()}
        print(gridObject)
        writer.writerow([gridObject["title"], gridObject["price"]])
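
The None check described above can be sketched offline as follows. The markup, the helper names text_or_none, extract_rows and write_rows, and the use of csv.DictWriter are all illustrative assumptions, not part of the original answer. The second tile deliberately has no title element, so it is skipped.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical markup: the second tile has no title element.
SAMPLE_HTML = """
<div class="wrapper ease-animation">
  <div class="title"><a href="/a">Blue Mug</a></div>
  <span class="red-price">$12.00</span>
</div>
<div class="wrapper ease-animation">
  <span class="red-price">$8.50</span>
</div>
"""


def text_or_none(tag):
    """Return the stripped text of a tag, or None when the tag was not found."""
    return tag.get_text(strip=True) if tag is not None else None


def extract_rows(html):
    """Collect one dict per tile that has a title."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tile in soup.find_all("div", class_="wrapper"):
        title = text_or_none(tile.find("div", class_="title"))
        price = text_or_none(tile.find("span", class_="red-price"))
        if title is None:
            continue  # skip tiles without a title element
        rows.append({"title": title, "price": price})
    return rows


def write_rows(rows, csv_file):
    """Write a header and all rows to an already opened file object."""
    writer = csv.DictWriter(csv_file, fieldnames=["title", "price"])
    writer.writeheader()
    writer.writerows(rows)


rows = extract_rows(SAMPLE_HTML)
buffer = io.StringIO()  # stands in for open('index.csv', 'w', newline='')
write_rows(rows, buffer)
print(buffer.getvalue())
```

Collecting the rows first and writing them in a single pass keeps the file from being reopened, and therefore overwritten, for each product.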
