Problem with code to extract data from a website

Posted 2024-10-01 00:32:37


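I have this website, and I want to extract all of the company names with Python, such as West Wood Events or Mitchell Event Planning.

But I am stuck on soup.find, because it gives me []. When I inspect the page, I see, for example: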

<div class="LinesEllipsis  vendor-name--55315 primaryBold--a3d1e body1--24afd">Mitchell Event Planning<wbr></div>

For this element, I write:

week = soup.find(class_='LinesEllipsis  vendor-name--55315 primaryBold--a3d1e body1--24afd')

print(week)

and I get 0.

Am I missing something? I am very new to this.


1 answer
User
#1 · Posted 2024-10-01 00:32:37

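This string is not a single class; it is several classes separated by spaces.

In some modules you have to use the original string with all of its spaces, but in BS it seems you have to write the classes separated by single spaces.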


The code works for me if I use a single space between LinesEllipsis and vendor-name--55315:

week = soup.find_all(class_='LinesEllipsis vendor-name--55315 primaryBold--a3d1e body1--24afd')

or if I use a CSS selector with a dot in front of every class in the string:

week = soup.select('.LinesEllipsis.vendor-name--55315.primaryBold--a3d1e.body1--24afd')
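
The difference between the three lookups can be checked offline. The following is a minimal sketch, assuming Beautiful Soup 4 is installed; it parses an inline copy of the element from the question instead of fetching the page, and the variable names (double_spaced, single_spaced, missing, found, selected) are only illustrative.

```python
from bs4 import BeautifulSoup

# Inline copy of the element from the question, so no network access is needed.
# Note the two spaces after "LinesEllipsis" in the class attribute.
html = (
    '<div class="LinesEllipsis  vendor-name--55315 primaryBold--a3d1e body1--24afd">'
    'Mitchell Event Planning<wbr></div>'
)
soup = BeautifulSoup(html, 'html.parser')

double_spaced = 'LinesEllipsis  vendor-name--55315 primaryBold--a3d1e body1--24afd'
single_spaced = 'LinesEllipsis vendor-name--55315 primaryBold--a3d1e body1--24afd'

# Beautiful Soup stores class as a list of tokens. A multi-class string is
# compared against those tokens joined by single spaces, so the raw
# double-spaced attribute value does not match.
missing = soup.find(class_=double_spaced)

# The same classes separated by single spaces do match.
found = soup.find(class_=single_spaced)

# A CSS selector with one dot per class matches regardless of spacing.
selected = soup.select('.LinesEllipsis.vendor-name--55315.primaryBold--a3d1e.body1--24afd')

print(missing)
print(found.get_text())
print(len(selected))
```

Searching for just one of the classes, for example soup.find(class_='vendor-name--55315'), should also find the element, since a single class name is checked against each token on its own.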

Minimal working code

import requests
from bs4 import BeautifulSoup as BS

url = 'https://www.theknot.com/marketplace/wedding-planners-acworth-ga?page=2'

r = requests.get(url)
soup = BS(r.text, 'html.parser')

#week = soup.select('.LinesEllipsis.vendor-name--55315.primaryBold--a3d1e.body1--24afd')
week = soup.find_all(class_='LinesEllipsis vendor-name--55315 primaryBold--a3d1e body1--24afd')

for item in week:
    print(item.text)

Result:

The Charming Details
Enraptured Events
pearl and sky events - planning, design and florals
Unique Occasions ByTNicole, Inc
Platinum Eventions
RED COMPANY ATLANTA
Pop + Fizz: Event Planning and Design
Patricia Elizabeth, certified wedding planners/producer
Rienza Events
Pollyanna Richter Weddings
Calabash Events, Inc.
Weddings by Carmona LLC
Emily Jordan Events
Perfectly Taylored Events
Lindsey Wise Designs
Elegant Weddings and Affairs
Party PLANit
Wedded Bliss
Above the Fray
Willow Jaymes Events
Coco Red Events
Liz D. Events, LLC
Leslie Cox Events
YSE LLC
Marmaros Productions
PerfectionsID, LLC
All Things Love
West Wood Events
Everlasting Elegance
Prestigious Occasions
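
The suffixes such as --55315 look like generated hashes, which is an assumption on my part and not something the site documents; if they change, a lookup on the full class string would stop matching. A possible sketch that matches only on the vendor-name-- prefix is shown here, again on inline HTML. The second and third div elements, including the class vendor-name--9f0c2, are hypothetical and exist only to exercise the match.

```python
import re
from bs4 import BeautifulSoup

# The first element is copied from the question; the other two are
# hypothetical examples added for illustration.
html = """
<div class="LinesEllipsis  vendor-name--55315 primaryBold--a3d1e body1--24afd">Mitchell Event Planning<wbr></div>
<div class="LinesEllipsis  vendor-name--9f0c2 primaryBold--a3d1e body1--24afd">West Wood Events<wbr></div>
<div class="LinesEllipsis  other--11111">Not a vendor name</div>
"""
soup = BeautifulSoup(html, 'html.parser')

# A compiled regex passed to class_ is tested against each class token,
# so any element with a class starting with "vendor-name--" is returned.
by_regex = [
    tag.get_text(strip=True)
    for tag in soup.find_all('div', class_=re.compile(r'^vendor-name--'))
]

# Equivalent CSS form: substring match on the raw class attribute.
by_css = [
    tag.get_text(strip=True)
    for tag in soup.select('div[class*="vendor-name--"]')
]

print(by_regex)
print(by_css)
```

Both variants ignore the hash suffix, at the cost of also matching any other element whose class happens to start with (or, for the CSS form, contain) vendor-name--.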
