How do I scrape a string that has no unique ID?

Posted 2024-09-28 01:33:47


The image contains the text "for sale in 63702 Kolaram".

How can I extract that string with BeautifulSoup in Python?

[Image: web scraping screenshot]

https://www.magicbricks.com/property-for-sale-in-namakkal-pppfs


2 answers

If you only need the text, you can use a CSS selector that targets the title span elements:

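from bs4 import BeautifulSoup
import requests
import re

url = 'https://www.magicbricks.com/property-for-sale-in-namakkal-pppfs'
response = requests.get(url)
response.raise_for_status()
soup = BeautifulSoup(response.content, 'html5lib')  # requires the html5lib package

# Path from each listing card down to its title <span>
selector = 'div.m-srp-card__container > div.m-srp-card__desc > div.m-srp-card__heading > h2 > span.m-srp-card__title'
spans = [
    # Drop newlines/tabs, then collapse remaining whitespace runs to a single space
    re.sub(r'\s+', ' ', re.sub(r'[\n\r\t]+', '', span.text)).strip()
    for span in soup.select(selector)
]
print(spans)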

This returns a list of the listing titles:

['3 BHK Villa for Sale in 637201 Kolaram', '1 BHK House for Sale in Rasipuram', '2 BHK House for Sale in Rasipuram', 'Plot/Land for Sale in Thiruchengode Rd', 'Plot/Land for Sale in Mathiyampatty', 'Plot/Land for Sale in Namagiripetai', 'Plot/Land for Sale in Rasipuram', 'Plot/Land for Sale in Ladhuvaadi', 'Plot/Land for Sale in TGP Star City', 'Plot/Land for Sale in Palpakki', 'Plot/Land for Sale in Pavai International school back side', 'Plot/Land for Sale in Rasipuram', '3 BHK Villa for Sale in Kondichettipatti', '2 BHK House for Sale in Thiruchengode Rd', '1 BHK House for Sale in Ganesapuram namakkal', 'Plot/Land for Sale in Rasipuram', 'Plot/Land for Sale in Pon nagar bus stop', 'Plot/Land for Sale in Rasipuram', 'Plot/Land for Sale in Rasipuram', '2 BHK House for Sale in Periyapatti', '2 BHK House for Sale in Tiruchengode', 'Plot/Land for Sale in Rasipuram', 'Plot/Land for Sale in L.Kanavaipatti Road', 'Plot/Land for Sale in Muthugapatti', 'Plot/Land for Sale in Kondichettipatti', 'Plot/Land for Sale in Rasipuram', '2 BHK House for Sale in Sai Brindhavan nagar Neare', 'Plot/Land for Sale in sri sowrnabairavar nagar', '2 BHK House for Sale in Namagiripettai', '3 BHK House for Sale in Pachal']

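If you need other information from each card, shorten the selector so it stops at the card `div`, iterate over those elements, and extract what you need from their children.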
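A minimal sketch of that card-by-card approach, run against a small inline HTML snippet so it works offline. The markup is hypothetical, modelled on the class names in the selector above; the `m-srp-card__price` class and its value are made up for illustration, so inspect the live page for the real child classes.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the card classes in the selector above;
# the price div and its value are assumptions, not taken from the real page.
html = """
<div class="m-srp-card__container">
  <div class="m-srp-card__desc">
    <div class="m-srp-card__heading"><h2>
      <span class="m-srp-card__title">3 BHK Villa for Sale in
        637201 Kolaram</span></h2></div>
    <div class="m-srp-card__price">45 Lac</div>
  </div>
</div>
<div class="m-srp-card__container">
  <div class="m-srp-card__desc">
    <div class="m-srp-card__heading"><h2>
      <span class="m-srp-card__title">1 BHK House for Sale in Rasipuram</span></h2></div>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

cards = []
# Shorter selector: one match per listing card instead of one per title span
for card in soup.select("div.m-srp-card__container"):
    title = card.select_one("span.m-srp-card__title")
    price = card.select_one("div.m-srp-card__price")
    cards.append({
        # split()/join collapses the line break and indentation inside the title
        "title": " ".join(title.get_text().split()) if title else None,
        # Cards without a price element get None instead of raising AttributeError
        "price": price.get_text(strip=True) if price else None,
    })

print(cards)
```

Checking each child with `if ... else None` keeps one incomplete card from stopping the whole loop, which matters on listing pages where not every card has the same fields.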

You can also do this with BeautifulSoup's find_all() function by searching on the text itself:

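import re
import requests
from bs4 import BeautifulSoup

url = 'https://www.magicbricks.com/property-for-sale-in-namakkal-pppfs'
page = requests.get(url)
soup = BeautifulSoup(page.content, "html.parser")
# string= (formerly text=) matches whole strings exactly; pass re.compile(...) for a partial match
result_text = soup.find_all(string=re.compile("Kolaram"))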

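This returns a list of the matching strings (NavigableString objects). To get the element that contains a match, use its .parent or find_parent().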
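A small offline sketch contrasting an exact string match with a regex match, then climbing from the matched string to its enclosing card to reach data that has no ID of its own. The HTML, the `card` class and the `/listing-1` links are made up for illustration.

```python
import re
from bs4 import BeautifulSoup

# Made-up snippet: the target text sits in a span with no id
html = """
<div class="card"><h2><span>3 BHK Villa for Sale in 637201 Kolaram</span></h2>
  <a href="/listing-1">Details</a></div>
<div class="card"><h2><span>1 BHK House for Sale in Rasipuram</span></h2>
  <a href="/listing-2">Details</a></div>
"""
soup = BeautifulSoup(html, "html.parser")

# Exact match: the argument must equal the whole text node
exact = soup.find_all(string="3 BHK Villa for Sale in 637201 Kolaram")

# Partial match: a compiled regex is searched within each text node
partial = soup.find_all(string=re.compile(r"Kolaram"))

# Matches are NavigableStrings; climb to the enclosing card to reach sibling data
card = partial[0].find_parent("div", class_="card")
link = card.find("a")["href"]

print(exact)
print(partial)
print(link)
```

On the live page the title text may contain line breaks or extra spaces, in which case the exact-match form finds nothing while the regex form still does.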
