用Python选择div、h2和h3类进行Web抓取

2024-10-04 11:28:29 发布

您现在位置:Python中文网/ 问答频道 /正文

这是我第一次使用Python和web抓取。一直在四处寻找,仍然无法得到我需要做的。在

下面是我通过Chrome使用过的元素的打印屏幕。在

我要做的是,我试着从选定的城市名称中获取公寓名称和地址。在

List of Apartments in a selected city

import requests
from bs4 import BeautifulSoup

#url = 'http://www.homestead.ca/apartments-for-rent/'                           
rootURL = 'http://www.homestead.ca'
response = requests.get(rootURL)                                                   
html = response.content
soup = BeautifulSoup(html,'lxml')

dropdown_list = soup.select(".primary .child-pages a")

#city_names=[dropdown_list_value.text for dropdown_list_value in dropdown_list]
#print (city_names)

cityLinks=[rootURL + dropdown_list_value['href'] for dropdown_list_value in dropdown_list]

for cityLinks_select in dropdown_list:                                       #Looping each city from the Apartment drop down list
    print ('Selecting city:',cityLinks_select.text)
    cityResponse = requests.get(cityLinks)
    cityHtml = cityResponse.content
    citySoup = BeautifulSoup(cityHtml,'lxml')

    community_list = soup.select(".extended-search .property-container a[h2 h3]")
    get and print the apartment link
    get and print the apartment name
    get and print the address of the apartment

Tags: theincityforgetvaluerequestsselect
1条回答
网友
1楼 · 发布于 2024-10-04 11:28:29

正如我所说,有些数据是动态创建的,如果我们看一下源代码本身,我们会看到:

                        <div class="content">
                                    <div class="title-container">
                                        <h2 class="building-name"><%= building.get('name') %></h2>
                                        <h3 class="address"><%= building.get('address').address %></h3>
                                    </div>

                                    <div class="rent">
                                        <h4 class="sub-title">Rent from</h4>
                                        <% if (building.get('statistics').suites.rates.min !== 'undefined') { %>
                                            <% $min_rate = commaSeparateNumber(parseInt(building.get('statistics').suites.rates.min)); %>
                                            <span class="rent-value">$<%= $min_rate %></span>
                                        <% } %>
                                    </div>

我们能从源头得到的只有建筑名称、地址和电话号码:

^{pr2}$

如果我们模仿ajax请求,我们可以获得json格式的所有数据:

import requests
from bs4 import BeautifulSoup
from pprint import pprint as pp

rootURL = 'http://www.homestead.ca'
response = requests.get(rootURL)
html = response.content
soup = BeautifulSoup(html, 'lxml')

dropdown_list = soup.select(".primary .child-pages a")


cityLinks = (rootURL + dropdown_list_value['href'] for dropdown_list_value in dropdown_list)

# params for our request
params = {"show_promotions": "true",
        "show_custom_fields": "true",
        "client_id": "6",
        "auth_token": "sswpREkUtyeYjeoahA2i",
        "min_bed": "-1",
        "max_bed": "100",
        "min_bath": "0",
        "max_bath": "10",
        "min_rate": "0",
        "max_rate": "4000",
        "keyword": "false",
        "property_types": "low-rise-apartment,mid-rise-apartment,high-rise-apartment,luxury-apartment,townhouse,house,multi-unit-house,single-family-home,duplex,tripex,semi",
        "order": "max_rate ASC, min_rate ASC, min_bed ASC, max_bath ASC",
        "limit": "50",
        "offset": "0",
        "count": "false"}

for city in cityLinks:  # Looping each city from the Apartment drop down list
    with requests.Session() as s:
        r= s.get(city)
        # we need to parse the city_id for out next request to work
        soup = BeautifulSoup(r.content)
        city_id = soup.select_one("div.hidden.search-data")["data-city-id"]
        # update params with the city id
        params["city_id"] = city_id
        js = s.get("http://api.theliftsystem.com/v2/search", params=params).json()
        pp(js)

现在我们得到的数据如下:

[{u'address': {u'address': u'325 North Park Street',
               u'city': u'Brantford',
               u'city_id': 332,
               u'country': u'Canada',
               u'country_code': u'CAN',
               u'intersection': u'',
               u'neighbourhood': u'',
               u'postal_code': u'N3R 2X4',
               u'province': u'Ontario',
               u'province_code': u'ON'},
  u'availability_count': 6,
  u'availability_status': 1,
  u'availability_status_label': u'Available Now',
  u'building_header': u'',
  u'client': {u'email': u'bcadieux@homestead.ca',
              u'id': 6,
              u'name': u'Homestead Land Holdings',
              u'phone': u'613-546-3146',
              u'website': u'www.homestead.ca'},
  u'contact': {u'alt_extension': u'',
               u'alt_phone': u'',
               u'email': u'rentals@homestead.ca',
               u'extension': u'',
               u'fax': u'(519) 752-6855',
               u'name': u'',
               u'phone': u'519-752-3596'},
  u'details': {u'features': u'',
               u'location': u'',
               u'overview': u"Located on North Park Street and Memorial Avenue,this quiet building is within walking distance of the following: - Zehrs Plaza, North Park Plaza, Shoppers Drug Mart, Zehrs Grocery Store, Zellers, Pet Store, Party Supply Store, furniture store, variety store, Black's Photography, paint shop and veterinary clinic\xa0  - Restaurants and coffee shops\xa0  - Wayne Gretzky Recreational Arena\xa0  - Medical Clinic,Shoppers Home Health Care Clinic and Pharmacy\xa0  - Catholic Elementary School\xa0  - On bus route ",
               u'suite': u''},
  u'geocode': {u'distance': None,
               u'latitude': u'43.1703624',
               u'longitude': u'-80.2605725'},
  u'id': 309,
  u'matched_beds': [u'0', u'1', u'2'],
  u'matched_suite_names': [u'Bachelor', u'One Bedroom', u'Two Bedroom'],
  u'min_availability_date': u'',
  u'name': u'North Park Tower',
  u'office_hours': u'',
  u'parking': {u'additional': u'', u'indoor': u'', u'outdoor': u''},
  u'permalink': u'http://www.homestead.ca/apartments/325-north-park-street-brantford',
  u'pet_friendly': True,
  u'photo': u'1443018148_2.jpg',
  u'photo_path': u'http://s3.amazonaws.com/lws_lift/homestead/images/gallery/full/1443018148_2.jpg',
  u'promotion': {u'featured': 0},
  u'property_type': u'High-rise-apartment',
  u'statistics': {u'suites': {u'bathrooms': {u'average': 1.0,
                                             u'max': 1.0,
                                             u'min': 1.0},
                              u'bedrooms': {u'average': u'1.0',
                                            u'max': 2,
                                            u'min': 0},
                              u'rates': {u'average': 950.0,
                                         u'max': 1275.0,
                                         u'min': 625.0},
                              u'square_feet': {u'average': 0.0,
                                               u'max': u'0.0',
                                               u'min': u'0.0'}}},
  u'thumbnail_path': u'http://s3.amazonaws.com/lws_lift/homestead/images/gallery/256/1443018148_2.jpg',
  u'website': {u'description': u'', u'title': u'', u'url': u''}},
 {u'address': {u'address': u'661 West Street',
               u'city': u'Brantford',
               u'city_id': 332,
               u'country': u'Canada',
               u'country_code': u'CAN',
               u'intersection': u'',
               u'neighbourhood': u'',
               u'postal_code': u'N3R 6W9',
               u'province': u'Ontario',
               u'province_code': u'ON'},
  u'availability_count': 6,
  u'availability_status': 1,
  u'availability_status_label': u'Available Now',
  u'building_header': u'',
  u'client': {u'email': u'bcadieux@homestead.ca',
              u'id': 6,
              u'name': u'Homestead Land Holdings',
              u'phone': u'613-546-3146',
              u'website': u'www.homestead.ca'},
  u'contact': {u'alt_extension': u'',
               u'alt_phone': u'',
               u'email': u'rentals@homestead.ca',
               u'extension': u'',
               u'fax': u'(519) 751-0379',
               u'name': u'',
               u'phone': u'519-751-3867'},
  u'details': {u'features': u'',
               u'location': u'',
               u'overview': u'Located in the North end of Brantford, Westgate Tower is in an area that resembles a city within a city. There are a variety of banks, grocery stores, drug stores, malls, a wide selection of fast food, fine dining restaurants and an after hours medical centre, within waking distance.',
               u'suite': u''},
  u'geocode': {u'distance': None,
               u'latitude': u'43.1733242',
               u'longitude': u'-80.2482991'},
  u'id': 310,
  u'matched_beds': [u'0', u'1', u'2'],
  u'matched_suite_names': [u'Bachelor', u'One Bedroom', u'Two Bedroom'],
  u'min_availability_date': u'',
  u'name': u'Westgate Apartments',
  u'office_hours': u'',
  u'parking': {u'additional': u'', u'indoor': u'', u'outdoor': u''},
  u'permalink': u'http://www.homestead.ca/apartments/661-west-street-brantford',
  u'pet_friendly': True,
  u'photo': u'1443017488_1.jpg',
  u'photo_path': u'http://s3.amazonaws.com/lws_lift/homestead/images/gallery/full/1443017488_1.jpg',
  u'promotion': {u'featured': 0},
  u'property_type': u'High-rise-apartment',
  u'statistics': {u'suites': {u'bathrooms': {u'average': 1.0,
                                             u'max': 1.0,
                                             u'min': 1.0},
                              u'bedrooms': {u'average': u'1.0',
                                            u'max': 2,
                                            u'min': 0},
                              u'rates': {u'average': 975.0,
                                         u'max': 1300.0,
                                         u'min': 650.0},
                              u'square_feet': {u'average': 0.0,
                                               u'max': u'0.0',
                                               u'min': u'0.0'}}},
  u'thumbnail_path': u'http://s3.amazonaws.com/lws_lift/homestead/images/gallery/256/1443017488_1.jpg',
  u'website': {u'description': u'', u'title': u'', u'url': u''}},
 {u'address': {u'address': u'321 Fairview Drive',
               u'city': u'Brantford',
               u'city_id': 332,
               u'country': u'Canada',
               u'country_code': u'CAN',
               u'intersection': u'',
               u'neighbourhood': u'',
               u'postal_code': u'N3R 2X6',
               u'province': u'Ontario',
               u'province_code': u'ON'},
  u'availability_count': 8,
  u'availability_status': 1,
  u'availability_status_label': u'Available Now',
  u'building_header': u'',
  u'client': {u'email': u'bcadieux@homestead.ca',
              u'id': 6,
              u'name': u'Homestead Land Holdings',
              u'phone': u'613-546-3146',
              u'website': u'www.homestead.ca'},
  u'contact': {u'alt_extension': u'',
               u'alt_phone': u'',
               u'email': u'rentals@homestead.ca',
               u'extension': u'',
               u'fax': u'(519) 752-6855',
               u'name': u'',
               u'phone': u'519-752-3596'},
  u'details': {u'features': u'',
               u'location': u'',
               u'overview': u'Dornia Manor is a quiet, ninety-two unit apartment building located in the North end of Brantford. We offer one, two and three bedroom units and one penthouse suite. The building is located in close proximity to many major services such as banking, shopping, health services, recreational facilities, beauty shops, dry cleaners, schools and churches. There is a bus stop at the front door and highway 403 is within minutes.',
               u'suite': u''},
  u'geocode': {u'distance': None,
               u'latitude': u'43.1706331',
               u'longitude': u'-80.2584034'},
  u'id': 308,
  u'matched_beds': [u'1', u'2', u'3'],
  u'matched_suite_names': [u'One Bedroom', u'Two Bedroom', u'Three Bedroom'],
  u'min_availability_date': u'',
  u'name': u'Dornia Manor',
  u'office_hours': u'',
  u'parking': {u'additional': u'', u'indoor': u'', u'outdoor': u''},
  u'permalink': u'http://www.homestead.ca/apartments/321-fairview-drive-brantford',
  u'pet_friendly': True,
  u'photo': u'1443017947_1.jpg',
  u'photo_path': u'http://s3.amazonaws.com/lws_lift/homestead/images/gallery/full/1443017947_1.jpg',
  u'promotion': {u'featured': 0},
  u'property_type': u'High-rise-apartment',
  u'statistics': {u'suites': {u'bathrooms': {u'average': 1.375,
                                             u'max': 2.0,
                                             u'min': 1.0},
                              u'bedrooms': {u'average': u'2.25',
                                            u'max': 3,
                                            u'min': 1},
                              u'rates': {u'average': 1124.5,
                                         u'max': 1350.0,
                                         u'min': 899.0},
                              u'square_feet': {u'average': 0.0,
                                               u'max': u'0.0',
                                               u'min': u'0.0'}}},
  u'thumbnail_path': u'http://s3.amazonaws.com/lws_lift/homestead/images/gallery/256/1443017947_1.jpg',
  u'website': {u'description': u'', u'title': u'', u'url': u''}}]

这会给你网址,卧室和几乎所有你想要的东西。列表中的每一个dict都是一个列表,您只需使用键来访问所需的数据,例如:

 for dct in js:
        add = dct["address"]
        print(add["city"])
        print(add["postal_code"])
        print(add["province"])
        print(dct["permalink"])

会给你:

Brantford
N3R 2X4
Ontario
http://www.homestead.ca/apartments/325-north-park-street-brantford
Brantford
N3R 6W9
Ontario
http://www.homestead.ca/apartments/661-west-street-brantford
Brantford
N3R 2X6
Ontario
http://www.homestead.ca/apartments/321-fairview-drive-brantford

联系人信息在dct["contact"]下,统计信息在=dct["statistics"]下:

for dct in js:
        contact = dct["contact"]
        print(contact)
        stats = dct["statistics"]
        print(stats["suites"])

这会给你:

{u'alt_phone': u'', u'fax': u'(519) 752-6855', u'name': u'', u'alt_extension': u'', u'phone': u'519-752-3596', u'extension': u'', u'email': u'rentals@homestead.ca'}
{u'rates': {u'max': 1275.0, u'average': 950.0, u'min': 625.0}, u'bedrooms': {u'max': 2, u'average': u'1.0', u'min': 0}, u'bathrooms': {u'max': 1.0, u'average': 1.0, u'min': 1.0}, u'square_feet': {u'max': u'0.0', u'average': 0.0, u'min': u'0.0'}}
{u'alt_phone': u'', u'fax': u'(519) 751-0379', u'name': u'', u'alt_extension': u'', u'phone': u'519-751-3867', u'extension': u'', u'email': u'rentals@homestead.ca'}
{u'rates': {u'max': 1300.0, u'average': 975.0, u'min': 650.0}, u'bedrooms': {u'max': 2, u'average': u'1.0', u'min': 0}, u'bathrooms': {u'max': 1.0, u'average': 1.0, u'min': 1.0}, u'square_feet': {u'max': u'0.0', u'average': 0.0, u'min': u'0.0'}}
{u'alt_phone': u'', u'fax': u'(519) 752-6855', u'name': u'', u'alt_extension': u'', u'phone': u'519-752-3596', u'extension': u'', u'email': u'rentals@homestead.ca'}
{u'rates': {u'max': 1350.0, u'average': 1124.5, u'min': 899.0}, u'bedrooms': {u'max': 3, u'average': u'2.25', u'min': 1}, u'bathrooms': {u'max': 2.0, u'average': 1.375, u'min': 1.0}, u'square_feet': {u'max': u'0.0', u'average': 0.0, u'min': u'0.0'}}

你可以把所有这些放在一起得到你需要的东西。你可以调整参数,如果你在chrome工具或firebug中查看请求,实际上还有更多。在

相关问题 更多 >