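<p>Google Maps is a JavaScript-driven website. To make it work with <code>BS4</code>, you would need to parse the <code>window.APP_INITIALIZATION_STATE</code> variable block (view the page source) with a regular expression to find what you are looking for.</p>
<p><code>BeautifulSoup</code> can't scrape dynamic websites. That's why you get an empty <code>list</code>: the response does not contain the class you are looking for.</p>
<p>To make it work, you can use the <code>selenium</code> library, which automates a browser:</p>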
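<p>A minimal sketch of that regex step, assuming the assignment in the page source is immediately followed by <code>;window.APP_FLAGS</code> (an assumption about the current markup, so check the source yourself). The helper names <code>extract_app_state</code> and <code>try_parse_state</code> are hypothetical, and the layout of the extracted data is undocumented, so you still have to inspect it by hand:</p>
<pre class="lang-py prettyprint-override"><code>import json
import re

# Assumption: the assignment ends right before ";window.APP_FLAGS".
# Adjust the pattern if the page source looks different.
STATE_PATTERN = re.compile(
    r"window\.APP_INITIALIZATION_STATE\s*=\s*(.*?);window\.APP_FLAGS",
    re.DOTALL,
)


def extract_app_state(html):
    """Return the raw text assigned to window.APP_INITIALIZATION_STATE, or None."""
    match = STATE_PATTERN.search(html)
    return match.group(1) if match else None


def try_parse_state(raw):
    """Try to decode the raw text as JSON; return None if it is not valid JSON."""
    try:
        return json.loads(raw)
    except (TypeError, ValueError):
        return None
</code></pre>
<p>Here <code>html</code> is the page source as a string, however you downloaded it. If <code>try_parse_state</code> returns <code>None</code>, the block is not plain JSON and needs further cleanup before it can be parsed.</p>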
<pre class="lang-py prettyprint-override"><code>import time

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

options = Options()
options.page_load_strategy = 'normal'
driver = webdriver.Chrome(options=options)

# Open the Google Search results page
driver.get('https://www.google.com/search?q=central+hong+kong+bar')

# Click the "Maps" tab in Google Search, which opens Google Maps.
# The XPaths and the class name below match the markup at the time of writing and may change.
driver.find_element(By.XPATH, '//*[@id="hdtb-msb"]/div[1]/div/div[2]/a').click()

# This part is awkward but very simple. There's a better solution using a while loop.
# Locate the first bar in the results pane
element_container = driver.find_element(By.XPATH, '//*[@id="pane"]/div/div[1]/div/div/div[4]/div[1]/div[1]/div/a')

# Scroll down to the "end" of the list, starting from the first bar
element_container.send_keys(Keys.END)
# Sleep for 3 seconds until more bars are loaded
time.sleep(3)
# Scroll down to the "end" again
element_container.send_keys(Keys.END)
time.sleep(3)
# Scroll down to the "end" once more
element_container.send_keys(Keys.END)

# Locate each bar name by CSS selector and print it
for name in driver.find_elements(By.CSS_SELECTOR, '.qBF1Pd-haAclf'):
    print(name.text)

driver.quit()
</code></pre>
<p>Output:</p>
<pre class="lang-none prettyprint-override"><code>Quinary
The Old Man
The Envoy
COA
001
ROOM 309
HONI HONI Tiki Cocktail Lounge
ORIGIN gin bar
The Iron Fairies Hong Kong
Stockton
Tell Camellia Cocktail Bar
The Pontiac
Frank's Library
Dr. Fern's Gin Parlour
Wahtiki Island Lounge
Draft Land HK
The Wise King
The Diplomat Hong Kong
Karma Lounge
Geronimo Shot Bar HK
</code></pre>
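<p>The code comment above mentions that a while loop would be a better solution than repeating the scroll by hand. A rough sketch of that idea, kept independent of <code>selenium</code> so it can be tried in isolation. The function <code>scroll_until_stable</code> and its parameters are hypothetical names, and "stop when the item count no longer grows" is an assumed stopping rule:</p>
<pre class="lang-py prettyprint-override"><code>import time


def scroll_until_stable(count_items, scroll_once, pause=3.0, max_rounds=20):
    """Scroll until no new items appear and return the final item count.

    count_items: callable returning how many results are currently loaded.
    scroll_once: callable that performs one scroll to the end of the list.
    """
    previous = count_items()
    rounds = 0
    while rounds != max_rounds:
        scroll_once()
        time.sleep(pause)  # give the page time to load more results
        current = count_items()
        if current == previous:
            break  # nothing new was loaded, so stop scrolling
        previous = current
        rounds += 1
    return previous
</code></pre>
<p>With the Selenium script above, <code>count_items</code> could be <code>lambda: len(driver.find_elements(By.CSS_SELECTOR, '.qBF1Pd-haAclf'))</code> and <code>scroll_once</code> could be <code>lambda: element_container.send_keys(Keys.END)</code>. The <code>max_rounds</code> cap is there so the loop always ends, even if the list keeps growing.</p>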
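<p>Alternatively, you can use the <a href="https://serpapi.com/google-maps-api" rel="nofollow noreferrer">Google Maps API</a> from SerpApi. It's a paid API with a free trial of 5,000 searches.</p>
<p>The main difference is that you don't have to figure out how to scrape a complex JavaScript-driven website, how to solve a CAPTCHA (if one appears), or where to find proxies (if they are needed). Check out the <a href="https://serpapi.com/playground?engine=google_maps&q=central%20hong%20kong%20bar&ll=%4040.7455096%2C-74.0083012%2C14z&hl=en&type=search" rel="nofollow noreferrer">Playground</a>.</p>
<p>Code to integrate:</p>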
<pre class="lang-py prettyprint-override"><code>from serpapi import GoogleSearch

params = {
    "api_key": "YOUR_API_KEY",
    "engine": "google_maps",
    "type": "search",
    "google_domain": "google.com",
    "q": "central hong kong bar",
    "hl": "en",
    "ll": "@22.2822068,114.1511132,16z"
}

# Run the search and get the response as a dictionary
search = GoogleSearch(params)
results = search.get_dict()

# Print the name of each bar in the local results
for result in results['local_results']:
    bar_name = result['title']
    print(bar_name)
</code></pre>
<p>Output:</p>
<pre class="lang-none prettyprint-override"><code>Quinary
COA
001
The Old Man
ROOM 309
ORIGIN gin bar
The Envoy
Wahtiki Island Lounge
The China Bar, Lan Kwai Fong
The Iron Fairies Hong Kong
Captain's Bar
Le Boudoir
Bar De Luxe
Please Don't Tell
Owl Lounge HK
Tell Camellia Cocktail Bar
HONI HONI Tiki Cocktail Lounge
J.Boroski
Frank's Library
Geronimo Shot Bar HK
</code></pre>
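<p>The loop above indexes <code>results['local_results']</code> directly, which raises a <code>KeyError</code> if that key is absent. A small defensive sketch, assuming a response can come back without <code>local_results</code> (for example when nothing matches); <code>extract_titles</code> is a hypothetical helper, not part of the <code>serpapi</code> package:</p>
<pre class="lang-py prettyprint-override"><code>def extract_titles(results):
    """Return the titles of all local results, or an empty list if there are none."""
    titles = []
    for result in results.get("local_results", []):
        title = result.get("title")
        if title:  # skip entries that have no title
            titles.append(title)
    return titles
</code></pre>
<p>You would call it as <code>extract_titles(search.get_dict())</code> and print or store the returned list.</p>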
<blockquote>
<p>Disclaimer, I work for SerpApi.</p>
</blockquote>