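<p>In my opinion, you should use the underlying JavaScript object embedded in the page, which already holds the data (and more) in a structured format.</p>
<p>You can use <code>yaml</code> to convert that JavaScript object into a Python <code>dict</code>. From there you can easily access fields such as <code>id</code>, <code>name</code>, <code>address</code>, <code>city</code>, <code>state</code>, and so on.</p>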
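<p>The conversion step can be tried on its own. The sketch below is a minimal example, assuming PyYAML is installed; the object literal and its values are hypothetical stand-ins for the real page data. It relies on the fact that a JavaScript object literal with unquoted keys is often also a valid YAML flow mapping, which <code>yaml.safe_load</code> turns into nested Python dicts.</p>
<pre><code>import yaml  # PyYAML (third-party package)

# Hypothetical sample: a JavaScript object literal with unquoted keys,
# shaped like the __mapOptions object described above.
js_literal = "{id: 42, name: 'Example Store', mapStore: {city: 'Anchorage', phone: '907-555-0100'}}"

# safe_load only builds plain Python objects (dict, list, str, int, ...)
data = yaml.safe_load(js_literal)

print(data["name"])               # Example Store
print(data["mapStore"]["phone"])  # 907-555-0100
</code></pre>
<p>This only works while the literal stays within YAML syntax: keys generally need a space after the colon to be read as key/value pairs, function expressions will likely fail to parse, and a value like <code>undefined</code> would come back as the plain string <code>'undefined'</code>. If the object is already strict JSON (double-quoted keys and strings), the standard library's <code>json.loads</code> is enough.</p>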
<p>Here is a complete working example:</p>
<pre><code>from urllib.parse import urljoin

import requests
import yaml
from bs4 import BeautifulSoup

url = "https://potguide.com/alaska/marijuana-dispensaries/"


def get_links(link):
    # Reuse one session (and its User-Agent header) for every request
    session = requests.Session()
    session.headers['User-Agent'] = 'Mozilla/5.0'
    r = session.get(link)
    soup = BeautifulSoup(r.text, "lxml")
    # Each store on the state page is a .basic-listing inside #StateStores
    for items in soup.select("#StateStores .basic-listing"):
        name = items.select_one("h4 a").text
        namelink = urljoin(link, items.select_one("h4 a").get("href"))
        get_info(session, name, namelink)


def get_info(session, title, url):
    response = session.get(url)
    soup = BeautifulSoup(response.content, "lxml")
    # Find the inline script that defines the __mapOptions object
    script = next((i for i in map(str, soup.find_all("script", type="text/javascript"))
                   if 'mapOptions' in i), None)
    if script:
        # Keep only the object literal between "__mapOptions = " and ";\n"
        js_dict = script.split('__mapOptions = ')[1].split(';\n')[0]
        # safe_load: yaml.load() without a Loader argument fails on PyYAML 6+
        d = yaml.safe_load(js_dict)
        print(title, d['mapStore']['phone'])


if __name__ == "__main__":
    get_links(url)
</code></pre>
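<p>The two <code>split()</code> calls in <code>get_info</code> assume the assignment ends with <code>;</code> followed by a newline, which may not hold if a page serves minified JavaScript. One possible alternative, sketched below, swaps the splits for a regular expression and can be tested offline without fetching any page. The helper <code>extract_map_options</code> and the sample script text are hypothetical, and PyYAML is again assumed to be installed.</p>
<pre><code>import re

import yaml  # PyYAML (third-party package)

# Hypothetical pattern: capture the object assigned to __mapOptions, stopping
# at the first closing brace that is followed by a semicolon.
MAP_OPTIONS_RE = re.compile(r"__mapOptions\s*=\s*(\{.*?\})\s*;", re.DOTALL)


def extract_map_options(script_text):
    """Return the __mapOptions object as a dict, or None if it is absent."""
    match = MAP_OPTIONS_RE.search(script_text)
    if match is None:
        return None
    return yaml.safe_load(match.group(1))


# Hypothetical script contents (no network access needed)
sample = (
    "var __mapOptions = {mapStore: {name: 'Example Store', phone: '907-555-0100'}};"
    "\nvar somethingElse = 1;"
)
options = extract_map_options(sample)
print(options["mapStore"]["phone"])  # 907-555-0100
</code></pre>
<p>This is a heuristic, not a JavaScript parser: a string value that itself contains a closing brace followed by a semicolon would cut the object short. In <code>get_info</code>, the two <code>split()</code> lines could be replaced by a call such as <code>d = extract_map_options(script)</code> followed by a check that <code>d</code> is not <code>None</code>.</p>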