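<p>I used <a href="https://github.com/PiotrDabkowski/Js2Py" rel="nofollow noreferrer"><code>js2py</code></a> because the <code>materials</code> object contains multiple keys (<code>BVRRRatingSummarySourceID</code>, <code>BVRRSecondaryRatingSummarySourceID</code> and <code>BVRRSourceID</code>), and pulling the HTML out of its values with a regex alone would be much harder, should you need it.</p>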
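<pre><code>import re

import js2py
import requests
from bs4 import BeautifulSoup

# Fetch the JavaScript file that embeds the reviews HTML.
r = requests.get('https://bedbathandbeyond.ugc.bazaarvoice.com/2009-en_us/1061083288/reviews.djs?format=embeddedhtml')

# Match the whole `var materials = {...}` statement.
pattern = (r'var'
           r'\s+'
           r'materials'
           r'\s*=\s*'
           r'{"BVRRRatingSummarySourceID".*}')
js_materials = re.search(pattern, r.text).group()

# Evaluate the statement with js2py and convert the result to a Python dict.
obj = js2py.eval_js(js_materials).to_dict()

# Parse the reviews HTML and select the abbreviated review texts.
html = obj['BVRRSourceID']
soup = BeautifulSoup(html, 'lxml')
spans = soup.select('span.BVRRReviewAbbreviatedText')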
</code></pre>
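<p>To see what the pattern captures, here is a minimal offline sketch that runs the same regex against a made-up string. <code>sample_js</code> and its contents are hypothetical stand-ins for <code>r.text</code>, and the sketch assumes the object literal sits on a single line: without <code>re.DOTALL</code>, <code>.</code> does not match newlines, so the greedy <code>.*}</code> runs to the last <code>}</code> on that line.</p>

```python
import re

# Hypothetical stand-in for r.text; the real response is much larger.
sample_js = (
    'var unrelated = {"x": 1};\n'
    'var materials = {"BVRRRatingSummarySourceID": "<div>4.5</div>", '
    '"BVRRSourceID": "<span>Great towels</span>"};\n'
    'var after = {"y": 2};\n'
)

# Same pattern as in the answer.
pattern = (r'var'
           r'\s+'
           r'materials'
           r'\s*=\s*'
           r'{"BVRRRatingSummarySourceID".*}')

match = re.search(pattern, sample_js)
# group() returns the statement from `var` through the last `}` on the line.
js_materials = match.group()
print(js_materials)
```

<p>Note that <code>re.search</code> returns <code>None</code> when nothing matches, so checking the result before calling <code>.group()</code> is worthwhile if the page layout might change.</p>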
<p>In the example above I used only the HTML under the <code>BVRRSourceID</code> key, but you can use the whole HTML by joining the values together:</p>
<pre><code>html = ''.join(obj.values())
</code></pre>
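<p>As a small illustration of the join, here is a sketch on a hand-written dict. The dict contents are hypothetical stand-ins for what <code>to_dict()</code> returns, and the <code>isinstance</code> filter is an added precaution (not part of the original answer), since <code>str.join</code> raises <code>TypeError</code> if any value is not a string.</p>

```python
# Hypothetical stand-in for the dict built from the materials object.
obj = {
    'BVRRRatingSummarySourceID': '<div>summary</div>',
    'BVRRSecondaryRatingSummarySourceID': '<div>secondary</div>',
    'BVRRSourceID': '<span>review</span>',
}

# Concatenate the HTML fragments, skipping any non-string values.
html = ''.join(value for value in obj.values() if isinstance(value, str))
print(html)
```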
<p>Don't forget to install <code>js2py</code> with <code>pip install js2py</code>, and <code>lxml</code> with <code>pip install lxml</code> if you want to use the <code>lxml</code> parser.</p>