<p>To get the correct page, set the <code>User-Agent</code> HTTP header in the request.</p>
<p>For example:</p>
<pre><code>import requests
from bs4 import BeautifulSoup

sptct_links = [
    'https://www.macys.com/shop/mens-clothing/mens-blazers-sports-coats/Productsperpage/120?id=16499',
    'https://www.macys.com/shop/mens-clothing/mens-blazers-sports-coats/Pageindex,Productsperpage/2,120?id=16499',
    'https://www.macys.com/shop/mens-clothing/mens-blazers-sports-coats/Pageindex,Productsperpage/3,120?id=16499',
    'https://www.macys.com/shop/mens-clothing/mens-blazers-sports-coats/Pageindex,Productsperpage/4,120?id=16499',
    'https://www.macys.com/shop/mens-clothing/mens-blazers-sports-coats/Pageindex,Productsperpage/5,120?id=16499',
]
# Browser-like User-Agent so the site returns the correct page
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:77.0) Gecko/20100101 Firefox/77.0'}

for link in sptct_links:
    # Pass headers= so the request carries the User-Agent
    soup = BeautifulSoup(requests.get(link, headers=headers).content, 'html.parser')
    for brand in soup.select('.productBrand'):
        print(brand.get_text(strip=True))
</code></pre>
<p>Prints:</p>
<pre><code>Michael Kors
MICHAEL Michael Kors
Michael Kors
Bar III
Bar III
Bar III
Unlisted by Kenneth Cole
Tallia
Michael Kors
Bar III
...and so on.
</code></pre>
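<p>The five URLs in the example follow a regular pattern, so the list could be generated rather than typed out. The following is only a sketch: <code>build_page_urls</code> and <code>BASE</code> are hypothetical names introduced here, and it assumes the site keeps using the same path pattern for every page, which has not been verified beyond the five URLs shown.</p>

```python
# Hypothetical helper: rebuilds the listing URLs from the pattern seen in the example.
BASE = 'https://www.macys.com/shop/mens-clothing/mens-blazers-sports-coats'


def build_page_urls(pages, per_page=120, category_id=16499):
    """Return listing URLs for pages 1..pages (assumed pattern, not a documented API)."""
    urls = []
    for page in range(1, pages + 1):
        if page == 1:
            # In the example, the first page carries no Pageindex segment
            path = f'Productsperpage/{per_page}'
        else:
            path = f'Pageindex,Productsperpage/{page},{per_page}'
        urls.append(f'{BASE}/{path}?id={category_id}')
    return urls
```

<p>Calling <code>build_page_urls(5)</code> yields the same five links as the hard-coded list, so its result could be assigned to <code>sptct_links</code> directly.</p>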