Getting the href of every product from a website using Python

from bs4 import BeautifulSoup
import requests

url = "http://www.amazon.com/s/ref=nb_sb_noss_1?url=search-alias%3Daps&field-keywords=digital+camera"
keyword = "keywords=digital+camera"

r = requests.get(url)
data = r.text
soup = BeautifulSoup(data, "html.parser")

# Print every link whose href contains the search keyword
for link in soup.find_all('a'):
    href = link.get('href')
    if href is None:
        continue
    elif keyword in href:
        print(href)

1 Answer

User

#1 · Posted 2024-10-02 12:24:35

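Amazon inspects the User-Agent header (the "browser name") and changes the content it returns based on that value. If you add a User-Agent to the request, the hrefs you get back contain the string "keywords=digital+camera". Without one, they do not.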

from urllib.request import Request, urlopen
from bs4 import BeautifulSoup

url = "http://www.amazon.com/s/ref=nb_sb_noss_1?url=search-alias%3Daps&field-keywords=digital+camera"
# Send a browser-like User-Agent so Amazon returns the regular results page
req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})
html = urlopen(req).read()
soup = BeautifulSoup(html, 'html.parser')
links = soup.find_all('a')
for l in links:
    href = l.get('href')
    title = l.get('title', '')
    if 'Sony W800/B 20.1 MP Digital' in title:
        print(href)  # prints: http://www.amazon.com/Sony-W800-Digital-Camera-Black/dp/B00I8BIBCW/ref=sr_1_2/183-0842534-8993425?s=photo&ie=UTF8&qid=1421400650&sr=1-2&keywords=digital+camera
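
If you would rather filter by the keyword from the question than by one product title, the same idea can be sketched with only the standard library. This is a rough sketch, not a definitive implementation: it swaps BeautifulSoup for `html.parser.HTMLParser`, and the names `LinkCollector`, `links_containing`, and `fetch_html` are made up for illustration. It also assumes the page decodes as UTF-8.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag fed to the parser (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:  # skip anchors without a usable href
                self.hrefs.append(href)


def links_containing(html, keyword):
    """Return every href found in `html` that contains `keyword`."""
    collector = LinkCollector()
    collector.feed(html)
    return [href for href in collector.hrefs if keyword in href]


def fetch_html(url, user_agent="Mozilla/5.0"):
    """Download `url`, sending an explicit User-Agent header."""
    req = Request(url, headers={"User-Agent": user_agent})
    with urlopen(req) as resp:
        # Assumption: the response body is UTF-8; undecodable bytes are replaced.
        return resp.read().decode("utf-8", errors="replace")
```

Calling `links_containing(fetch_html(url), "keywords=digital+camera")` would then give the list of matching hrefs. Keeping the download separate from the parsing means the filter can be tried on a saved HTML string without touching the network. What Amazon actually returns may differ from what it did when this answer was written.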
