Scraping JSP web page content with Python

Posted 2024-09-30 01:32:01


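I am using Python and the requests library. I have a list of ZIP codes, and for each one I want to compile a list of nearby CVS store addresses. I can extract the address field without any problem, but I cannot generate the next page dynamically, because the URL contains no "&zip=77098" (or equivalent). Every time I visit the page I get a seemingly random "requestid" value.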

http://www.cvs.com/store-locator/store-locator-landing.jsp?_requestid=1003175

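If I copy this link and paste it into another browser, it routes me back to the default CVS location. Is there a way to send the ZIP code in the URL, or to set the search location dynamically?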

Here is my code for a single ZIP code (not working). It returns the "default" location rather than the location for the ZIP code I send:

import requests
from bs4 import BeautifulSoup

data = {"search": "77098"}
urlx = 'http://www.cvs.com/store-locator/store-locator-landing.jsp'
# Fetch the landing page once to pick up its cookies.
cookies = requests.get(urlx).cookies

rx = requests.post(urlx, cookies=cookies, data=data, headers={'user-agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36'})

soupx = BeautifulSoup(rx.content, "lxml-xml")
addressList = soupx.findAll("div", {"class": "address-wrap"})
distanceList = soupx.findAll("span", {"class": "store-miles"})

Tags: store, url, list, data, address, dynamic, zip, requests
1 answer
User
#1 · Posted 2024-09-30 01:32:01

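There is quite a bit to do. First you need to get the coordinates for the ZIP code you enter, so that you can later post them to the URL that returns the search results: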

import requests
from pprint import pprint as pp

urlx = 'http://www.cvs.com/store-locator/store-locator-landing.jsp'
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36'}

# Params for the GET request that returns the coordinates for the later POST.
coord_params = {"output": "json",
                "key": "AkezKYdo-i6Crmr6nW9y0Ce_72T-osA8SwDdbgvfMSrKL47FVwQOpjBRGW_ON5Aq",
                "$filter": "Cvs_Store_Flag Eq 'Y'"}

# This provides the coordinates.
coords_url = "https://dev.virtualearth.net/REST/v1/Locations"

# The POST to get the actual results goes to this url.
post = "https://www.cvs.com/rest/bean/cvs/store/CvsStoreLocatorServices/getSearchStore"

zipcode = "77098"
# Template to pass each zip to in your actual loop.
template = "{zipcode},US"

with requests.Session() as s:
    # Send the browser-like user agent with every request in this session.
    s.headers.update(headers)
    s.get(urlx)
    # Add the query param, passing in each zipcode.
    coord_params["query"] = template.format(zipcode=zipcode)
    js = s.get(coords_url, params=coord_params).json()
    # Parse latitude and longitude from the returned json.
    latitude, longitude = js['resourceSets'][0]['resources'][0]['point']['coordinates']
    # Finally get the search results.
    results = s.post(post, data={"latitude": latitude, "longitude": longitude}).json()

pp(results)

Output: (not preserved in the source)
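The code above indexes straight into `resourceSets[0]` and `resources[0]`, which raises an `IndexError` when a ZIP code returns no match. A minimal sketch of a more defensive lookup follows; the helper name `extract_coordinates` and the sample dictionaries are hypothetical, hand-written to mirror only the fields the code above reads, and are not real API output.

```python
def extract_coordinates(payload):
    """Return (latitude, longitude) from a Locations response, or None.

    Walks the same keys the answer's code reads (resourceSets -> resources
    -> point -> coordinates) but tolerates missing or empty levels.
    """
    for resource_set in payload.get("resourceSets", []):
        for resource in resource_set.get("resources", []):
            coords = resource.get("point", {}).get("coordinates")
            if coords and len(coords) == 2:
                return coords[0], coords[1]
    return None


# Hypothetical samples: illustrative values only, not real API responses.
sample = {"resourceSets": [{"resources": [{"point": {"coordinates": [29.73, -95.41]}}]}]}
empty = {"resourceSets": [{"resources": []}]}

print(extract_coordinates(sample))
print(extract_coordinates(empty))
```

In a loop over many ZIP codes, a `None` result can simply be skipped or logged instead of aborting the whole run.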

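The code calls https://dev.virtualearth.net/REST/v1/Locations, which is Microsoft's Bing Maps API. I suggest you set up an account and create your own application, which gives you your own key; it took me two minutes. As far as I can see, the free limit is 30k requests per day, so that should be more than enough.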
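Since the question starts from a list of ZIP codes, here is a rough sketch of building the coordinate-lookup parameters for each one with your own key. The environment variable name `BING_MAPS_KEY`, the helper `coord_params_for`, and the second ZIP code are assumptions made for illustration; the parameter names and the `{zipcode},US` query template are taken from the code above.

```python
import os


def coord_params_for(zipcode, key):
    """Build the GET parameters for one ZIP code lookup."""
    return {
        "output": "json",
        "key": key,
        "$filter": "Cvs_Store_Flag Eq 'Y'",
        "query": "{zipcode},US".format(zipcode=zipcode),
    }


# BING_MAPS_KEY is a hypothetical variable name; set it to your own key.
bing_key = os.environ.get("BING_MAPS_KEY", "YOUR-KEY-HERE")

# Example list; 77098 is from the question, 77002 is only illustrative.
zipcodes = ["77098", "77002"]
all_params = [coord_params_for(z, bing_key) for z in zipcodes]

for params in all_params:
    print(params["query"])
```

Each dictionary can then be passed as `params=` to the session's GET request against the Locations URL, exactly as the single-ZIP code above does, before posting the resulting coordinates to the store search URL.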
