Scraping with Mechanize

Posted 2024-06-28 20:33:29


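I've run into a problem where mechanize does not produce the same response as a browser. I'm trying to get the price from this page, which lets you add an item to the shopping cart through a pre-populated URL.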

http://store.nike.com/us/services/jcartService?callback=nike_Cart_hanleJCartResponse&action=addItem&lang_locale=en_US&country=US&catalogId=1&productId=463712&price=00.0&siteId=null&line1=Nike+Air+Max+1+Ultra+Moire&line2=Men%27s+Shoe&passcode=null&sizeType=null&skuAndSize=10661133%3A10&qty=1&rt=json&view=3&skuId=10661133&displaySize=14&_=142655682313

Here is what I have:

import mechanize
import urllib
import cookielib
import BeautifulSoup
import html2text

url='http://store.nike.com/us/services/jcartService?callback=nike_Cart_hanleJCartResponse&action=addItem&lang_locale=en_US&country=US&catalogId=1&productId=463712&price=00.0&siteId=null&line1=Nike+Air+Max+1+Ultra+Moire&line2=Men%27s+Shoe&passcode=null&sizeType=null&skuAndSize=10661133%3A10&qty=1&rt=json&view=3&skuId=10661133&displaySize=14&_=142655682313'

br = mechanize.Browser()
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)
br.set_handle_equiv(True)
br.set_handle_gzip(False)
br.set_handle_redirect(True)
br.set_handle_referer(False)
br.set_handle_robots(True)
br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)
br.addheaders = [('User-agent', 'Chrome')]

br.open(url)
pageText=br.open(url).read()
print pageText

I then plan to do some basic string parsing to get the price. The problem is that when I scrape the page, this is what I get:

print pageText

[output not preserved in the source]

It should instead return something like the following, as it does in the browser:

nike_Cart_hanleJCartResponse({
    "status" :"success","order" :{
        "id" :"O1014750586",
        "objType" :"order",
        "itemQuantity" :1,
        "priceInfo" :{
            "currencyFormat" :"$0.00",
            "currency" :"USD",
            "amount" :"75.0",
            ....
}]}]}});

I looked into lxml, but I'm fairly confused about how to use it here. Is it simply not possible to scrape this page correctly?

Any help would be greatly appreciated. Thanks in advance!


1 answer
User
#1 · Posted 2024-06-28 20:33:29

First navigate to the main store page so that you pick up the correct cookies. Then navigate to the URL you actually want:

import mechanize

store_url = 'http://store.nike.com'
cart_url = 'http://store.nike.com/us/services/jcartService?callback=nike_Cart_hanleJCartResponse&action=addItem&lang_locale=en_US&country=US&catalogId=1&productId=463712&price=00.0&siteId=null&line1=Nike+Air+Max+1+Ultra+Moire&line2=Men%27s+Shoe&passcode=null&sizeType=null&skuAndSize=10661133%3A10&qty=1&rt=json&view=3&skuId=10661133&displaySize=14&_=142655682313'

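br = mechanize.Browser()
# Visit the store front page first so the browser picks up its cookies.
br.open(store_url)
# Now request the cart URL; the browser sends the stored cookies with it.
response = br.open(cart_url)
data = response.read()
print data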

Output

[output not preserved in the source]
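
The response shown in the question is JSONP (JSON wrapped in a callback call), so once the request succeeds the price can probably be read with the json module rather than ad hoc string parsing or lxml. The following is a minimal Python 3 sketch, not the original poster's code: the helper names fetch_with_store_cookies, parse_jsonp and extract_amount are hypothetical, the standard-library cookie jar is swapped in for mechanize, and the order -> priceInfo -> amount path is assumed from the sample response in the question. Whether the Nike endpoint still behaves this way is not something this sketch can confirm.

```python
import json
import re
from http.cookiejar import CookieJar
from urllib.request import HTTPCookieProcessor, build_opener


def fetch_with_store_cookies(store_url, cart_url, timeout=10):
    """Open store_url first so its cookies accompany the cart_url request.

    Hypothetical helper: a standard-library stand-in for the mechanize
    approach in the answer above.
    """
    jar = CookieJar()
    opener = build_opener(HTTPCookieProcessor(jar))
    opener.addheaders = [("User-agent", "Mozilla/5.0")]
    # The body of the first response is not needed, only its cookies.
    opener.open(store_url, timeout=timeout).close()
    with opener.open(cart_url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def parse_jsonp(text):
    """Strip the callback(...) wrapper from a JSONP string and decode the JSON."""
    match = re.match(r"^\s*[\w$.]+\s*\((.*)\)\s*;?\s*$", text, re.DOTALL)
    if match is None:
        raise ValueError("not a JSONP response")
    return json.loads(match.group(1))


def extract_amount(payload):
    """Return order.priceInfo.amount as a float, or None unless status is success.

    The key path is an assumption based on the sample response in the question.
    """
    if payload.get("status") != "success":
        return None
    return float(payload["order"]["priceInfo"]["amount"])
```

With the URLs from the answer, usage would look like extract_amount(parse_jsonp(fetch_with_store_cookies(store_url, cart_url))). Keeping the network call separate from the parsing makes it possible to check the parsing against a saved response without contacting the site.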
