Scrapy parsing JavaScript

Published 2024-10-01 13:32:18


I have some JavaScript on a page that looks like this:

new Shopify.OptionSelectors("product-select", { product: {"id":185310341,"title":"10. Design | Siyah \u0026 beyaz kalpli",

I want the value "185310341". I searched Google for hours without finding anything, so I hope you can help. How can I get at the JavaScript and extract the id?

I tried this code:

^{pr2}$

But I got:

exceptions.AttributeError: 'Selector' object has no attribute 'search'

2 answers

Scrapy selectors have built-in support for regular expressions:

sel.xpath('<xpath_to_find_the_element_text>').re(r'"id":(\d+)')

A demo of this particular regular expression at work:

^{pr2}$
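
The pattern can also be checked on its own with Python's standard re module. This is only a minimal sketch: the script_text string is copied by hand from the snippet in the question, whereas in a spider it would come from the selector. Note that the selector's .re() method returns a list of every match as strings, while re.search() below stops at the first one.

```python
import re

# Shortened copy of the script text from the question; in a spider this
# string would come from the page's <script> element instead.
script_text = (
    'new Shopify.OptionSelectors("product-select", { product: '
    '{"id":185310341,"title":"10. Design | Siyah \\u0026 beyaz kalpli",'
)

# Same pattern as in the .re() call: capture the digits that follow "id":
match = re.search(r'"id":(\d+)', script_text)

# Convert the captured digits to an int; None if the pattern is absent.
product_id = int(match.group(1)) if match else None
print(product_id)
```

On the full page the variants carry "id" keys as well, so taking the first match relies on the product id appearing before them, as it does in the snippet from the question.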

An alternative to the regex approach is to use a JavaScript parser, convert the parser's output into an XML document, and then query it with XPath.

This is what js2xml implements, using ^{} and {} (disclaimer: I wrote js2xml; warning: it is unstable).

For your example, have a look at this sample scrapy shell session, which uses js2xml.jsonlike.getall():

paul:~$ scrapy shell http://2loom.com/products/2loom-design-siyah-beyaz-kalpli
2014-05-19 16:12:00+0200 [scrapy] INFO: Scrapy 0.23.0 started (bot: scrapybot)
2014-05-19 16:12:00+0200 [scrapy] INFO: Optional features available: ssl, http11
2014-05-19 16:12:00+0200 [scrapy] INFO: Overridden settings: {'LOGSTATS_INTERVAL': 0}
2014-05-19 16:12:00+0200 [scrapy] INFO: Enabled extensions: TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState
2014-05-19 16:12:00+0200 [scrapy] INFO: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2014-05-19 16:12:00+0200 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2014-05-19 16:12:00+0200 [scrapy] INFO: Enabled item pipelines: 
2014-05-19 16:12:00+0200 [scrapy] DEBUG: Telnet console listening on 0.0.0.0:6023
2014-05-19 16:12:00+0200 [scrapy] DEBUG: Web service listening on 0.0.0.0:6080
2014-05-19 16:12:00+0200 [default] INFO: Spider opened
2014-05-19 16:12:01+0200 [default] DEBUG: Crawled (200) <GET http://2loom.com/products/2loom-design-siyah-beyaz-kalpli> (referer: None)
[s] Available Scrapy objects:
[s]   crawler    <scrapy.crawler.Crawler object at 0x7f8552946610>
[s]   item       {}
[s]   request    <GET http://2loom.com/products/2loom-design-siyah-beyaz-kalpli>
[s]   response   <200 http://2loom.com/products/2loom-design-siyah-beyaz-kalpli>
[s]   settings   <CrawlerSettings module=None>
[s]   spider     <Spider 'default' at 0x7f8552384b90>
[s] Useful shortcuts:
[s]   shelp()           Shell help (print this help)
[s]   fetch(req_or_url) Fetch request (or URL) and update local objects
[s]   view(response)    View response in a browser
/usr/local/lib/python2.7/dist-packages/IPython/frontend.py:30: UserWarning: The top-level `frontend` package has been deprecated. All its subpackages have been moved to the top `IPython` level.
  warn("The top-level `frontend` package has been deprecated. "

In [1]: scripts = response.selector.xpath('//script/text()').extract()

In [2]: import js2xml, js2xml.jsonlike

In [3]: js = js2xml.parse(scripts[-1])

In [4]: js2xml.jsonlike.getall(js)
Out[4]: 
[{'onVariantSelected': 'selectCallback',
  'product': {'available': True,
   'compare_at_price': None,
   'compare_at_price_max': 0,
   'compare_at_price_min': 0,
   'compare_at_price_varies': False,
   'content': u'<blockquote>Siyah-beyaz kalpli tulumlarimiz 100% polyester olup kap\u015fonun i\xe7i ve ribanas\u0131 lacivertir. Fermuar\u0131 iki tarafl\u0131 a\xe7\u0131l\u0131r kapan\u0131r olup kap\u015fonun tamam\u0131n\u0131 kapsar ve beyaz renklidir. Tulumlar\u0131n her iki taraf\u0131ndaki cepler\xa0 beyaz fermuarl\u0131 ve elcikler siyaht\u0131r. Ayr\u0131ca kar\u0131n bolgesinde cepler vard\u0131r Tulumlardaki logolar beyazd\u0131r. Kad\u0131nlar ve erkekler i\xe7in tasarlanm\u0131\u015ft\u0131r.</blockquote>',
   'created_at': '2013-11-29T13:37:11+02:00',
   'description': u'<blockquote>Siyah-beyaz kalpli tulumlarimiz 100% polyester olup kap\u015fonun i\xe7i ve ribanas\u0131 lacivertir. Fermuar\u0131 iki tarafl\u0131 a\xe7\u0131l\u0131r kapan\u0131r olup kap\u015fonun tamam\u0131n\u0131 kapsar ve beyaz renklidir. Tulumlar\u0131n her iki taraf\u0131ndaki cepler\xa0 beyaz fermuarl\u0131 ve elcikler siyaht\u0131r. Ayr\u0131ca kar\u0131n bolgesinde cepler vard\u0131r Tulumlardaki logolar beyazd\u0131r. Kad\u0131nlar ve erkekler i\xe7in tasarlanm\u0131\u015ft\u0131r.</blockquote>',
   'featured_image': '//cdn.shopify.com/s/files/1/0305/9953/products/11._Zwarte_hartjes_vk_girls.jpg?v=1389259261',
   'handle': '2loom-design-siyah-beyaz-kalpli',
   'id': 185310341,
   'images': ['//cdn.shopify.com/s/files/1/0305/9953/products/11._Zwarte_hartjes_vk_girls.jpg?v=1389259261',
    '//cdn.shopify.com/s/files/1/0305/9953/products/6._Zwarte_hartjes_ak_girls.jpg?v=1389259259',
    '//cdn.shopify.com/s/files/1/0305/9953/products/11._Zwarte_hartjes_vk_boys.jpg?v=1389259264',
    '//cdn.shopify.com/s/files/1/0305/9953/products/6._Zwartje_hartjes_ak_boys.jpg?v=1389259264'],
   'options': ['Size'],
   'price': 15900,
   'price_max': 15900,
   'price_min': 15900,
   'price_varies': False,
   'published_at': '2013-11-29T13:34:20+02:00',
   'tags': [u'2\xb7Loom',
    'Beyaz',
    'Design',
    'Ekrek',
    u'Kad\u0131n',
    'Kalpli',
    'Lacivert'],
   'title': '10. Design | Siyah & beyaz kalpli',
   'type': '2 Loom Limiteds',
   'variants': [{'available': True,
     'barcode': None,
     'compare_at_price': None,
     'id': 424584985,
     'inventory_management': 'shopify',
     'inventory_policy': 'deny',
     'inventory_quantity': 3,
     'option1': 'XS (34-36: 1.60m-1.70m)',
     'option2': None,
     'option3': None,
     'options': ['XS (34-36: 1.60m-1.70m)'],
     'price': 15900,
     'requires_shipping': True,
     'sku': 'T01-BLWH-1-XS',
     'taxable': True,
     'title': 'XS (34-36: 1.60m-1.70m)',
     'weight': 0},
    {'available': True,
     'barcode': None,
     'compare_at_price': None,
     'id': 424584989,
     'inventory_management': 'shopify',
     'inventory_policy': 'deny',
     'inventory_quantity': 3,
     'option1': 'S (36-38: 1.65m-1.75m)',
     'option2': None,
     'option3': None,
     'options': ['S (36-38: 1.65m-1.75m)'],
     'price': 15900,
     'requires_shipping': True,
     'sku': 'T01-BLWH-1-S',
     'taxable': True,
     'title': 'S (36-38: 1.65m-1.75m)',
     'weight': 0},
    {'available': True,
     'barcode': None,
     'compare_at_price': None,
     'id': 424584997,
     'inventory_management': 'shopify',
     'inventory_policy': 'deny',
     'inventory_quantity': 7,
     'option1': 'M (38-40: 1.70m-1.80m)',
     'option2': None,
     'option3': None,
     'options': ['M (38-40: 1.70m-1.80m)'],
     'price': 15900,
     'requires_shipping': True,
     'sku': 'T01-BLWH-1-M',
     'taxable': True,
     'title': 'M (38-40: 1.70m-1.80m)',
     'weight': 0},
    {'available': True,
     'barcode': None,
     'compare_at_price': None,
     'id': 424585001,
     'inventory_management': 'shopify',
     'inventory_policy': 'deny',
     'inventory_quantity': 7,
     'option1': 'L (40-42: 1.75m-1.85m)',
     'option2': None,
     'option3': None,
     'options': ['L (40-42: 1.75m-1.85m)'],
     'price': 15900,
     'requires_shipping': True,
     'sku': 'T01-BLWH-1-L',
     'taxable': True,
     'title': 'L (40-42: 1.75m-1.85m)',
     'weight': 0}],
   'vendor': u'2\xb7Loom'}}]

In [5]: 
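
Once getall() has returned the list shown in Out[4], reading the id is an ordinary dictionary lookup. The following is a small sketch, assuming the result keeps the shape shown above; the result literal is a hand-trimmed copy of that output (most keys dropped), not a live call to js2xml.

```python
# Hand-trimmed copy of the Out[4] structure from the session above.
# Only the keys needed for the lookup are kept.
result = [
    {
        "onVariantSelected": "selectCallback",
        "product": {
            "id": 185310341,
            "title": "10. Design | Siyah & beyaz kalpli",
            "variants": [
                {"id": 424584985, "sku": "T01-BLWH-1-XS"},
                {"id": 424584989, "sku": "T01-BLWH-1-S"},
                {"id": 424584997, "sku": "T01-BLWH-1-M"},
                {"id": 424585001, "sku": "T01-BLWH-1-L"},
            ],
        },
    }
]

# The first (and here only) extracted object holds the product dict.
product = result[0]["product"]
product_id = product["id"]

# Variant ids are reachable the same way, unlike with a bare "id" regex,
# which cannot tell product and variant ids apart.
variant_ids = [variant["id"] for variant in product["variants"]]

print(product_id)
print(variant_ids)
```

Compared with the regex, this keeps the nesting of the original object, so the product id and the variant ids stay distinguishable.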
