<p>Here is one way to do it using js2xml:</p>
<p>First, grab the JavaScript code you are interested in:</p>
<pre><code>$ scrapy shell http://sports.qq.com/a/20170802/002470.htm
2017-08-04 18:41:23 [scrapy.utils.log] INFO: Scrapy 1.4.0 started (bot: scrapybot)
(...)
2017-08-04 18:41:26 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://sports.qq.com/a/20170802/002470.htm> (referer: None)
>>> js = response.xpath('//script/text()').get()
>>> print(js)
ARTICLE_INFO = window.ARTICLE_INFO || {
site:'sports',
site_cname:'体育',
site_url:'http://sports.qq.com',
title:'球爹喊话詹皇:想拿更多冠军 那就和我儿子搭档 ',
id:'20170802002470',
pubtime:'2017-08-02 06:22',
type:'2',
article_url:'http://sports.qq.com/a/20170802/002470.htm',
sosokeys:{key1:'NBA',key2:'湖人',key3:'球爹',key4:'詹姆斯'},
tags:['NBA','湖人','球爹','詹姆斯'],
catalog:'basket',
catalog_full:'sports-basket-nba',
sub_nav:'nba',
topic:{name:'',cname:'',ztcatalog:''},
subName:{name:'basket',url:'http://sports.qq.com/nba/', cname:'篮球'},
isShowLastAD:'',
tpl:{dev:'nba',ver:'1.0.0.0',time:'20150512',type:'1',stype:''}
}
</code></pre>
<p>Then, pass it to <code>js2xml.parse()</code> to get a parse tree:</p>
<pre><code>>>> import js2xml
>>> tree = js2xml.parse(js)
</code></pre>
<p>You can inspect what js2xml parsed with <code>js2xml.pretty_print()</code>:</p>
<pre><code>>>> print(js2xml.pretty_print(tree))
<program>
  <assign operator="=">
    <left>
      <identifier name="ARTICLE_INFO"/>
    </left>
    <right>
      <binaryoperation operation="||">
        <left>
          <dotaccessor>
            <object>
              <identifier name="window"/>
            </object>
            <property>
              <identifier name="ARTICLE_INFO"/>
            </property>
          </dotaccessor>
        </left>
        <right>
          <object>
            <property name="site">
              <string>sports</string>
            </property>
            <property name="site_cname">
              <string>体育</string>
            </property>
            <property name="site_url">
              <string>http://sports.qq.com</string>
            </property>
            <property name="title">
              <string>球爹喊话詹皇:想拿更多冠军 那就和我儿子搭档 </string>
            </property>
            <property name="id">
              <string>20170802002470</string>
            </property>
            <property name="pubtime">
              <string>2017-08-02 06:22</string>
            </property>
            <property name="type">
              <string>2</string>
            </property>
            <property name="article_url">
              <string>http://sports.qq.com/a/20170802/002470.htm</string>
            </property>
            <property name="sosokeys">
              <object>
                <property name="key1">
                  <string>NBA</string>
                </property>
                <property name="key2">
                  <string>湖人</string>
                </property>
                <property name="key3">
                  <string>球爹</string>
                </property>
                <property name="key4">
                  <string>詹姆斯</string>
                </property>
              </object>
            </property>
            <property name="tags">
              <array>
                <string>NBA</string>
                <string>湖人</string>
                <string>球爹</string>
                <string>詹姆斯</string>
              </array>
            </property>
            <property name="catalog">
              <string>basket</string>
            </property>
            <property name="catalog_full">
              <string>sports-basket-nba</string>
            </property>
            <property name="sub_nav">
              <string>nba</string>
            </property>
            <property name="topic">
              <object>
                <property name="name">
                  <string></string>
                </property>
                <property name="cname">
                  <string></string>
                </property>
                <property name="ztcatalog">
                  <string></string>
                </property>
              </object>
            </property>
            <property name="subName">
              <object>
                <property name="name">
                  <string>basket</string>
                </property>
                <property name="url">
                  <string>http://sports.qq.com/nba/</string>
                </property>
                <property name="cname">
                  <string>篮球</string>
                </property>
              </object>
            </property>
            <property name="isShowLastAD">
              <string></string>
            </property>
            <property name="tpl">
              <object>
                <property name="dev">
                  <string>nba</string>
                </property>
                <property name="ver">
                  <string>1.0.0.0</string>
                </property>
                <property name="time">
                  <string>20150512</string>
                </property>
                <property name="type">
                  <string>1</string>
                </property>
                <property name="stype">
                  <string></string>
                </property>
              </object>
            </property>
          </object>
        </right>
      </binaryoperation>
    </right>
  </assign>
</program>
</code></pre>
<p>The data you want is the <code>right</code> operand of the <code>||</code> binary operation. You can use XPath on the parse tree to get it:</p>
<pre><code>>>> o = tree.xpath('//binaryoperation/right/object')[0]
>>> o
<Element object at 0x7f6c8c7967e8>
</code></pre>
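<p>To see why the path goes through <code>binaryoperation/right</code> rather than just <code>//object</code>, here is a small sketch. It is an assumption-laden stand-in: it uses the standard library's <code>xml.etree.ElementTree</code> (which supports only a subset of XPath) on a hand-written fragment shaped like the output above, instead of the lxml tree that js2xml actually returns.</p>

```python
import xml.etree.ElementTree as ET

# Hand-written fragment mimicking the js2xml output shown above (not produced by js2xml).
XML = """
<program>
  <assign operator="=">
    <left><identifier name="ARTICLE_INFO"/></left>
    <right>
      <binaryoperation operation="||">
        <left>
          <dotaccessor>
            <object><identifier name="window"/></object>
            <property><identifier name="ARTICLE_INFO"/></property>
          </dotaccessor>
        </left>
        <right>
          <object>
            <property name="site"><string>sports</string></property>
          </object>
        </right>
      </binaryoperation>
    </right>
  </assign>
</program>
"""

root = ET.fromstring(XML)

# A bare search for <object> also matches the one wrapping `window`.
all_objects = root.findall(".//object")

# Going through binaryoperation/right selects only the object literal.
literal = root.find(".//binaryoperation/right/object")
names = [prop.get("name") for prop in literal.findall("property")]
print(len(all_objects), names)
```

<p>The <code>window.ARTICLE_INFO</code> accessor also contains an <code>object</code> element, so the more specific path avoids picking up the wrong node.</p>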
<p>Then use <code>js2xml.utils.objects.make</code> to build a Python object from that element:</p>
<pre><code>>>> from pprint import pprint
>>> pprint(data)
{'article_url': 'http://sports.qq.com/a/20170802/002470.htm',
'catalog': 'basket',
'catalog_full': 'sports-basket-nba',
'id': '20170802002470',
'isShowLastAD': '',
'pubtime': '2017-08-02 06:22',
'site': 'sports',
'site_cname': '体育',
'site_url': 'http://sports.qq.com',
'sosokeys': {'key1': 'NBA', 'key2': '湖人', 'key3': '球爹', 'key4': '詹姆斯'},
'subName': {'cname': '篮球',
'name': 'basket',
'url': 'http://sports.qq.com/nba/'},
'sub_nav': 'nba',
'tags': ['NBA', '湖人', '球爹', '詹姆斯'],
'title': '球爹喊话詹皇:想拿更多冠军 那就和我儿子搭档 ',
'topic': {'cname': '', 'name': '', 'ztcatalog': ''},
'tpl': {'dev': 'nba',
'stype': '',
'time': '20150512',
'type': '1',
'ver': '1.0.0.0'},
'type': '2'}
>>>
</code></pre>
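<p>For intuition about what this conversion step involves, here is a rough sketch of walking such a tree by hand. It is not js2xml's implementation: <code>to_python</code> is a hypothetical helper, it handles only the <code>object</code>, <code>array</code> and <code>string</code> elements seen above, and it uses the standard library's <code>xml.etree.ElementTree</code> on a hand-written fragment.</p>

```python
import xml.etree.ElementTree as ET


def to_python(node):
    """Convert an <object>/<array>/<string> element into a dict/list/str.

    Hypothetical helper for illustration only; js2xml's own converter
    covers more node types.
    """
    if node.tag == "object":
        # Each <property name="..."> wraps exactly one value element.
        return {
            prop.get("name"): to_python(prop[0])
            for prop in node.findall("property")
        }
    if node.tag == "array":
        return [to_python(child) for child in node]
    if node.tag == "string":
        # An empty <string></string> has text None, so fall back to "".
        return node.text or ""
    raise ValueError(f"unsupported element: {node.tag}")


# Hand-written fragment shaped like part of the tree above.
fragment = ET.fromstring("""
<object>
  <property name="site"><string>sports</string></property>
  <property name="tags">
    <array><string>NBA</string><string>湖人</string></array>
  </property>
  <property name="topic">
    <object><property name="name"><string></string></property></object>
  </property>
</object>
""")

data = to_python(fragment)
print(data)
```

<p>In practice the library function is the better choice; the sketch only shows that the XML tree maps naturally onto nested dicts and lists.</p>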
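<p>As @Granitosaurus mentioned, this may seem like overkill for such a task, but it can be useful when the data is not 100% valid JSON (for example, when it uses single quotes or unquoted keys, as it does here).</p>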
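<p>A quick way to check whether a snippet falls into the "not quite JSON" category is to try the standard <code>json</code> module on it. The sketch below uses a shortened, made-up literal in the same style as the page's <code>ARTICLE_INFO</code> object:</p>

```python
import json

# JavaScript object literal: unquoted keys and single-quoted strings.
js_literal = "{site:'sports', tags:['NBA','湖人']}"

try:
    json.loads(js_literal)
    parsed_ok = True
except json.JSONDecodeError as exc:
    parsed_ok = False
    print("not valid JSON:", exc)

# The strict-JSON spelling of the same data parses fine.
strict = '{"site": "sports", "tags": ["NBA", "湖人"]}'
print(json.loads(strict))
```

<p>When <code>json.loads()</code> rejects the snippet like this, parsing it as JavaScript is one way around hand-written quote fixing.</p>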