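<p>The trick is to reproduce the request exactly, with the same headers and cookies the browser sends. I copied the raw cookie string from the browser's developer tools.</p>
<p>Here's how to get the raw text data:</p>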
<pre><code>import json
from http.cookies import SimpleCookie
from urllib.parse import urlencode
import requests

# Endpoint that the search page calls via XHR.
link = 'https://finra-markets.morningstar.com/bondSearch.jsp'

# Form fields copied from the browser request.
payload = {
    'count': '20',
    'sortfield': 'tradeDate',
    'sorttype': '2',
    'start': '0',
    'searchtype': 'T',
    'query': {
        "Keywords": [
            {"Name": "securityId", "Value": "C679131"},
            {"Name": "tradeDate", "minValue": "10/03/2019", "maxValue": "10/03/2020"},
        ]
    },
}

# Raw Cookie header value copied from the developer tools.
cookies_raw_data = "__cfduid=db2d21a652ef313fcff3704bd87e839401602408581; qs_wsid=1CBF0E77A1169ED03A3EB86A6A8A991D; __cfruid=0ef7fb90b47b06df86311ff32918c0c9c441617d-1602408582; SessionID=1CBF0E77A1169ED03A3EB86A6A8A991D; UsrID=41151; UsrName=FINRA.QSAPIDEF@morningstar.com; Instid=FINRA; msFinraHasAgreed=true"

# Turn the raw string into the plain dict that requests expects.
cookie = SimpleCookie()
cookie.load(cookies_raw_data)
cookies = {}
for key, morsel in cookie.items():
    cookies[key] = morsel.value

# The Referer must match the search-result page for the same ticker and dates.
ref_payload = urlencode(dict(ticker="C679131", startdate="10/03/2019", enddate="10/03/2020"))
referer = f"https://finra-markets.morningstar.com/BondCenter/BondTradeActivitySearchResult.jsp?{ref_payload}"

headers = {
    "Accept": "text/plain, */*; q=0.01",
    "Accept-Encoding": "gzip, deflate, br",
    "Accept-Language": "en-US,en;q=0.9",
    "Connection": "keep-alive",
    "Content-Length": "278",
    "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
    "Host": "finra-markets.morningstar.com",
    "Origin": "https://finra-markets.morningstar.com",
    "Referer": referer,
    "Sec-Fetch-Dest": "empty",
    "Sec-Fetch-Mode": "cors",
    "Sec-Fetch-Site": "same-origin",
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.75 Safari/537.36",
    "X-Requested-With": "XMLHttpRequest",
}

response = requests.post(link, data=urlencode(payload), headers=headers, cookies=cookies).text
print(response.strip())
</code></pre>
<p>Output:</p>
<pre><code>{T:{"Columns":[{"tradeQuantity":"1125000","quantityAsString":"1125000","timeOfExecution":"11:46:02","settlementDate":"10/2/2020","tradeModifier":"_","secondModifier":"_","specialPriceIndicator":"-","asOfTrade":"-","reportingParty":"B","tradeStatus":"T","reportingPartyType":"D","contraPartyType":"C","securityId":"C679131","issueIdentifier":"EXC4479862","descriptionOfIssuer":"EXELON CORP","subproductType":"Corporate Bond","couponRate":3.497,"maturityDate":"06/01/2022","price":104.576,"yield":0.584,"tradeDate":"10/2/2020","symbol":null,"cusip":null,"callable":null,"commissionIndicator":"N","ATSIndicator":" ","remuneration":"N"},{"tradeQuantity":"60000","quantityAsString":"60000","timeOfExecution":"10:23:55","settlementDate":"10/5/2020","tradeModifier":"_","secondModifier":"_","specialPriceIndicator":"-","asOfTrade":"-","reportingParty":"S","tradeStatus":"T","reportingPartyType":"D",
and so on...
</code></pre>
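<p>The data comes back as plain text that turns out to be invalid <code>JSON</code>, so I couldn't parse it right away. After a few attempts I realized that the first key, <code>T</code>, is not wrapped in <code>"</code>, which is why the text fails to parse as valid <code>JSON</code>. A simple hack did the trick, though.</p>
<p>To get a <code>JSON</code> object, strip the leading <code>{T:</code> and the trailing <code>}</code> before parsing (I'll edit this if I find a less hacky way):</p>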
<pre><code>data = json.loads(response.strip()[3:-1])
for t in data['Columns']:
    print(f"{t['descriptionOfIssuer']} - {t['tradeQuantity']} - {t['price']}")
</code></pre>
<p>Output:</p>
<pre><code>EXELON CORP - 1125000 - 104.576
EXELON CORP - 60000 - 104.642
EXELON CORP - 60000 - 104.618
EXELON CORP - 200000 - 104.612
EXELON CORP - 200000 - 104.612
EXELON CORP - 2900000 - 104.597
EXELON CORP - 20000 - 104.6
EXELON CORP - 225000 - 104.553
EXELON CORP - 64000 - 104.581
EXELON CORP - 64000 - 104.596
EXELON CORP - 50000 - 104.553
EXELON CORP - 2100000 - 104.634
EXELON CORP - 230000 - 104.551
EXELON CORP - 97000 - 104.566
EXELON CORP - 15000 - 104.551
EXELON CORP - 342000 - 104.582
EXELON CORP - 1400000 - 104.616
EXELON CORP - 200000 - 104.501
EXELON CORP - 200000 - 104.511
EXELON CORP - 220000 - 104.397
</code></pre>
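<p>As a possible alternative to slicing with <code>[3:-1]</code>, here is a minimal sketch that finds the start of the inner object and lets <code>json.JSONDecoder().raw_decode</code> parse it, which ignores the trailing <code>}</code>. The helper name <code>parse_wrapped_json</code> and the shortened sample string are illustrative assumptions, not part of the original script, and the sketch assumes the response always has the <code>{T:{...}}</code> shape shown above.</p>
<pre><code>import json


def parse_wrapped_json(text):
    """Parse the inner object of a payload shaped like {T:{...}}."""
    # Skip the outer brace and find where the inner JSON object starts.
    start = text.index("{", 1)
    # raw_decode parses one JSON value starting at `start` and returns
    # (object, end_index); whatever follows (the outer "}") is ignored.
    obj, _end = json.JSONDecoder().raw_decode(text, start)
    return obj


# Shortened, hypothetical sample in the same shape as the real response.
sample = '{T:{"Columns":[{"descriptionOfIssuer":"EXELON CORP","price":104.576}]}}'
data = parse_wrapped_json(sample.strip())
print(data["Columns"][0]["price"])
</code></pre>
<p>This does not depend on the wrapper key being exactly one character long, but it is still a workaround for a response that is not valid <code>JSON</code>.</p>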
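<p>Edit:</p>
<p>To show that even short-lived (hard-coded) cookies are better than no data at all, here's a modified version of the script that produces a data dump for the ticker you're looking for.</p>
<p>Even with those pesky cookies this should work, because you're requesting archived data that is unlikely to change. So you can fetch it, save it, and move on.</p>
<p>Note: if the cookies I'm using have expired, just replace them with the values from <code>Developer Tool -> XHR -> bondSearch.jsp -> Headers -> Request Headers -> Cookie</code>:</p>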
<ul>
<li><code>__cfduid</code></li>
<li><code>qs_wsid</code></li>
<li><code>__cfruid</code></li>
<li><code>SessionID</code> (this is always the same as <code>qs_wsid</code>)</li>
</ul>
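<p>If retyping individual values is tedious, the whole <code>Cookie</code> header can be pasted as one string and converted into a dict. This is a minimal sketch: the helper name <code>cookies_from_header</code> and the sample values are made up for illustration, and it assumes the header is a plain <code>name=value; name=value</code> string as shown in the developer tools.</p>
<pre><code>from http.cookies import SimpleCookie


def cookies_from_header(raw_cookie_header):
    """Convert a raw Cookie header string into a dict usable by requests."""
    jar = SimpleCookie()
    jar.load(raw_cookie_header)
    return {name: morsel.value for name, morsel in jar.items()}


# Hypothetical header value; paste the real one from the developer tools.
raw = "qs_wsid=ABC123; SessionID=ABC123; Instid=FINRA"
cookies = cookies_from_header(raw)
print(cookies)
</code></pre>
<p>The resulting dict can be passed as the <code>cookies</code> argument of <code>requests.post</code>.</p>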
<p>The code:</p>
<pre><code>import json
import time
from urllib.parse import urlencode
import requests

# The Referer must match the search-result page for the same ticker and dates.
ref_payload = urlencode(dict(ticker="C679131", startdate="10/03/2019", enddate="10/03/2020"))
referer = f"https://finra-markets.morningstar.com/BondCenter/BondTradeActivitySearchResult.jsp?{ref_payload}"

headers = {
    "Accept": "text/plain, */*; q=0.01",
    "Accept-Encoding": "gzip, deflate, br",
    "Accept-Language": "en-US,en;q=0.9",
    "Connection": "keep-alive",
    "Content-Length": "278",
    "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
    "Host": "finra-markets.morningstar.com",
    "Origin": "https://finra-markets.morningstar.com",
    "Referer": referer,
    "Sec-Fetch-Dest": "empty",
    "Sec-Fetch-Mode": "cors",
    "Sec-Fetch-Site": "same-origin",
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.75 Safari/537.36",
    "X-Requested-With": "XMLHttpRequest",
}

# Hard-coded, short-lived cookies; replace them once they expire.
cookies = {
    "__cfduid": "d1820cb5f1d1e8ec40513d0f8326ce1881602492151",
    "qs_wsid": "92CD4948C2AC7FCEC0989B34B86C1ADB",
    "__cfruid": "4dec9a2deb6d70c86ee5b8fa4046748994ef6254-1602492151",
    "SessionID": "92CD4948C2AC7FCEC0989B34B86C1ADB",
    "UsrID": "41151",
    "UsrName": "FINRA.QSAPIDEF@morningstar.com",
    "Instid": "FINRA",
    "msFinraHasAgreed": "true",
}

start_counter = 0
final_output = []

# Request pages of 20 rows until the server returns an empty "Columns" list.
while True:
    payload = {
        'count': '20',
        'sortfield': 'tradeDate',
        'sorttype': '2',
        'start': str(start_counter),
        'searchtype': 'T',
        'query': {
            "Keywords": [
                {"Name": "securityId", "Value": "C679131"},
                {"Name": "tradeDate", "minValue": "10/03/2019", "maxValue": "10/03/2020"},
            ]
        }
    }
    response = requests.post(
        'https://finra-markets.morningstar.com/bondSearch.jsp',
        data=urlencode(payload),
        headers=headers,
        cookies=cookies,
    ).text
    # Strip the leading "{T:" and the trailing "}" to get valid JSON.
    data = json.loads(response.strip()[3:-1])["Columns"]
    if data:
        print(f"Fetching data for counter {start_counter}...")
        final_output.extend(data)
        start_counter += 20
    else:
        break

# Save everything that was collected to a single JSON file.
with open("data_dump_securityID_C679131.json", "w") as d:
    json.dump(final_output, d, indent=4, sort_keys=True)
</code></pre>
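<p>Once the dump is saved, it can be read back without touching the site again. The sketch below is only an illustration: it writes a hypothetical two-row sample (taken from the first two rows of the output above) to a temporary file instead of using the real dump, and the file name <code>sample_dump.json</code> is made up. Note that <code>tradeQuantity</code> is stored as a string, so it is converted before summing.</p>
<pre><code>import json
import os
import tempfile

# Hypothetical two-row sample shaped like the entries in "Columns".
sample_rows = [
    {"descriptionOfIssuer": "EXELON CORP", "tradeQuantity": "1125000", "price": 104.576},
    {"descriptionOfIssuer": "EXELON CORP", "tradeQuantity": "60000", "price": 104.642},
]

# Write the sample the same way the script writes its dump.
dump_path = os.path.join(tempfile.mkdtemp(), "sample_dump.json")
with open(dump_path, "w") as f:
    json.dump(sample_rows, f, indent=4, sort_keys=True)

# Load the dump back and aggregate the traded quantity.
with open(dump_path) as f:
    rows = json.load(f)

total_quantity = sum(int(row["tradeQuantity"]) for row in rows)
print(len(rows), total_quantity)
</code></pre>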