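<p>This is one of the faster ways to get the data from this page with the requests module, because the data is already present in the page source inside a script tag. All that remains is to clean the data before storing it in a DataFrame.</p>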
<pre><code>import re
import requests

URL = 'http://tickertrak.com/'

with requests.Session() as s:
    # Send a browser-like User-Agent with the request
    s.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36'
    r = s.get(URL)

# Grab the JavaScript array literal embedded in the page source
items = re.findall(r"var arrayFromPHP = \[(.*?)\];", r.text)[0]
# Each inner [...] group is one table row
trs = re.findall(r"\[(.*?)\]", items)
for tds in trs:
    print(tds)
</code></pre>
<p>The output looks like this:</p>
<pre><code>"Options","gamestop corp","gme","1","58662","131","-80","-85","1"
"Options","amc entertainment holdings inc","AMC","1","16290","36","-79","-66","2"
"Options","nokia corp","nok","1","3568","14","-86","-88","3"
"Options","regal-beloit corp","RBC","1","3254","11","-56","-89","4"
"Options","blackberry ltd","BB","1","3002","10","-91","-92","5"
</code></pre>
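<p>The cleanup step mentioned above could look something like the following. This is only a minimal sketch: <code>parse_rows</code> is a hypothetical helper name, the sample rows are copied from the output above, and it assumes every row is a comma-separated list of double-quoted fields. The page does not name its columns, so none are invented here.</p>

```python
import csv


def parse_rows(trs):
    """Split quoted, comma-separated row strings into lists of fields.

    Hypothetical helper: assumes each row looks like the output above.
    """
    parsed = []
    # csv.reader accepts any iterable of strings and strips the double quotes
    for fields in csv.reader(trs):
        if not fields:
            continue  # skip empty rows
        # Convert integer-looking fields (including negatives) to int
        parsed.append(
            [int(f) if f.lstrip('-').isdigit() else f for f in fields]
        )
    return parsed


# Two rows copied from the output above, standing in for `trs`
sample = [
    '"Options","gamestop corp","gme","1","58662","131","-80","-85","1"',
    '"Options","nokia corp","nok","1","3568","14","-86","-88","3"',
]
rows = parse_rows(sample)
print(rows[0])
```

<p>If pandas is installed, the resulting list of lists can be passed to <code>pandas.DataFrame(rows)</code>. Column names would have to be supplied by you, since the source does not state them.</p>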