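<p>If you want to use a regex, the code below works on Python 3.5.2.
Try printing your "text" to see the actual value of Item 1A in the source. It differs from what you see on the web page: the raw HTML contains ITEM&amp;#160;1A (a non-breaking-space entity) rather than a plain space. Hope this helps.</p>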
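<pre><code>import urllib.request
from urllib.error import URLError, HTTPError
import re
import contextlib

mainpage = "https://www.sec.gov/Archives/edgar/data/104169/000010416916000079/wmtform10-kx1312016.htm"

try:
    # Download the filing and decode the raw bytes into a string.
    with contextlib.closing(urllib.request.urlopen(mainpage)) as url:
        htmltext = url.read().decode('utf-8')
        #print(htmltext)
except HTTPError as e:
    # The server answered with an error status code.
    print("HTTPError:", e.code)
except URLError as e:
    # The server could not be reached at all.
    print("URLError:", e.reason)
else:
    # Match against the raw non-breaking-space entity, not a plain space.
    # Without re.DOTALL, '.' does not match newlines, so this only matches
    # when both headings sit on the same line of the HTML source.
    results = re.findall(r'(?=ITEM\&\#160\;1A\.(.*)(RISK FACTORS))(.*)(?=ITEM\&\#160\;1B\.(.*)(UNRESOLVED))', htmltext)
    print(results)
</code></pre>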
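<p>To see why the entity matters without hitting the network, here is a minimal sketch on a made-up snippet. The <code>sample_html</code> string and the <code>extract_item_1a</code> helper are assumptions for illustration, not part of the filing or of the code above. It also swaps the lookahead pattern for a simpler non-greedy one with <code>re.DOTALL</code>, so the match can span several lines. Treat it as one possible variant rather than the definitive way to parse a 10-K.</p>

```python
import re

# Hypothetical stand-in for the downloaded filing HTML (not real filing text).
sample_html = (
    "<p>ITEM&#160;1A. RISK FACTORS</p>\n"
    "<p>The risks described below could affect our results.</p>\n"
    "<p>ITEM&#160;1B. UNRESOLVED STAFF COMMENTS</p>\n"
)

# Non-greedy groups keep the match as short as possible; re.DOTALL lets '.'
# cross newlines so the section body may span several lines.
ITEM_1A_PATTERN = re.compile(
    r"ITEM&#160;1A\..*?RISK FACTORS(.*?)ITEM&#160;1B\..*?UNRESOLVED",
    re.DOTALL,
)


def extract_item_1a(html):
    """Return the raw HTML between the Item 1A and Item 1B headings, or None."""
    match = ITEM_1A_PATTERN.search(html)
    return match.group(1) if match else None


section = extract_item_1a(sample_html)
print(section)

# A heading written with a plain space, as it appears in the browser,
# does not match the entity-based pattern.
print(extract_item_1a("ITEM 1A. RISK FACTORS ... ITEM 1B. UNRESOLVED"))
```

<p>The second call returns <code>None</code>, which is the mismatch described above: the pattern has to be written against the text you get from printing the downloaded source, not against what the browser renders.</p>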