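<p>Try this. First, fetch the page with <code>requests</code> and parse the HTML with <code>BeautifulSoup</code>. Then find all the centered <code>td</code> tags in the HTML and use a regex to extract the zip code from each one.</p>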
<pre><code>import re

import requests
from bs4 import BeautifulSoup

url = "https://www.sec.gov/Archives/edgar/data/20/000095012310024631/c97665e10vk.htm"
page = requests.get(url)
soup = BeautifulSoup(page.content, "html.parser")

# Look only at centered table cells, where the address block sits on this page
for s in soup.find_all("td", attrs={"align": "center"}):
    # ZIP+4 pattern (5 digits, hyphen, 4 digits); swap in your own regex if you want
    zipcode = re.findall(r"(\d{5}-\d{4})", str(s))
    if zipcode:  # skip cells with no match
        print(zipcode)
</code></pre>
<p>Output:</p>
<pre><code>['08071-0888']
</code></pre>
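<p>If you want to check the regex step on its own without hitting the network, something like the sketch below might help. The sample markup and the helper name <code>extract_zip4</code> are made up for illustration, and the <code>\b</code> word boundaries are my own addition (not in the snippet above) so the pattern doesn't match inside a longer run of digits.</p>

```python
import re

# ZIP+4 pattern from the answer, with word boundaries added (an assumption,
# not part of the original snippet) to avoid matching inside longer digit runs.
ZIP4_RE = re.compile(r"\b\d{5}-\d{4}\b")


def extract_zip4(text):
    """Return every ZIP+4 code (5 digits, hyphen, 4 digits) found in text."""
    return ZIP4_RE.findall(text)


# Hypothetical sample markup standing in for one <td> cell from the page.
sample = '<td align="center">Some City, NJ 08071-0888</td>'
print(extract_zip4(sample))
```

<p>Because the helper takes a plain string, you can pass it <code>str(s)</code> for each cell inside the loop, or the whole page text at once if you don't care which cell the match came from.</p>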