<p>Try this:</p>
<pre><code>import re
import urllib.request  # web access

from bs4 import BeautifulSoup

url = "https://wsc.nmbe.ch/family/87/Senoculidae"

try:
    page = urllib.request.urlopen(url)  # connect to the website
except OSError as err:  # urllib's URLError and HTTPError are subclasses of OSError
    raise SystemExit(f"Oops! Could not open {url}: {err}")

soup = BeautifulSoup(page, 'html.parser')

# Match every div whose class starts with "speciesTitle"
regex = re.compile('^speciesTitle')
content_lis = soup.find_all('div', attrs={'class': regex})

file = ''
for cl in content_lis:
    a = cl.select_one('div a strong i')      # species name
    b = cl.find(text=True, recursive=False)  # author and year: first direct text node of the div
    c = cl.select_one('span')                # distribution
    cc = re.findall(r"\w+", c.text)[0]       # keeps only the first word of the distribution
    file += f'{a.get_text(strip=True)};{b.strip()};{cc}\n'

# Write the semicolon-separated rows; UTF-8 keeps names such as "Mello-Leitão" intact
with open('file.csv', 'w', encoding='utf-8') as f:
    f.write(file)
</code></pre>
<p>This saves a file with the following content:</p>
<pre><code>Senoculus albidus;(F. O. Pickard-Cambridge, 1897);Brazil
Senoculus barroanus;Chickering, 1941;Panama
Senoculus bucolicus;Chickering, 1941;Panama
Senoculus cambridgei;Mello-Leitão, 1927;Brazil
Senoculus canaliculatus;F. O. Pickard-Cambridge, 1902;Mexico
Senoculus carminatus;Mello-Leitão, 1927;Brazil
Senoculus darwini;(Holmberg, 1883);Argentina
Senoculus fimbriatus;Mello-Leitão, 1927;Brazil
Senoculus gracilis;(Keyserling, 1879);Guyana
Senoculus guianensis;Caporiacco, 1947;j
Senoculus iricolor;(Simon, 1880);Brazil
Senoculus maronicus;Taczanowski, 1872;French
</code></pre>
<p>and so on.</p>
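<p>Note that the third column holds only the first word of the distribution text, which is why a multi-word name is cut down to <code>French</code> in the output above. If you want the whole first region instead, something along these lines might work. This is only a rough sketch: the sample string is an assumed example of what the distribution text could look like, and <code>first_region</code> is a hypothetical helper name, not something from the page or from BeautifulSoup.</p>

```python
import re


def first_word(text):
    """What the code above does: keep only the first run of word characters."""
    return re.findall(r"\w+", text)[0]


def first_region(text):
    """Hypothetical alternative: keep everything before the first comma or period."""
    return re.split(r"[,.]", text, maxsplit=1)[0].strip()


# Assumed distribution text, for illustration only (not copied from the site)
sample = "French Guiana, Brazil"

print(first_word(sample))    # French
print(first_region(sample))  # French Guiana
```

<p>In the loop, <code>cc = first_region(c.text)</code> would then replace the <code>re.findall</code> line. Whether a comma or period is the right delimiter depends on how the site formats its distribution text, so check a few entries before relying on it.</p>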