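<p>Your page contains only folders; a human visitor has to click into each folder to reach the files.</p>
<p>With BeautifulSoup you do the same thing in code: collect the folder links, then request each one to get its list of files.</p>
<p>What simplifies your case is that folder links and file links both carry the class attribute DocumentBrowserNameLink.</p>
<p>You can therefore use one function to find both:</p>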
<pre><code>from bs4 import BeautifulSoup as bs
import requests

DOMAIN = 'https://lfportal.loudoun.gov/LFPortalinternet/'
URL = 'https://lfportal.loudoun.gov/LFPortalinternet/Browse.aspx?startid=213973&row=1&dbid=0'
FILETYPE = '.xls'


def get_soup(url):
    # Download the page and parse it with the built-in HTML parser
    return bs(requests.get(url).text, 'html.parser')


def get_links(page):
    # Folder links and file links share this class
    return page.find_all(class_="DocumentBrowserNameLink")


# First level: the folders listed on the start page
page = get_soup(URL)
folder_links = get_links(page)

for link in folder_links:
    # Second level: the entries listed inside each folder
    page2 = get_soup(DOMAIN + link['href'])
    file_links = get_links(page2)

    for file in file_links:
        filepath = file['href']
        # Keep only links whose href mentions the wanted file type
        if FILETYPE in filepath:
            print(DOMAIN + filepath)
</code></pre>
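<p>The code above builds URLs with DOMAIN + href and filters with a substring test. A slightly more defensive variant is sketched below, using only the standard library. The helper names absolute_url and has_extension and the sample hrefs are hypothetical, and the sketch assumes the file extension appears at the end of the URL path; if the portal encodes file names some other way, the substring test is the one that applies.</p>

```python
import posixpath
from urllib.parse import urljoin, urlparse

# Same base URL as DOMAIN in the answer above
BASE = 'https://lfportal.loudoun.gov/LFPortalinternet/'


def absolute_url(base, href):
    # urljoin handles relative, root-relative and already-absolute hrefs
    return urljoin(base, href)


def has_extension(href, ext):
    # Compare only the extension of the URL path, ignoring any query string,
    # so '.xls' does not also match '.xlsx' (assumption: exact match is wanted)
    path = urlparse(href).path
    return posixpath.splitext(path)[1].lower() == ext.lower()


# Hypothetical hrefs, for illustration only
hrefs = ['Browse.aspx?startid=1', 'docs/report.xls', 'docs/report.xlsx']
for href in hrefs:
    if has_extension(href, '.xls'):
        print(absolute_url(BASE, href))
```

<p>In the loop from the answer, these would stand in for the DOMAIN + link['href'] concatenation and the FILETYPE in filepath check.</p>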