Python: Unable to search by attribute

def your_filter(tag, value):
    return any(tag[key] == value for key in tag.attrs.keys())

all_linked = soup.find_all("a", text=re.compile(r'summary compensation', re.IGNORECASE), href=True)
if len(all_linked)>0:
    table_link = all_linked[0]['href']
    tags = soup.find_all(lambda tag: your_filter(tag, table_link[1:]))
    goto_table = soup.find(tags[0].name, tags[0].attrs)

1 Answer

User

#1 · Posted on 2024-09-30 14:29:01

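The DOM is fairly "flat": you are looking for nested elements when you actually need to work higher up in the DOM, at the level of the parent div, and then find the sibling div that contains the target table. One approach might be: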

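from io import StringIO

import requests
from bs4 import BeautifulSoup as bs
from pandas import read_html as rh

# Send a browser-like User-Agent with the request
r = requests.get('https://www.sec.gov/Archives/edgar/data/72741/000104746918002070/a2234804zdef14a.htm', headers = {'User-Agent': 'Mozilla/5.0'})
soup = bs(r.content, 'lxml')
# Find the div holding the bold heading, then the table nested in a following sibling div
table = soup.select_one('div:has(b:-soup-contains("SUMMARY COMPENSATION TABLE")) ~ div div > table')
# Wrap the HTML in StringIO: recent pandas versions deprecate passing literal HTML strings
df = rh(StringIO(str(table)))[0]
df.dropna(how='all', axis = 1, inplace = True)  # drop columns that are entirely empty
df.columns = df.iloc[1, :]  # use the second row as the column labels
df = df.iloc[3:, :]  # keep only the rows from the fourth one onward
df.reset_index(drop=True, inplace = True)
df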
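
To see how that selector walks the "flat" structure without hitting the network, here is a minimal sketch on a made-up HTML fragment. The markup, the `id` values, and the choice of `html.parser` are assumptions for illustration only, not the real SEC page, and `:-soup-contains` assumes a reasonably recent soupsieve (2.1 or later).

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup imitating the flat layout described above:
# the bold heading sits in one div, the table is nested in a later sibling div.
html = """
<body>
  <div><p><b>SUMMARY COMPENSATION TABLE</b></p></div>
  <div><p>Some intro text</p></div>
  <div>
    <div>
      <table id="target"><tr><td>Name</td><td>Salary</td></tr></table>
    </div>
  </div>
  <div><div><table id="later"><tr><td>Other</td></tr></table></div></div>
</body>
"""

soup = BeautifulSoup(html, "html.parser")

# div:has(b:-soup-contains(...))  -> the div that contains the bold heading
# ~ div                           -> any later sibling div of that div
# div > table                     -> a table that is a direct child of a nested div
selector = 'div:has(b:-soup-contains("SUMMARY COMPENSATION TABLE")) ~ div div > table'

# select_one returns the first match in document order
table = soup.select_one(selector)
print(table["id"])
```

Because the heading and the table do not share a parent element, searching inside the heading's own div finds nothing; stepping sideways with `~` to the sibling divs is what reaches the table.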
