Looking up the href inside an 'a' tag does not find the first 'a' tag. How can I fix it?

Posted 2024-06-25 05:38:50

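I'm new to Python and really trying to learn. For some reason, each job posting is stored under an 'a' tag rather than a div, and that 'a' tag also holds the href. This is the output of print(item):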

<a class="tapItem fs-unmask result job_e0fb3e5f520856c0 resultWithShelf sponTapItem tapItem-noPadding desktop" data-hide-spinner="true" data-jk="e0fb3e5f520856c0" data-mobtk="1favs1gn0t5v1800" href="/company/Acentury/jobs/New-Graduate-Software-Developer-e0fb3e5f520856c0?fccid=5c6453896b020232&amp;vjs=3" id="job_e0fb3e5f520856c0" rel="nofollow" target="_blank"><div class="slider_container"><div class="slider_list"><div class="slider_item"><div class="job_seen_beacon"><table cellpadding="0" cellspacing="0" class="jobCard_mainContent" role="presentation"><tbody><tr><td class="resultContent"><div class="heading4 color-text-primary singleLineTitle tapItem-gutter"><h2 class="jobTitle jobTitle-color-purple jobTitle-newJob"><div class="new topLeft holisticNewBlue desktop"><span class="label">new</span></div><span title="New Graduate Software Developer">New Graduate Software Developer</span></h2></div><div class="heading6 company_location tapItem-gutter"><pre><span class="companyName">Acentury</span><div class="companyLocation">Richmond Hill, ON<span class="remote-bullet">•</span><span>Temporarily Remote</span></div></pre></div><div class="heading6 tapItem-gutter metadataContainer"><div class="metadata salary-snippet-container"><span class="salary-snippet">$44,182 - $126,699 a year</span></div></div><div class="heading6 error-text tapItem-gutter"></div></td></tr></tbody></table><table class="jobCardShelfContainer" role="presentation"><tbody><tr class="jobCardShelf"><td class="shelfItem indeedApply"><span class="iaIcon"></span><span class="ialbl iaTextBlack">Easily apply</span></td></tr><tr class="underShelfFooter"><td><div class="heading6 tapItem-gutter result-footer"><div class="job-snippet"><ul style="list-style-type:circle;margin-top: 0px;margin-bottom: 0px;padding-left:20px;">
<li>Work with senior <b>developers</b> to develop front-end features on our current platform through entire R&amp;D cycle from design to implementation and official release.</li>
</ul></div><span class="date">Today</span><span class="result-link-bar-separator">·</span><button aria-expanded="false" class="sl resultLink more_links_button" type="button">More...</button></div><div class="tab-container"><div class="more-links-container result-tab" role="presentation"><div class="more_links"><button class="close-button" title="Close" type="button"></button><ul><li><span class="mat">View all <a href="/Acentury-jobs">Acentury jobs</a> - <a href="/jobs-in-Richmond-Hill,-ON">Richmond Hill jobs</a></span></li><li><span class="mat">Salary Search: <a href="/career/software-engineer/salaries/Richmond-Hill--ON?campaignid=serp-more&amp;fromjk=e0fb3e5f520856c0&amp;from=serp-more">New Graduate Software Developer salaries in Richmond Hill, ON</a></span></li></ul></div></div></div></td></tr></tbody></table><div aria-live="polite"></div></div></div><div class="slider_sub_item"></div></div></div><div class="kebabMenu"><button aria-expanded="false" aria-haspopup="true" aria-label="Job actions" class="kebabMenu-button"><svg fill="none" height="24" viewbox="0 0 24 24" width="24" xmlns="http://www.w3.org/2000/svg"><path d="M12 7C13.1 7 14 6.1 14 5C14 3.9 13.1 3 12 3C10.9 3 10 3.9 10 5C10 6.1 10.9 7 12 7ZM12 10C10.9 10 10 10.9 10 12C10 13.1 10.9 14 12 14C13.1 14 14 13.1 14 12C14 10.9 13.1 10 12 10ZM12 17C10.9 17 10 17.9 10 19C10 20.1 10.9 21 12 21C13.1 21 14 20.1 14 19C14 17.9 13.1 17 12 17Z" fill="#2d2d2d"></path></svg></button></div></a> 

My code is:

divs = soup.find_all('a', class_ = 'tapItem')
for item in divs:
   for people in item.find_all('a'):
       print(people)   
       for ok in people.find_all('a', class_ = 'tapItem'):
           linkJob1 = ok.get('href')
   print(linkJob1)

people does not contain the first 'a' tag, only the other ones nested inside it. How can I fix this? Thanks, everyone.

URL: https://ca.indeed.com/jobs?q=software+developer&l=Toronto%2C+ON&start=0

The expected result is the href of each job posting/card.


1 answer
User
Reply 1 · posted 2024-06-25 05:38:50

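You only need one ID (the job ID), which you can extract from the data-jk attribute if you loop over the elements that have the class result. You can then build the URL dynamically, the same way the site does: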

import requests
from bs4 import BeautifulSoup as bs

r = requests.get('https://ca.indeed.com/jobs?q=software+developer&l=Toronto,+ON&start=0')
soup = bs(r.content, 'lxml')

for job in soup.select('.result'):
    print(job.select_one('.jobTitle').get_text(' '))
    print(f'https://ca.indeed.com/viewjob?jk={job["data-jk"]}')
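If you would rather keep the href itself, as the question asks, note that find_all only searches the descendants of the tag it is called on. That is why item.find_all('a') never returns item itself: the outer 'a' is the card, so its href can be read from it directly. Below is a minimal sketch of that idea, not a definitive implementation. SAMPLE_HTML is a heavily trimmed stand-in for the card printed in the question, BASE_URL and the function name card_links are hypothetical, and the built-in html.parser is assumed here in place of lxml.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Assumption: site root used to turn the relative hrefs into absolute URLs.
BASE_URL = "https://ca.indeed.com"

# Assumption: a trimmed stand-in for one job card from the question's output.
SAMPLE_HTML = (
    '<a class="tapItem result" data-jk="e0fb3e5f520856c0" '
    'href="/company/Acentury/jobs/New-Graduate-Software-Developer-e0fb3e5f520856c0">'
    '<h2 class="jobTitle"><span>New Graduate Software Developer</span></h2>'
    '<ul><li><a href="/Acentury-jobs">Acentury jobs</a></li></ul>'
    "</a>"
)


def card_links(html):
    """Return the absolute href of every job card (outer 'a.tapItem') in html."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for card in soup.find_all("a", class_="tapItem"):
        # The card itself is the 'a' tag, so read href from it directly
        # instead of searching its descendants with card.find_all('a').
        href = card.get("href")
        if href:
            links.append(urljoin(BASE_URL, href))
    return links


print(card_links(SAMPLE_HTML))
```

The nested links such as /Acentury-jobs are skipped because they do not carry the tapItem class. On a live page the same loop would run over the soup built from the response, assuming the site still serves this markup.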
