Python Selenium: getting text and href

Posted 2024-09-28 22:25:24


Suppose I have several divs that look like this:

<div class="class1">
    <div class="class2">
        <div class="class3">Text1</div>
        <div class="class4">
            <a href="https://somelink"><h2>Text2</h2></a>
            <p class="class5">Text3 <span class="class6"> Text4 </span></p>
        </div>
    </div>
</div>

For each div, I can get Text1, Text2, Text3 and Text4:

elements = driver.find_elements_by_xpath("//div[@class='class1']/*")
for e in elements:
    print(e.text)
    print('------------------------------------------')

But how do I also get the value of the href?

I expect: https://somelink, Text1, Text2, Text3, Text4


3 answers

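I think you will find the answer here: Python Selenium - get href value

Basically, it looks like this (note the singular find_element: the plural form returns a list, which has no get_attribute method):

driver.find_element_by_css_selector('div.class4 > a').get_attribute('href')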
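When the page contains several such blocks, a plural lookup returns a list of elements, and `get_attribute` is called on each item rather than on the list. A minimal sketch of that, written in the Selenium 4 locator style (`find_elements(by, value)`), since the `find_element_by_*` helpers were removed in Selenium 4.3. The helper name `link_hrefs` is hypothetical, and the plain string `"css selector"` is used in place of importing `By.CSS_SELECTOR`, which has the same value:

```python
CSS = "css selector"  # same string as selenium.webdriver.common.by.By.CSS_SELECTOR


def link_hrefs(driver, selector="div.class4 > a"):
    """Return the href of every element matching `selector` (hypothetical helper)."""
    # find_elements returns a (possibly empty) list, so read the attribute per element
    return [a.get_attribute("href") for a in driver.find_elements(CSS, selector)]
```

With a live driver on the page from the question, `link_hrefs(driver)` should give one URL per matching link.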

Try this:

elements = driver.find_elements_by_xpath("//div[@class='class1']/*")  # matches each "class2" div
for e in elements:
    print(e.text)
    # The leading "." restricts the search to e; "//a" finds the <a> nested at any depth below it.
    link = e.find_element_by_xpath(".//a").get_attribute("href")
    print(link)
    print('                     ')
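One caveat with a singular lookup inside the loop: `find_element` raises `NoSuchElementException` when a block contains no link. A small sketch that avoids the exception by using the plural lookup and checking for an empty list. The helper name `first_href` is hypothetical, and `"xpath"` is the string value of `By.XPATH` in the Selenium 4 locator style:

```python
XPATH = "xpath"  # same string as selenium.webdriver.common.by.By.XPATH


def first_href(element):
    """Return the href of the first <a href> inside `element`, or None if there is none."""
    # The plural lookup returns an empty list instead of raising when nothing matches
    links = element.find_elements(XPATH, ".//a[@href]")
    return links[0].get_attribute("href") if links else None
```

Inside the loop above, `link = first_href(e)` would then be `None` for blocks without a link, which can be skipped or printed as a placeholder.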

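Why not do it like this?

elements = driver.find_elements_by_xpath("//div[@class='class1']/*")
for e in elements:
    res = e.text.splitlines()  # one entry per rendered line of text
    # The href is on the nested <a>, not on the div itself, so look it up explicitly
    for a in e.find_elements_by_xpath(".//a[@href]"):
        res.insert(0, a.get_attribute('href'))
    print(", ".join(res))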
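To get Text3 and Text4 as separate fields, as the question asks, one option is to read each piece from its own element instead of splitting the combined text. The sketch below assumes every block has exactly the structure shown in the question; the names `block_fields` and `print_blocks` are hypothetical, and the string `"css selector"` stands in for `By.CSS_SELECTOR` (Selenium 4 locator style). Because the text of the `<p>` also includes the text of its `<span>`, the span's text is cut off the end to leave Text3 on its own:

```python
CSS = "css selector"  # same string as selenium.webdriver.common.by.By.CSS_SELECTOR


def block_fields(block):
    """Return [href, Text1, Text2, Text3, Text4] for one div.class1 element."""
    href = block.find_element(CSS, "div.class4 > a").get_attribute("href")
    text1 = block.find_element(CSS, "div.class3").text
    text2 = block.find_element(CSS, "div.class4 h2").text
    p_text = block.find_element(CSS, "p.class5").text
    text4 = block.find_element(CSS, "span.class6").text
    # The <p> text ends with the <span> text; strip it to isolate Text3
    if text4 and p_text.endswith(text4):
        text3 = p_text[: -len(text4)].strip()
    else:
        text3 = p_text
    return [href, text1, text2, text3, text4]


def print_blocks(driver):
    """Print one comma-separated line per div.class1 on the current page."""
    for block in driver.find_elements(CSS, "div.class1"):
        print(", ".join(block_fields(block)))
```

Calling `print_blocks(driver)` after loading the page should print one line per block in the order the question expects, with the link first.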
