How do I find the nearest preceding sibling <li> for each <li class=""><a>?

Posted 2024-10-16 17:24:34


My example HTML is:

<li><h4>A0: Pronouns</h4></li>
<li class="">
    <a>bb</a>
    <a>cc</a>
</li>
<li class="">
    <a>dd</a>
    <a>ee</a>
</li>
<li><h4>A0: Verbs Tenses & Conjugation</h4></li>
<li class="">
    <a>ff</a>
    <a>gg</a>
</li>
<li class="">
    <a>hh</a>
    <a>kk</a>
</li>
<li class="">
    <a>jj</a>
    <a>ii</a>
</li>

For each <li class=""><a> element, I want to find its nearest preceding sibling <li><h4>. For example:

  • <li class=""><a>bb</a></li> corresponds to <li><h4>A0: Pronouns</h4></li>

  • <li class=""><a>dd</a></li> corresponds to <li><h4>A0: Pronouns</h4></li>

  • <li class=""><a>ff</a></li> corresponds to <li><h4>A0: Verbs Tenses & Conjugation</h4></li>

  • <li class=""><a>hh</a></li> corresponds to <li><h4>A0: Verbs Tenses & Conjugation</h4></li>

  • <li class=""><a>jj</a></li> corresponds to <li><h4>A0: Verbs Tenses & Conjugation</h4></li>

Could you explain how to do this? Here is what I have so far:

import requests
from bs4 import BeautifulSoup

session = requests.Session()
headers = {
    'Accept-Encoding': 'gzip, deflate, sdch',
    'Accept-Language': 'en-US,en;q=0.8',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    'Cache-Control': 'max-age=0',
    'Connection': 'keep-alive',
}

link = 'https://french.kwiziq.com/revision/grammar'
r = session.get(link, headers = headers)
soup = BeautifulSoup(r.content, 'html.parser')

for d in soup.select('.callout-body > ul li > a:nth-of-type(1)'):
    print(d)

2 Answers

You can use .find_previous('h4'), which returns the closest <h4> that appears before the given element in the document:

import requests
from bs4 import BeautifulSoup


url = "https://french.kwiziq.com/revision/grammar"

soup = BeautifulSoup(requests.get(url).content, "html.parser")
for a in soup.select(".callout  li > a:nth-of-type(1)"):
    print(
        "{:<70} {}".format(
            a.get_text(strip=True), a.find_previous("h4").get_text(strip=True)
        )
    )

Prints:

Saying your name: Je m'appelle, Tu t'appelles, Vous vous appelez       A0: Pronouns
Tu and vous are used for three types of you                            A0: Pronouns
Je becomes j' with verbs beginning with a vowel (elision)              A0: Verbs Tenses & Conjugation
J'habite à [city] = I live in [city]                                   A0: Idioms, Idiomatic Usage, and Structures
Je viens de + [city] = I'm from + [city]                               A0: Idioms, Idiomatic Usage, and Structures
Conjugate être (je suis, tu es, vous êtes) in Le Présent (present tense) A0: Verbs Tenses & Conjugation
Make most adjectives feminine by adding -e                             A0: Adjectives & Adverbs
Nationalities differ depending on whether you're a man or a woman (adjectives) A0: Adjectives & Adverbs
Conjugate avoir (j'ai, tu as, vous avez) in Le Présent (present tense) A0: Verbs Tenses & Conjugation
Using un, une to say "a" (indefinite articles)                         A0: Nouns & Articles

...

French vocabulary and grammar lists by theme                           C1: Idioms, Idiomatic Usage, and Structures
French Fill-in-the-Blanks Tests                                        C1: Idioms, Idiomatic Usage, and Structures
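
The same approach can be tried offline against the sample markup from the question. This is a minimal sketch, not part of the original answer: the `html` string is a condensed copy of the question's example (the surrounding `<ul>` is an assumption), and the `pairs` list is a name introduced here for illustration.

```python
from bs4 import BeautifulSoup

# Condensed copy of the sample markup from the question.
# The wrapping <ul> is an assumption; the question shows only the <li> items.
html = """
<ul>
<li><h4>A0: Pronouns</h4></li>
<li class=""><a>bb</a><a>cc</a></li>
<li class=""><a>dd</a><a>ee</a></li>
<li><h4>A0: Verbs Tenses &amp; Conjugation</h4></li>
<li class=""><a>ff</a><a>gg</a></li>
<li class=""><a>hh</a><a>kk</a></li>
<li class=""><a>jj</a><a>ii</a></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# Pair the first <a> of every <li> with the closest <h4> that precedes it.
pairs = []
for a in soup.select("li > a:nth-of-type(1)"):
    heading = a.find_previous("h4")
    pairs.append((a.get_text(strip=True), heading.get_text(strip=True)))

for link_text, heading_text in pairs:
    print(link_text, "->", heading_text)
```

Note that `find_previous` walks backwards through the whole document, so it is not restricted to siblings. That is enough here because every `<h4>` sits in its own `<li>` ahead of the items it labels.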

You can use :is in the CSS selector to pick up headings and links in a single pass:

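from bs4 import BeautifulSoup as soup
from collections import defaultdict

# html is assumed to hold the sample markup from the question
soup1 = soup(html, 'html.parser')

# d maps each heading to its links; l tracks the most recent heading seen
d, l = defaultdict(list), None
for i in soup1.select('li > :is(a, h4):nth-of-type(1)'):
    if i.name == 'h4':
        l = i.get_text(strip=True)
    else:
        d[l].append(i.get_text(strip=True))

print(dict(d))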

Output:

{'A0: Pronouns': ['bb', 'dd'], 'A0: Verbs Tenses & Conjugation': ['ff', 'hh', 'jj']}

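The output stores the first <a> of each <li> under the grammar section it belongs to. If you only want a one-to-one mapping from each section to its first result, you can use a dict comprehension: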

new_d = {a:b for a, (b, *_) in d.items()}

Output:

{'A0: Pronouns': 'bb', 'A0: Verbs Tenses & Conjugation': 'ff'}
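
If the lookup should be limited strictly to siblings, as the question is worded, one option is `find_previous_sibling` with a filter function. The following is a sketch under assumptions, not code from either answer: the `html` string is a condensed copy of the question's sample (the wrapping `<ul>` is assumed), and `is_heading_item` and `mapping` are hypothetical names introduced for illustration.

```python
from bs4 import BeautifulSoup

# Condensed copy of the sample markup from the question.
# The wrapping <ul> is an assumption; the question shows only the <li> items.
html = """
<ul>
<li><h4>A0: Pronouns</h4></li>
<li class=""><a>bb</a><a>cc</a></li>
<li class=""><a>dd</a><a>ee</a></li>
<li><h4>A0: Verbs Tenses &amp; Conjugation</h4></li>
<li class=""><a>ff</a><a>gg</a></li>
<li class=""><a>hh</a><a>kk</a></li>
<li class=""><a>jj</a><a>ii</a></li>
</ul>
"""


def is_heading_item(tag):
    """Hypothetical helper: true for an <li> whose direct child is an <h4>."""
    return tag.name == "li" and tag.find("h4", recursive=False) is not None


soup = BeautifulSoup(html, "html.parser")

# Map the first link of each non-heading <li> to its nearest heading sibling.
mapping = {}
for li in soup.select("li"):
    first_link = li.find("a", recursive=False)
    if first_link is None:
        continue  # this <li> is a heading item, not a link item
    heading_li = li.find_previous_sibling(is_heading_item)
    heading_text = heading_li.h4.get_text(strip=True) if heading_li else None
    mapping[first_link.get_text(strip=True)] = heading_text

print(mapping)
```

Unlike `find_previous`, this only inspects earlier `<li>` elements that share the same parent, so an `<h4>` elsewhere in the page cannot be matched by accident. Link items that appear before any heading map to `None`.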
