How do I filter the output and limit the search to one section of the page?

Posted 2024-10-01 05:04:04


I'm writing a Spanish-English dictionary scraper.

I want the # Part of Speech step to return results only from inside <div id="dictionary-neodict-es">.

import requests
from bs4 import BeautifulSoup
from collections import OrderedDict

base_url = "https://www.spanishdict.com/translate/"
search_keyword = input("input the keyword : ")
url = base_url + search_keyword
spanishdict_r = requests.get(url)
spanishdict_soup = BeautifulSoup(spanishdict_r.text, 'html.parser')

# Phonetic Alphabet
print(spanishdict_soup.find("span", {"id": "dictionary-link-es"}).text)

# Part of Speech
part_of_speech = dict.fromkeys([x.text for x in spanishdict_soup.find_all("a", {"class": "href--2RDqa"})]).keys()
for part in part_of_speech:
    print(part)

# Meaning
print(spanishdict_soup.find("div", {"id": "quickdef1-es"}).text)
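The dict.fromkeys(...).keys() line in the Part of Speech step removes duplicate labels while keeping them in first-seen order, because plain dicts preserve insertion order in Python 3.7+ (so the imported OrderedDict is not actually needed for this). A small sketch with made-up labels:

```python
# Hypothetical labels, as if several <a> tags repeated the same part of speech
labels = ["masculine noun", "adjective", "masculine noun", "adjective"]

# dict.fromkeys keeps only the first occurrence of each key, in insertion order
unique_labels = list(dict.fromkeys(labels))
print(unique_labels)  # ['masculine noun', 'adjective']
```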

I want to limit the search to that div because SpanishDict.com shows definitions from three different dictionaries:

1. Curiosity Media Inc.
<div id="dictionary-neodict-es">

2. Harrap Publishers Limited
<div id="dictionary-neoharrap-es">

3. Collins Complete Spanish Electronic Dictionary © HarperCollins Publishers 2011
<div id="dictionary-collins-es">

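The © Curiosity Media Inc. definitions appear first, and below them you can also check the definitions from the other dictionaries. I only want to collect items from © Curiosity Media Inc.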
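Because each dictionary sits in its own container, the three ids above can be used to see which labels come from which dictionary. This is a minimal sketch, assuming the part-of-speech links carry the href--2RDqa class used in the question; the HTML string and the helper name parts_by_dictionary are hypothetical stand-ins for the live page:

```python
from bs4 import BeautifulSoup

# Container ids listed in the question
DICTIONARY_IDS = [
    "dictionary-neodict-es",    # Curiosity Media Inc.
    "dictionary-neoharrap-es",  # Harrap Publishers Limited
    "dictionary-collins-es",    # Collins / HarperCollins
]

# Made-up markup; the real page structure may differ
html = """
<div id="dictionary-neodict-es"><a class="href--2RDqa">adjective</a></div>
<div id="dictionary-neoharrap-es"><a class="href--2RDqa">Noun</a></div>
"""
soup = BeautifulSoup(html, "html.parser")


def parts_by_dictionary(soup):
    """Map each dictionary container id to the part-of-speech labels found inside it."""
    result = {}
    for dict_id in DICTIONARY_IDS:
        container = soup.find("div", id=dict_id)
        if container is None:  # a dictionary may be missing for some words
            result[dict_id] = []
            continue
        labels = [a.get_text(strip=True) for a in container.find_all("a", class_="href--2RDqa")]
        result[dict_id] = list(dict.fromkeys(labels))  # de-duplicate, keep order
    return result


print(parts_by_dictionary(soup))
```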

For example, the definition of modelo:

When I search for modelo, my scraper currently prints:

(moh-deh-loh)
masculine or feminine noun
masculine noun
adjective
Noun
model

masculine or feminine noun <-- from "dictionary-neodict-es" OK

masculine noun <-- from "dictionary-neodict-es" OK

adjective <-- from "dictionary-neodict-es" OK

Noun <-- collected from another dictionary's container, "dictionary-neoharrap-es", so it should not be displayed (or even parsed)

model <-- from "dictionary-neodict-es" OK
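The stray "Noun" appears because find_all on the whole soup matches every link with that class, whichever dictionary it belongs to. As an alternative to calling find and then find_all, a CSS descendant selector passed to BeautifulSoup's select can express the same scoping in one query. A sketch on made-up markup that mimics the modelo case (the class name is taken from the question; the HTML is hypothetical):

```python
from bs4 import BeautifulSoup

# Made-up markup mimicking the "modelo" page
html = """
<div id="dictionary-neodict-es">
  <a class="href--2RDqa">masculine noun</a>
  <a class="href--2RDqa">adjective</a>
</div>
<div id="dictionary-neoharrap-es">
  <a class="href--2RDqa">Noun</a>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Searching the whole document also picks up the Harrap label
everywhere = [a.get_text(strip=True) for a in soup.find_all("a", class_="href--2RDqa")]

# The descendant selector only matches links inside the Curiosity Media container
scoped = [a.get_text(strip=True) for a in soup.select("#dictionary-neodict-es a.href--2RDqa")]

print(everywhere)  # ['masculine noun', 'adjective', 'Noun']
print(scoped)      # ['masculine noun', 'adjective']
```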

So my scraper should give me this result:

(moh-deh-loh)
masculine or feminine noun
masculine noun
adjective
model

Please help me solve this. Thanks, everyone.


1 Answer
Anonymous user
#1 · Posted 2024-10-01 05:04:04

You can first select the dictionary-neodict-es container

and then search for what you need only within that scope.

# Select the <div id="dictionary-neodict-es"> container (the Curiosity Media Inc. definitions)
dictionary_neodict_es = spanishdict_soup.find("div", {"id": "dictionary-neodict-es"})

# Search only inside that container, so labels from the Harrap and Collins sections are skipped
dictionary_link_es = dictionary_neodict_es.find("span", {"id": "dictionary-link-es"})
part_of_speech = dict.fromkeys([x.text for x in dictionary_neodict_es.find_all("a", {"class": "href--2RDqa"})]).keys()
for part in part_of_speech:
    print(part)
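A slightly more defensive sketch of the same idea, assuming the element ids and the href--2RDqa class from the question. Two choices here are assumptions rather than part of the answer: the phonetic span and the quick definition are still looked up on the whole page (as in the original question code, in case they sit outside the container), and each lookup is guarded so a missing element does not raise AttributeError on None. The helper name neodict_summary and the sample HTML are hypothetical:

```python
from bs4 import BeautifulSoup


def neodict_summary(soup):
    """Collect phonetic, Curiosity Media parts of speech, and quick definition."""
    lines = []

    # Phonetic spelling: searched on the whole page, as in the question
    phonetic = soup.find("span", id="dictionary-link-es")
    if phonetic is not None:
        lines.append(phonetic.get_text(strip=True))

    # Parts of speech: only inside the Curiosity Media Inc. container
    container = soup.find("div", id="dictionary-neodict-es")
    if container is not None:
        labels = [a.get_text(strip=True) for a in container.find_all("a", class_="href--2RDqa")]
        lines.extend(dict.fromkeys(labels))  # de-duplicated, in page order

    # Quick definition, as in the question
    meaning = soup.find("div", id="quickdef1-es")
    if meaning is not None:
        lines.append(meaning.get_text(strip=True))
    return lines


# Made-up markup standing in for the live "modelo" page
html = """
<span id="dictionary-link-es">(moh-deh-loh)</span>
<div id="quickdef1-es">model</div>
<div id="dictionary-neodict-es">
  <a class="href--2RDqa">masculine or feminine noun</a>
  <a class="href--2RDqa">masculine noun</a>
  <a class="href--2RDqa">adjective</a>
</div>
<div id="dictionary-neoharrap-es"><a class="href--2RDqa">Noun</a></div>
"""
for line in neodict_summary(BeautifulSoup(html, "html.parser")):
    print(line)
```

On this sample the loop prints the five lines the question asks for, with the Harrap "Noun" left out. On the live site, the same function would be called with the soup built from requests.get(url).text.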
