I am building a Spanish-English dictionary scraper.
I want the results of the "# Part of Speech" step to come only from within <div id="dictionary-neodict-es">.
import requests
from bs4 import BeautifulSoup
from collections import OrderedDict
base_url = "https://www.spanishdict.com/translate/"
search_keyword = input("input the keyword : ")
url = base_url + search_keyword
spanishdict_r = requests.get(url)
spanishdict_soup = BeautifulSoup(spanishdict_r.text, 'html.parser')
# Phonetic Alphabet
print(spanishdict_soup.find("span", {"id": "dictionary-link-es"}).text)
# Part of Speech
part_of_speech = dict.fromkeys([x.text for x in spanishdict_soup.find_all("a", {"class": "href--2RDqa"})]).keys()
for part in part_of_speech:
    print(part)
# Meaning
print(spanishdict_soup.find("div", {"id": "quickdef1-es"}).text)
I mention this because SpanishDict.com shows definitions from three dictionaries:
1. Curiosity Media Inc.
<div id="dictionary-neodict-es">
2. Harrap Publishers Limited
<div id="dictionary-neoharrap-es">
3. Collins Complete Spanish Electronic Dictionary © HarperCollins Publishers 2011
<div id="dictionary-collins-es">
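The definitions from © Curiosity Media Inc. appear first on the page.
Below them you can also see the definitions from the other dictionaries.
So I only want to collect items from © Curiosity Media Inc.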
When I search for modelo, my scraper actually prints:
(moh-deh-loh)
masculine or feminine noun
masculine noun
adjective
Noun
model
masculine or feminine noun <-- from "dictionary-neodict-es" OK
masculine noun <-- from "dictionary-neodict-es" OK
adjective <-- from "dictionary-neodict-es" OK
Noun <-- collected from another dictionary's tag "dictionary-neoharrap-es", so it should not be displayed (or parsed at all)
model <-- from "dictionary-neodict-es" OK
So the result my scraper should give me is:
(moh-deh-loh)
masculine or feminine noun
masculine noun
adjective
model
Please help me solve this problem. Thanks, everyone.
You can first select the dictionary-neodict-es element and then search only within that element for what you want to find.
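A minimal sketch of that suggestion, assuming the page structure described in the question. The SAMPLE_HTML string is a simplified, hypothetical stand-in for the real page (so the example runs without a network request), and parts_of_speech is an illustrative helper name, not part of any library. The class name href--2RDqa is taken from the question and may well change on the live site.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup standing in for the real page.
SAMPLE_HTML = """
<div id="dictionary-neodict-es">
  <a class="href--2RDqa">masculine or feminine noun</a>
  <a class="href--2RDqa">masculine noun</a>
  <a class="href--2RDqa">masculine noun</a>
  <a class="href--2RDqa">adjective</a>
</div>
<div id="dictionary-neoharrap-es">
  <a class="href--2RDqa">Noun</a>
</div>
"""


def parts_of_speech(html, container_id="dictionary-neodict-es"):
    """Return unique part-of-speech labels found inside one dictionary block."""
    soup = BeautifulSoup(html, "html.parser")
    # Narrow the search to the chosen dictionary's container first.
    container = soup.find("div", {"id": container_id})
    if container is None:
        # The block is missing (for example, the page layout changed).
        return []
    # Search only inside the container, not the whole document.
    links = container.find_all("a", {"class": "href--2RDqa"})
    labels = [link.get_text(strip=True) for link in links]
    # dict.fromkeys removes duplicates while keeping first-seen order.
    return list(dict.fromkeys(labels))


for part in parts_of_speech(SAMPLE_HTML):
    print(part)
```

With the live page, the same helper could be called as parts_of_speech(spanishdict_r.text). Calling find_all on the container rather than on spanishdict_soup is what keeps the "Noun" entry from dictionary-neoharrap-es out of the result.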