Python BeautifulSoup: finding the right tag

<div class="meaning"><span class="hinshi">［名］</span><span class="hinshi">(スル)</span></div>, <div class="meaning"><b>１</b> 今まで経験してきた仕事・身分・地位・学業などの事柄。履歴。「―を偽る」</div>,

3 answers

User

Answer 1 · edited 2024-09-30 18:13:49

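You can do it like this:

for s in soup.find_all("div", {"class": "meaning"}):
    for b in s.find_all("b"):
        # Placeholder: change this line to suit the result you want
        print(b.get_text())

The line under the "#" comment is a placeholder; adjust it according to the result you need.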

User

Answer 2 · edited 2024-09-30 18:13:49

You can try the .select method, which takes a CSS selector:

soup.select('.meaning b')
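As a rough sketch of how that selector behaves, the snippet below runs it against the two sample divs from the question. The variable names and the choice of the built-in "html.parser" are assumptions made for illustration, not part of the original answer.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question
html = (
    '<div class="meaning"><span class="hinshi">［名］</span>'
    '<span class="hinshi">(スル)</span></div>'
    '<div class="meaning"><b>１</b> 今まで経験してきた仕事・身分・地位・学業などの事柄。履歴。「―を偽る」</div>'
)

# "html.parser" is an assumed parser choice; any installed parser should do
soup = BeautifulSoup(html, "html.parser")

# ".meaning b" matches every <b> nested inside an element with class "meaning"
bold_tags = soup.select(".meaning b")
numbers = [b.get_text() for b in bold_tags]

# Each match still knows its enclosing tag, so the whole div is one step away
parents = [b.parent for b in bold_tags]
print(numbers, len(parents))
```

Only the second div contains a b tag, so the first div (the one holding the "hinshi" spans) is skipped without any extra filtering.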

User

Answer 3 · edited 2024-09-30 18:13:49

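You can use the keyword arguments of the find method to look up specific attributes. In your case, you need to match on the class_ keyword (the trailing underscore is there because class is a reserved word in Python). See the documentation for details on the class_ keyword.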
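A minimal sketch of the class_ keyword, using the sample markup from the question. The variable names and the "html.parser" choice are assumptions for illustration only.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question
html = (
    '<div class="meaning"><span class="hinshi">［名］</span>'
    '<span class="hinshi">(スル)</span></div>'
    '<div class="meaning"><b>１</b> 今まで経験してきた仕事・身分・地位・学業などの事柄。履歴。「―を偽る」</div>'
)

soup = BeautifulSoup(html, "html.parser")  # parser choice is an assumption

# find returns the first match (or None); find_all returns every match
first = soup.find("div", class_="meaning")
all_divs = soup.find_all("div", class_="meaning")

# The attrs dictionary form is an equivalent way to spell the same search
same_divs = soup.find_all("div", attrs={"class": "meaning"})
print(len(all_divs), len(same_divs))
```

Both divs in the sample carry the "meaning" class, so this search alone does not tell them apart; that is why a further filtering step is needed.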

Assuming you want to keep only the elements that have no children with the "hinshi" class, you can try the following:

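from bs4 import BeautifulSoup

def find_matches(data):
    # Name the parser explicitly to avoid a warning and parser-dependent results
    soup = BeautifulSoup(data, "html.parser")
    potential_matches = soup.find_all(class_="meaning")

    matches = []
    for match in potential_matches:
        # Skip any element that contains a descendant with class "hinshi"
        bad_children = match.find_all(class_="hinshi")
        if not bad_children:
            matches.append(match)

    return matches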

If you like, you can shorten it a bit, for example:

matches = soup.find_all(class_="meaning")
return [x for x in matches if not x.find_all(class_="hinshi")]

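Or, depending on your Python version, i.e. 2.x, with filter (on Python 2.x it returns a list; on Python 3 it returns an iterator, so wrap it in list() if you need a list):

matches = soup.find_all(class_="meaning")
return filter(lambda x: not x.find_all(class_="hinshi"), matches)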
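The filtering variants above can be tried end to end on the question's sample markup. This is only a sketch: the variable names and the "html.parser" choice are assumptions, and filter is wrapped in list() so the snippet behaves the same on Python 3.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question
html = (
    '<div class="meaning"><span class="hinshi">［名］</span>'
    '<span class="hinshi">(スル)</span></div>'
    '<div class="meaning"><b>１</b> 今まで経験してきた仕事・身分・地位・学業などの事柄。履歴。「―を偽る」</div>'
)

soup = BeautifulSoup(html, "html.parser")  # parser choice is an assumption
candidates = soup.find_all(class_="meaning")

# List comprehension: keep divs that contain no "hinshi" descendants
kept = [div for div in candidates if not div.find_all(class_="hinshi")]

# Same test with filter; note the order is filter(function, iterable)
kept_with_filter = list(
    filter(lambda div: not div.find_all(class_="hinshi"), candidates)
)
print(len(candidates), len(kept))
```

An empty result from find_all is falsy, which is what lets `not div.find_all(...)` work as the "has no such child" test.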

Edit: if you want the Japanese text next to the number in your example, first remove the b element and then use the get_text method. For example:

# Assuming `element` is one of the matches from above
element.find('b').extract()
print(element.get_text())
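A slightly more defensive sketch of the same idea, run on the sample markup from the question. It assumes some matched divs may have no b tag at all (find returns None in that case), and the variable names and the "html.parser" choice are illustrative assumptions.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question
html = (
    '<div class="meaning"><span class="hinshi">［名］</span>'
    '<span class="hinshi">(スル)</span></div>'
    '<div class="meaning"><b>１</b> 今まで経験してきた仕事・身分・地位・学業などの事柄。履歴。「―を偽る」</div>'
)

soup = BeautifulSoup(html, "html.parser")  # parser choice is an assumption

definitions = []
for element in soup.find_all(class_="meaning"):
    number_tag = element.find("b")
    if number_tag is None:
        # No number marker in this div, so there is nothing to strip
        continue
    # extract() detaches the tag from the tree, so the soup itself is modified
    number_tag.extract()
    # strip=True trims the space that was left between </b> and the text
    definitions.append(element.get_text(strip=True))

print(definitions)
```

Because extract changes the parsed tree in place, it may be worth parsing the document again (or working on a copy) if the b tags are needed later.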
