BeautifulSoup with multiple tags, each tag with a specific class

2 answers

User

Answer 1 · edited 2024-10-01 19:20:56

Assuming bsObj is your BeautifulSoup object, try:

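rows = bsObj.find_all('tr', {'valign': 'top'})
# find_all returns a list-like ResultSet, so search each matching row in turn
cells = [td for tr in rows for td in tr.find_all('td', {'width': '40%'})]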
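
As a rough, self-contained sketch of the nested search above: the markup and the names `nested` and `selected` are invented for illustration and are not from the question. It also shows what should be an equivalent CSS selector via `select`, assuming a Beautiful Soup version with selector support (4.7 or later).

```python
from bs4 import BeautifulSoup

# Hypothetical markup, made up for this example.
html = """
<table>
  <tr valign="top"><td width="40%">a</td><td width="60%">b</td></tr>
  <tr valign="bottom"><td width="40%">c</td></tr>
</table>
"""
bsObj = BeautifulSoup(html, "html.parser")

# Step 1: rows with valign="top"; step 2: cells with width="40%" inside each row.
nested = [
    td
    for tr in bsObj.find_all("tr", {"valign": "top"})
    for td in tr.find_all("td", {"width": "40%"})
]

# The same condition written as a CSS descendant selector.
selected = bsObj.select('tr[valign="top"] td[width="40%"]')

print([td.get_text() for td in nested])
print([td.get_text() for td in selected])
```

Both searches should pick out only the cell containing "a", because the second row fails the valign test and the other cell in the first row fails the width test.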

Hope this helps.

User

Answer 2 · edited 2024-10-01 19:20:56

You can use re.compile objects with soup.find_all:

import re
from bs4 import BeautifulSoup as soup
html = """
  <table>
    <tr style='width:40%'>
      <td style='align:top'></td>
    </tr>
  </table>
"""
results = soup(html, 'html.parser').find_all(re.compile('td|tr'), {'style':re.compile('width:40%|align:top')})

Output:

[<tr style="width:40%">
      <td style="align:top"></td>
    </tr>, <td style="align:top"></td>]

By supplying re.compile objects for the desired tag names and style values, find_all returns every tr or td tag whose inline style attribute contains width:40% or align:top.
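
One caveat worth sketching: Beautiful Soup applies a compiled pattern with its search() method, so the unanchored pattern 'td|tr' also matches any tag name that merely contains those letters, such as strong. The example below is illustrative only; the markup and the variable names are assumptions. It compares the regex with a plain list of tag names, which find_all also accepts.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup with a <strong> tag added to expose the difference.
html = """
<table>
  <tr style="width:40%">
    <td style="align:top"><strong style="align:top">x</strong></td>
  </tr>
</table>
"""
doc = BeautifulSoup(html, "html.parser")
style = re.compile(r"width:40%|align:top")

# Unanchored regex: "strong" contains "tr", so it matches as well.
by_regex = doc.find_all(re.compile("td|tr"), {"style": style})

# A list of names matches those tag names exactly.
by_list = doc.find_all(["tr", "td"], {"style": style})

print([t.name for t in by_regex])
print([t.name for t in by_list])
```

If a regex is preferred, anchoring it (for example `re.compile(r"^(td|tr)$")`) should give the same result as the list form.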

This approach can be extended to narrow down the elements by supplying several attribute values:

html = """
 <table>
   <tr style='width:40%'>
    <td style='align:top' class='get_this'></td>
    <td style='align:top' class='ignore_this'></td>
  </tr>
</table>
"""
results = soup(html, 'html.parser').find_all(re.compile('td|tr'), {'style':re.compile('width:40%|align:top'), 'class':'get_this'})

Output:

[<td class="get_this" style="align:top"></td>]
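
Because class is a multi-valued attribute, a string filter on it matches a tag if any one of its classes equals that string. The sketch below uses invented markup (the extra class `wide` is an assumption) and the `class_` and `style` keyword arguments of find_all to combine the two conditions.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup: the first cell carries two classes.
html = """
<table><tr style="width:40%">
  <td style="align:top" class="get_this wide"></td>
  <td style="align:top" class="ignore_this"></td>
</tr></table>
"""
doc = BeautifulSoup(html, "html.parser")

# Both filters must hold: style contains align:top AND one class is get_this.
hits = doc.find_all(["td", "tr"], style=re.compile("align:top"), class_="get_this")

print([list(t["class"]) for t in hits])
```

Only the first cell should be returned, even though its class attribute holds more than the single value being searched for.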

Edit 2: a simple recursive solution:

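import bs4
from bs4 import BeautifulSoup as soup

def get_tags(d, params):
    # params maps a tag name to the attributes wanted on that tag
    wanted = params.get(d.name, {})
    # 'class' is parsed into a list, so test membership; compare other attributes directly
    if any((b in d.attrs.get(a, [])) if a == 'class' else (b == d.attrs.get(a)) for a, b in wanted.items()):
        yield d
    # recurse into child tags only, skipping text nodes
    for i in d.contents:
        if isinstance(i, bs4.element.Tag):
            yield from get_tags(i, params)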

html = """
 <table>
  <tr style='align:top'>
    <td style='width:40%'></td>
    <td style='align:top' class='ignore_this'></td>
 </tr>
 </table>
"""
print(list(get_tags(soup(html, 'html.parser'), {'td':{'style':'width:40%'}, 'tr':{'style':'align:top'}})))

Output:

[<tr style="align:top">
  <td style="width:40%"></td>
  <td class="ignore_this" style="align:top"></td>
 </tr>, <td style="width:40%"></td>]

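The recursive function lets you supply your own dictionary of target attributes for particular tags. It tries to match any of the specified attributes against the bs4 object passed to it, and if a match is found, the element is yielded.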
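
For comparison, the same per-tag dictionary idea can probably be expressed without manual recursion, since find_all accepts a function as a filter and walks the tree itself. This is a sketch rather than the answer's method: `WANTED` and `matches` are hypothetical names, and the comparison is a plain equality test, so it does not handle the list-valued class attribute.

```python
from bs4 import BeautifulSoup

html = """
<table>
  <tr style='align:top'>
    <td style='width:40%'></td>
    <td style='align:top' class='ignore_this'></td>
  </tr>
</table>
"""

# Hypothetical lookup table: tag name -> attributes wanted on that tag.
WANTED = {"td": {"style": "width:40%"}, "tr": {"style": "align:top"}}

def matches(tag):
    # True if the tag has at least one of the attribute values listed for its name.
    wanted = WANTED.get(tag.name, {})
    return any(tag.get(attr) == value for attr, value in wanted.items())

doc = BeautifulSoup(html, "html.parser")
hits = doc.find_all(matches)

print([(t.name, t.get("style")) for t in hits])
```

As with the recursive version, the tr and the first td should be returned, while the second td is skipped because its style does not equal the value listed for td.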
