How do I parse the two strings in a table row with BeautifulSoup?

html = '''
<div class="container">
  <h2>Countries & Capitals</h2>
  <table class="two-column td-red">
    <thead><tr><th>Country</th><th>Capital city</th></tr></thead>
    <tbody>
      <tr class="grey"><td>Afghanistan</td><td>Kabul</td></tr>
      <tr><td>Albania</td><td>Tirana</td></tr>
    </tbody>
  </table>
</div>
'''

soup = BeautifulSoup(open(filename), 'lxml')
countries = {}
# YOUR CODE HERE
table = soup.find_all('table')
for each in table:
    if each.find('tr'):
        continue
    else:
        print(each.prettify())
return countries
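In the attempt above, each.find('tr') returns a Tag (which is truthy) for every table that has rows, so continue skips every table and countries stays empty. A minimal sketch of the skeleton completed, assuming it lives in a hypothetical function parse_countries(filename), that the file is UTF-8, and using Python's built-in "html.parser" instead of lxml so no extra parser has to be installed:

```python
from bs4 import BeautifulSoup


def parse_countries(filename):
    """Hypothetical helper: map each country to its capital from an HTML file."""
    # Assumes a UTF-8 file; "html.parser" ships with Python, unlike lxml.
    with open(filename, encoding="utf-8") as f:
        soup = BeautifulSoup(f, "html.parser")

    countries = {}
    for table in soup.find_all("table"):
        for tr in table.find_all("tr"):
            tds = tr.find_all("td")
            # The header row uses <th>, so it has no <td> cells and is skipped.
            if len(tds) == 2:
                countries[tds[0].get_text(strip=True)] = tds[1].get_text(strip=True)
    return countries
```

Called as parse_countries("countries.html") (a hypothetical file name) on a file containing the HTML above, it should return {'Afghanistan': 'Kabul', 'Albania': 'Tirana'}.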

3 answers

User

#1 · edited 2024-09-29 21:35:12

If a "tr" element has two "td" children, it is a data row, so you can select those rows and read their two cells:

from bs4 import BeautifulSoup

html = """
<div class="container">
 <h2>Countries & Capitals</h2>
  <table class="two-column td-red">
  <thead><tr><th>Country</th><th>Capital city</th></tr></thead><tbody>
   <tr class="grey"><td>Afghanistan</td><td>Kabul</td></tr>
   <tr><td>Albania</td><td>Tirana</td></tr>
</tbody>
</table>
</div>
"""
soup = BeautifulSoup(html, 'lxml')
countries = {}

trs = soup.find_all('tr')
for tr in trs:
    tds = tr.find_all("td")
    # The header row only has <th> cells, so it fails this check and is skipped
    if len(tds) == 2:
        countries[tds[0].text] = tds[1].text
print(countries)

Output:

{'Afghanistan': 'Kabul', 'Albania': 'Tirana'}
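A slightly more defensive variant of the same idea, as a sketch: it visits only the rows inside <tbody> via the CSS selector "tbody tr" (select() is available in current BeautifulSoup releases), and uses get_text(strip=True) so whitespace around the cell text does not end up in the keys or values. The cells below are padded with spaces on purpose to show the stripping, and "html.parser" is used instead of lxml.

```python
from bs4 import BeautifulSoup

html = """
<table class="two-column td-red">
  <thead><tr><th>Country</th><th>Capital city</th></tr></thead>
  <tbody>
    <tr class="grey"><td> Afghanistan </td><td>Kabul</td></tr>
    <tr><td>Albania</td><td> Tirana </td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

countries = {}
# Only body rows are visited, so the <thead> row never reaches the loop.
for tr in soup.select("tbody tr"):
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    if len(cells) == 2:
        country, capital = cells
        countries[country] = capital

print(countries)  # {'Afghanistan': 'Kabul', 'Albania': 'Tirana'}
```

Keeping the len(cells) == 2 check means a stray row with a different number of cells inside <tbody> is ignored rather than raising an error.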

User

#2 · edited 2024-09-29 21:35:12

This solution works for the given HTML sample:

from bs4 import BeautifulSoup  # assuming you did pip install bs4
soup = BeautifulSoup(html, "html.parser")  # the html you mentioned
table_data = soup.find('table')
data = {}  # {'country': 'capital'} dict
for row in table_data.find_all('tr'):
    row_data = row.find_all('td')
    if row_data:
        data[row_data[0].text] = row_data[1].text

I've skipped any error cases that would need a try/except block. I'd suggest going through the BeautifulSoup documentation; it covers everything.
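The answer above leaves error handling out. As a sketch of what it could look like (an assumption, not part of the original answer): unpacking the cells into two names raises ValueError when a row has too few or too many <td> cells, and such rows are collected instead of crashing with IndexError on row_data[1]. The single-cell "Andorra" row is hypothetical test input.

```python
from bs4 import BeautifulSoup

# Hypothetical input: the question's table plus one malformed row missing its capital.
html = """
<table>
  <thead><tr><th>Country</th><th>Capital city</th></tr></thead>
  <tbody>
    <tr class="grey"><td>Afghanistan</td><td>Kabul</td></tr>
    <tr><td>Albania</td><td>Tirana</td></tr>
    <tr><td>Andorra</td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table_data = soup.find("table")

data = {}      # {'country': 'capital'}
skipped = []   # text of rows that did not have exactly two <td> cells
if table_data is not None:  # find() returns None when there is no <table>
    for row in table_data.find_all("tr"):
        row_data = row.find_all("td")
        if not row_data:
            continue  # header row (only <th> cells)
        try:
            country, capital = (td.get_text(strip=True) for td in row_data)
        except ValueError:
            # Unpacking fails for too few or too many cells.
            skipped.append(row.get_text(" ", strip=True))
            continue
        data[country] = capital

print(data)     # {'Afghanistan': 'Kabul', 'Albania': 'Tirana'}
print(skipped)  # ['Andorra']
```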

User

#3 · edited 2024-09-29 21:35:12

How about this:

from bs4 import BeautifulSoup

element ='''
<div class="container">
    <h2>Countries & Capitals</h2>
    <table class="two-column td-red">
        <thead>
            <tr><th>Country</th><th>Capital city</th></tr>
        </thead>
        <tbody>
            <tr class="grey"><td>Afghanistan</td><td>Kabul</td></tr>
            <tr><td>Albania</td><td>Tirana</td></tr>
        </tbody>
    </table>
</div>
'''
soup = BeautifulSoup(element, 'lxml')

countries = {}
for data in soup.select("tr"):
    # "th,td" also matches the header cells, so the header row becomes a key/value pair
    elem = [item.text for item in data.select("th,td")]
    countries[elem[0]] = elem[1]

print(countries)

Output:

{'Country': 'Capital city', 'Afghanistan': 'Kabul', 'Albania': 'Tirana'}
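Because select("th,td") also picks up the <th> cells, the header pair 'Country': 'Capital city' ends up in the dictionary. If only the data rows are wanted, one option, sketched here with "html.parser" instead of lxml, is to skip any row that contains a <th>:

```python
from bs4 import BeautifulSoup

element = """
<table class="two-column td-red">
    <thead>
        <tr><th>Country</th><th>Capital city</th></tr>
    </thead>
    <tbody>
        <tr class="grey"><td>Afghanistan</td><td>Kabul</td></tr>
        <tr><td>Albania</td><td>Tirana</td></tr>
    </tbody>
</table>
"""
soup = BeautifulSoup(element, "html.parser")

countries = {}
for data in soup.select("tr"):
    if data.find("th") is not None:
        continue  # header row: its cells are <th>, not <td>
    elem = [item.get_text(strip=True) for item in data.select("td")]
    countries[elem[0]] = elem[1]

print(countries)  # {'Afghanistan': 'Kabul', 'Albania': 'Tirana'}
```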
