How do I parse two strings from a table row with BeautifulSoup?

Posted 2024-09-29 21:35:12


html = '''
<div class="container">
 <h2>Countries & Capitals</h2>
  <table class="two-column td-red">
  <thead><tr><th>Country</th><th>Capital city</th></tr></thead><tbody>
   <tr class="grey"><td>Afghanistan</td><td>Kabul</td></tr>
   <tr><td>Albania</td><td>Tirana</td></tr>
</tbody>
</table>
</div>
'''

Given this HTML, I want to parse out the country names and capital city names and put them into a dictionary, so that I can do

dict["Afghanistan"] = 'Kabul'

I have started with

soup = BeautifulSoup(open(filename), 'lxml')
countries = {}
# YOUR CODE HERE
table = soup.find_all('table')
for each in table:
    if each.find('tr'):
        continue
    else:
        print(each.prettify())
return countries

But this is my first time using it, so I am quite confused.


3 Answers

If a 'tr' element has two 'td' children, you can select them, because that is where your data is:

from bs4 import BeautifulSoup

html = """
<div class="container">
 <h2>Countries & Capitals</h2>
  <table class="two-column td-red">
  <thead><tr><th>Country</th><th>Capital city</th></tr></thead><tbody>
   <tr class="grey"><td>Afghanistan</td><td>Kabul</td></tr>
   <tr><td>Albania</td><td>Tirana</td></tr>
</tbody>
</table>
</div>
"""
soup = BeautifulSoup(html, 'lxml')
countries = {}

trs = soup.find_all('tr')
for tr in trs:
    tds = tr.find_all("td")
    if len(tds) == 2:
        countries[tds[0].text] = tds[1].text
print(countries)

Output:

{'Afghanistan': 'Kabul', 'Albania': 'Tirana'}

This solution works for the given HTML sample:

from bs4 import BeautifulSoup  # assuming you did pip install bs4
soup = BeautifulSoup(html, "html.parser")  # the html you mentioned
table_data = soup.find('table')
data = {}  # {'country': 'capital'} dict
for row in table_data.find_all('tr'):
    row_data = row.find_all('td')
    if row_data:
        data[row_data[0].text] = row_data[1].text

I have left out try/except blocks for any error cases. I suggest you read through the Beautiful Soup documentation, which covers everything.
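As a rough illustration of the error cases mentioned above, here is a minimal sketch that guards against a missing table and against rows that do not have exactly two 'td' cells. The function name parse_capitals, the extra "Incomplete row" line, and the padded " Kabul " cell are hypothetical additions for the example, not part of the original HTML:

```python
from bs4 import BeautifulSoup

# Sample markup; the last row and the padded " Kabul " cell are made up
# here to exercise the guards below.
html = """
<table class="two-column td-red">
  <thead><tr><th>Country</th><th>Capital city</th></tr></thead>
  <tbody>
    <tr class="grey"><td>Afghanistan</td><td> Kabul </td></tr>
    <tr><td>Albania</td><td>Tirana</td></tr>
    <tr><td>Incomplete row</td></tr>
  </tbody>
</table>
"""


def parse_capitals(markup):
    """Return {country: capital}, skipping rows without exactly two <td> cells."""
    soup = BeautifulSoup(markup, "html.parser")
    table = soup.find("table")
    if table is None:  # no <table> in the markup at all
        return {}
    capitals = {}
    for row in table.find_all("tr"):
        cells = row.find_all("td")
        if len(cells) != 2:  # header row (only <th>) or a malformed row
            continue
        # get_text(strip=True) trims surrounding whitespace in each cell
        capitals[cells[0].get_text(strip=True)] = cells[1].get_text(strip=True)
    return capitals


result = parse_capitals(html)
print(result)
```

Checking the cell count explicitly avoids the IndexError that row_data[1] would raise on a row with a single cell.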

How about this:

from bs4 import BeautifulSoup

element ='''
<div class="container">
    <h2>Countries & Capitals</h2>
    <table class="two-column td-red">
        <thead>
            <tr><th>Country</th><th>Capital city</th></tr>
        </thead>
        <tbody>
            <tr class="grey"><td>Afghanistan</td><td>Kabul</td></tr>
            <tr><td>Albania</td><td>Tirana</td></tr>
        </tbody>
    </table>
</div>
'''
soup = BeautifulSoup(element, 'lxml')

countries = {}
for data in soup.select("tr"):
    elem = [item.text for item in data.select("th,td")]
    countries[elem[0]] = elem[1]

print(countries)

Output:

{'Afghanistan': 'Kabul', 'Country': 'Capital city', 'Albania': 'Tirana'}
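Note that this output also contains the header row ('Country': 'Capital city'), because the selector matches 'th' cells as well. If that entry is unwanted, one possible variation is to select only rows inside 'tbody'. This is a sketch that assumes the markup has an explicit 'tbody' element, as the sample does; the "html.parser" backend is used here only to avoid the lxml dependency:

```python
from bs4 import BeautifulSoup

element = """
<table class="two-column td-red">
    <thead>
        <tr><th>Country</th><th>Capital city</th></tr>
    </thead>
    <tbody>
        <tr class="grey"><td>Afghanistan</td><td>Kabul</td></tr>
        <tr><td>Albania</td><td>Tirana</td></tr>
    </tbody>
</table>
"""
soup = BeautifulSoup(element, "html.parser")

countries = {}
# "tbody tr" matches only rows nested inside <tbody>, so the <thead> row is skipped
for row in soup.select("tbody tr"):
    cells = [cell.get_text(strip=True) for cell in row.select("td")]
    if len(cells) == 2:
        countries[cells[0]] = cells[1]

print(countries)
```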
