How to parse text with BeautifulSoup

Posted 2024-06-25 06:30:12


I am currently trying to parse an HTML fragment and keep only two or three of its elements. My code looks like this:

#!/usr/bin/env python
# coding: utf8

from bs4 import BeautifulSoup

html_doc = """
<div class="postcodedata">
   <b>Latitude:</b> 51.19 degrees<br>
   <b>Longitude:</b> 0.07 degrees<br>
   <b>Postcode Town:</b> Tonbridge<br>
   <b>Easting:</b> 545102 degrees<br>
   <b>Northing:</b> 145533 degrees<br>
   <b>Grid Ref:</b> TQ451455<br>
   <b>District:</b> Sevenoaks<br>
   <b>Ward:</b> Edenbridge South and West<br>
   <b>Satnav:</b> TN8<br>
   <b><a href="phonecodes/"><u>STD Phone Code</u></a>:</b>
   (01959) xxxxxx
   <div class="clear"></div>
</div>
"""

soup = BeautifulSoup(html_doc,'html.parser')

for hit in soup.find_all(attrs={'class': 'postcodedata'}):
    print(hit.text)

I want to extract the "Postcode Town", "Satnav" and "STD Phone Code" values.

How should I go about extracting them?


2 answers

A simple approach: keep your parsing as it is and just add the step that handles the data:

from bs4 import BeautifulSoup

html_doc = """
<div class="postcodedata">
   <b>Latitude:</b> 51.19 degrees<br>
   <b>Longitude:</b> 0.07 degrees<br>
   <b>Postcode Town:</b> Tonbridge<br>
   <b>Easting:</b> 545102 degrees<br>
   <b>Northing:</b> 145533 degrees<br>
   <b>Grid Ref:</b> TQ451455<br>
   <b>District:</b> Sevenoaks<br>
   <b>Ward:</b> Edenbridge South and West<br>
   <b>Satnav:</b> TN8<br>
   <b><a href="phonecodes/"><u>STD Phone Code</u></a>:</b>
   (01959) xxxxxx
   <div class="clear"></div>
</div>
"""

soup = BeautifulSoup(html_doc,'html.parser')

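data = soup.find(attrs={'class': 'postcodedata'})
# split line by line, dropping empty lines and surrounding whitespace
values = [v.strip() for v in data.text.split('\n') if v.strip()]

wanted = ('Postcode Town', 'Satnav', 'STD Phone Code')
for i, value in enumerate(values):
    # split each line at the first colon: the key is before it, the value after it
    key, _, val = value.partition(':')
    # check for the required keys
    if key in wanted:
        # the phone code's value sits on the line after its label
        if not val.strip() and i + 1 < len(values):
            val = values[i + 1]
        print(val.strip())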
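
The same key/value idea can also be sketched without splitting the text at all, by walking the <b> labels and reading the text node that follows each one. This is only a rough sketch under assumptions: the markup is a shortened copy of the sample from the question, and HTML_DOC, WANTED and extract_fields are hypothetical names introduced here for illustration.

```python
from bs4 import BeautifulSoup, NavigableString

# Shortened copy of the sample markup from the question.
HTML_DOC = """
<div class="postcodedata">
   <b>Latitude:</b> 51.19 degrees<br>
   <b>Postcode Town:</b> Tonbridge<br>
   <b>Satnav:</b> TN8<br>
   <b><a href="phonecodes/"><u>STD Phone Code</u></a>:</b>
   (01959) xxxxxx
   <div class="clear"></div>
</div>
"""

# Labels to keep (hypothetical constant, taken from the question's wording).
WANTED = ("Postcode Town", "Satnav", "STD Phone Code")


def extract_fields(html, wanted=WANTED):
    """Return {label: value} for the wanted <b> labels in the postcodedata div."""
    soup = BeautifulSoup(html, "html.parser")
    block = soup.find("div", class_="postcodedata")
    result = {}
    if block is None:
        return result
    for label in block.find_all("b"):
        # "Postcode Town:" -> "Postcode Town"
        key = label.get_text().strip().rstrip(":")
        if key not in wanted:
            continue
        # The value is the plain text node right after the closing </b>.
        sibling = label.next_sibling
        if isinstance(sibling, NavigableString):
            result[key] = sibling.strip()
    return result


fields = extract_fields(HTML_DOC)
for name in WANTED:
    print(name, "->", fields.get(name))
```

Because the value is read from the node next to its label, the phone code needs no special case even though it sits on its own line in the markup.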

Hope this helps!

I found a solution:

#!/usr/bin/env python
# coding: utf8

from bs4 import BeautifulSoup

html_doc = """
<div class="postcodedata">
   <b>Latitude:</b> 51.19 degrees<br>
   <b>Longitude:</b> 0.07 degrees<br>
   <b>Postcode Town:</b> Tonbridge<br>
   <b>Easting:</b> 545102 degrees<br>
   <b>Northing:</b> 145533 degrees<br>
   <b>Grid Ref:</b> TQ451455<br>
   <b>District:</b> Sevenoaks<br>
   <b>Ward:</b> Edenbridge South and West<br>
   <b>Satnav:</b> TN8<br>
   <b><a href="phonecodes/"><u>STD Phone Code</u></a>:</b>
   (01959) xxxxxx
   <div class="clear"></div>
</div>
"""

soup = BeautifulSoup(html_doc, 'html.parser')
data = ""
for hit in soup.find_all(attrs={'class': 'postcodedata'}):
    data = hit.text.strip()

rest = str(data)
print(rest)
print("*************************")
# the wanted fields sit on fixed lines of the stripped text (counted from 1)
for count, line in enumerate(rest.splitlines(), start=1):
    if count == 3:
        town = line.replace("Postcode Town:", "").strip()
        print(town)
    if count == 9:
        satnav = line.replace("Satnav:", "").strip()
        print(satnav)
    if count == 11:
        phonecodes = line.strip()
        print(phonecodes)
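
Counting lines works for this exact snippet, but the positions shift as soon as a field is added or removed. A minimal sketch of a lookup by label instead, assuming the extracted text keeps the "Label: value" shape shown above; parse_fields and SAMPLE_TEXT are hypothetical names used only for illustration, and SAMPLE_TEXT is a shortened stand-in for the text printed by the code above.

```python
def parse_fields(text):
    """Turn 'Label: value' lines into a dict keyed by label."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    fields = {}
    pending = None  # label still waiting for its value on the next line
    for ln in lines:
        if pending is not None:
            fields[pending] = ln
            pending = None
            continue
        key, sep, value = ln.partition(":")
        if not sep:
            continue  # no colon, so not a 'Label: value' line
        value = value.strip()
        if value:
            fields[key.strip()] = value
        else:
            # e.g. "STD Phone Code:" whose value is on the following line
            pending = key.strip()
    return fields


# Shortened stand-in for the text extracted from the postcodedata div.
SAMPLE_TEXT = """Latitude: 51.19 degrees
Postcode Town: Tonbridge
Satnav: TN8
STD Phone Code:
   (01959) xxxxxx"""

fields = parse_fields(SAMPLE_TEXT)
print(fields["Postcode Town"])
print(fields["Satnav"])
print(fields["STD Phone Code"])
```

In the script above, the same function could be called as parse_fields(rest) in place of the counting loop.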
