How to get the strings before and after <br>

Posted 2024-07-01 08:00:24


The HTML looks like this:

<a>   
   "1447 Acres &nbsp; Council, Adams County, ID"
    <br>
    "1,190,000" 
</a>

How can I get "1447 Acres   Council, Adams County, ID" and "1,190,000" separately?


3 answers

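soup.text gives the text with the original \n characters in it. You can split it with split('\n'), but because there are many \n characters the result may contain empty or whitespace-only elements.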
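
A minimal sketch of that split-and-filter idea. The raw_text value below is an assumption: it is what soup.text is expected to return for the snippet in the question, hard-coded so the example runs without parsing anything. The names raw_text, pieces and lines are only illustrative.

```python
# Assumed value of soup.text for the <a> snippet in the question
raw_text = '   \n   "1447 Acres \xa0 Council, Adams County, ID"\n    \n    "1,190,000" \n'

# A plain split keeps the whitespace-only and empty pieces
pieces = raw_text.split('\n')
print(len(pieces))

# Strip every piece and keep only the ones that still contain text
lines = [piece.strip() for piece in pieces if piece.strip()]
print(lines)
```

The filter is what removes the empty elements; split('\n') on its own returns them.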

But BeautifulSoup also has the method get_text(), which accepts the parameters separator= and strip=. They can be used like this:

text = soup.get_text(separator='|', strip=True)

This gives the string

"1447 Acres   Council, Adams County, ID"|"1,190,000"

Now you can use split('|') to split it into a list

['"1447 Acres \xa0 Council, Adams County, ID"', '"1,190,000"']

I would also add replace() to remove the " characters

from bs4 import BeautifulSoup as BS

text = '''<a>     
   "1447 Acres &nbsp; Council, Adams County, ID"
    <br>
    "1,190,000" 
</a>'''

soup = BS(text, 'html.parser')

text = soup.get_text(separator='|', strip=True)
text = text.replace('"', '')

data = text.split('|')
print(data)

Result

['1447 Acres \xa0 Council, Adams County, ID', '1,190,000']

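BeautifulSoup has already decoded the &nbsp; entity into a non-breaking space, which is displayed as \xa0 in the list above. You can turn it into a regular space with replace('\xa0', ' ') or delete it with replace('\xa0', ''). If you need to decode entities in a raw string yourself, the function for that is html.unescape() in the standard library, not something in urllib.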
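
A small sketch of the entity handling described above, using only the standard library. It assumes you start from a raw string that still contains &nbsp;. With BeautifulSoup the decoding step has already happened. The names decoded and cleaned are illustrative.

```python
import html

# html.unescape() decodes entities such as &nbsp; into the characters they stand for
decoded = html.unescape('1447 Acres &nbsp; Council, Adams County, ID')
print(repr(decoded))

# Turn the non-breaking space into a regular space, then collapse runs of whitespace
cleaned = ' '.join(decoded.replace('\xa0', ' ').split())
print(cleaned)
```

Collapsing the whitespace afterwards avoids the triple space that a bare replace('\xa0', ' ') would leave behind.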

from bs4 import BeautifulSoup

# Triple quotes let the HTML string span several lines
html_text = '''<a>   "1447 Acres &nbsp; Council, Adams County, ID" <br>
              "1,190,000" </a>'''
soup = BeautifulSoup(html_text, "html.parser")
print(soup.text)

Based on your comment, I understand that you want to save each string in a separate variable. You can try the following:

import re
from bs4 import BeautifulSoup

html_doc = """<a>   
   "1447 Acres &nbsp; Council, Adams County, ID"
    <br>
    "1,190,000" 
</a>"""

soup = BeautifulSoup(html_doc, "html.parser")

a_tag = soup.find("a").get_text(strip=True)

a_tag = a_tag.replace(u"\xa0", "").replace('"', " ").strip()

# Split on a double space, or on a comma followed by a non-digit character
# (so the commas inside 1,190,000 are left alone)
acres, council, location, id_, price = re.split(r"\s{2}|,[^0-9]", a_tag)

print(acres)
print(council)
print(location)
print(id_)
print(price)

Output:

1447 Acres
Council
Adams County
ID
1,190,000
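
To see what the split pattern does on its own, here is a sketch that needs only the standard library. The cleaned string is an assumption: it is the value a_tag is expected to hold after the replace() and strip() calls above. The variable names and the lookahead variant are illustrative, not part of the original answer.

```python
import re

# Assumed value of a_tag after the replace() and strip() calls
cleaned = '1447 Acres  Council, Adams County, ID  1,190,000'

# \s{2} matches the double spaces; ,[^0-9] matches a comma plus the next non-digit
parts = re.split(r"\s{2}|,[^0-9]", cleaned)
print(parts)

# Caveat: [^0-9] consumes the character after the comma
lossy = re.split(r",[^0-9]", "Adams,County")
print(lossy)

# A negative lookahead checks for "no digit" without consuming anything
safer = re.split(r"\s{2}|,(?!\d)\s*", cleaned)
print(safer)
```

For this input both patterns give the same five fields. The lookahead version only matters when a comma is not followed by a space.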
