How to get the strings before and after a line break <br>

User

#1 · edited 2024-07-01 08:00:24

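soup.text gives you the text with the original \n characters still in it. You can split it with split('\n'), but because there are many \n, the result can contain empty or whitespace-only elements.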
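Here is a minimal sketch of the split('\n') approach. It uses the HTML from the question, reformatted without trailing spaces. The raw split contains empty and whitespace-only items, and a strip-and-filter step removes them. The variable names are illustrative.

```python
from bs4 import BeautifulSoup

html = '<a>\n   "1447 Acres &nbsp; Council, Adams County, ID"\n    <br>\n    "1,190,000"\n</a>'
soup = BeautifulSoup(html, "html.parser")

# soup.text keeps the newlines from the source; <br> itself contributes no text
raw_parts = soup.text.split('\n')
print(raw_parts)

# Strip each piece and drop the ones that are empty after stripping
lines = [part.strip() for part in raw_parts if part.strip()]
print(lines)
```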

However, BeautifulSoup also has the method get_text(), which accepts the parameters separator= and strip=. You can use them like this:

text = soup.get_text(separator='|', strip=True)

This gives the string

"1447 Acres   Council, Adams County, ID"|"1,190,000"

Now you can use split('|') to turn it into a list

['"1447 Acres \xa0 Council, Adams County, ID"', '"1,190,000"']

I would also add replace() to remove the " characters:

from bs4 import BeautifulSoup as BS

text = '''<a>     
   "1447 Acres &nbsp; Council, Adams County, ID"
    <br>
    "1,190,000" 
</a>'''

soup = BS(text, 'html.parser')

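# Join the stripped text pieces with '|' so the <br> boundary is preserved
text = soup.get_text(separator='|', strip=True)
# Remove the literal double quotes around each piece
text = text.replace('"', '')

data = text.split('|')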
print(data)

Result:

['1447 Acres \xa0 Council, Adams County, ID', '1,190,000']
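If the text itself can contain a |, splitting on the separator will break it into the wrong pieces. The sketch below avoids a separator altogether and assumes the same HTML. BeautifulSoup's stripped_strings generator yields each text node with its surrounding whitespace already removed. strip('"') then removes only the quotes at the ends of each piece, so quotes inside a piece are kept.

```python
from bs4 import BeautifulSoup

text = '''<a>
   "1447 Acres &nbsp; Council, Adams County, ID"
    <br>
    "1,190,000"
</a>'''

soup = BeautifulSoup(text, 'html.parser')

# stripped_strings yields each text node with leading/trailing whitespace removed
# and skips nodes that consist only of whitespace
data = [s.strip('"') for s in soup.a.stripped_strings]
print(data)
```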

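The \xa0 is not leftover markup. It is the non-breaking space that BeautifulSoup already decoded from the &nbsp; entity, so you do not need an extra entity-decoding function (for example from urllib). If you want an ordinary space instead, use replace('\xa0', ' ') or unicodedata.normalize('NFKC', text). To remove it entirely, use replace('\xa0', '').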
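The sketch below applies the normalization option to the list from the result above, using only the standard library. NFKC maps the non-breaking space (U+00A0) to an ordinary space, and ' '.join(s.split()) then collapses the resulting run of spaces. Collapsing the spaces is optional and assumes you do not need the original spacing.

```python
import unicodedata

# The list produced by the get_text(separator='|', strip=True) approach;
# '\xa0' is the non-breaking space decoded from '&nbsp;'.
data = ['1447 Acres \xa0 Council, Adams County, ID', '1,190,000']

# NFKC turns U+00A0 (NO-BREAK SPACE) into U+0020 (SPACE)
normalized = [unicodedata.normalize('NFKC', s) for s in data]

# Collapse any run of whitespace into a single space
cleaned = [' '.join(s.split()) for s in normalized]
print(cleaned)
```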

User

#2 · edited 2024-07-01 08:00:24

from bs4 import BeautifulSoup 

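# A triple-quoted string is needed because the HTML spans two lines
html_text = '''<a>   "1447 Acres &nbsp; Council, Adams County, ID" <br>
              "1,190,000" </a>'''
soup = BeautifulSoup(html_text, "html.parser")
# soup.text drops the <br> tag; the two strings remain separated by a newline
print(soup.text)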
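Neither soup.text nor get_text() uses the <br> tag itself. The sketch below takes a different approach that does. It finds the <br> and reads the text nodes beside it through previous_sibling and next_sibling. This assumes the two strings sit directly next to the <br>. If a comment or another tag could appear in between, you would need to skip over those nodes first.

```python
from bs4 import BeautifulSoup

html_text = '''<a>   "1447 Acres &nbsp; Council, Adams County, ID" <br>
              "1,190,000" </a>'''
soup = BeautifulSoup(html_text, "html.parser")

br = soup.find("br")
# The text nodes immediately before and after <br> are its siblings;
# strip() removes surrounding whitespace, strip('"') the enclosing quotes
before = br.previous_sibling.strip().strip('"')
after = br.next_sibling.strip().strip('"')
print(before)
print(after)
```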

User

#3 · edited 2024-07-01 08:00:24

From your comment, I understand that you want to save each string to a separate variable. You can try the following:

import re
from bs4 import BeautifulSoup

html_doc = """<a>   
   "1447 Acres &nbsp; Council, Adams County, ID"
    <br>
    "1,190,000" 
</a>"""

soup = BeautifulSoup(html_doc, "html.parser")

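# get_text(strip=True) joins the stripped strings with no separator:
# '"1447 Acres \xa0 Council, Adams County, ID""1,190,000"'
a_tag = soup.find("a").get_text(strip=True)

# Drop the non-breaking space and turn each '"' into a space, giving
# '1447 Acres  Council, Adams County, ID  1,190,000'
a_tag = a_tag.replace("\xa0", "").replace('"', " ").strip()

# Split either on a double space or on a comma that is not followed by a digit
acres, council, location, id_, price = re.split(r"\s{2}|,[^0-9]", a_tag)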

print(acres)
print(council)
print(location)
print(id_)
print(price)

Output:

1447 Acres
Council
Adams County
ID
1,190,000
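Unpacking re.split() into five variables raises ValueError as soon as a listing has a different number of parts. A hedged alternative assumes every listing follows the pattern "<N> Acres <city>, <county>, <two-letter state>". It matches the first string with named groups and converts the price separately. The pattern and the group names acres, city, county and state are hypothetical and do not come from the original answer.

```python
import re

# Assumed input: the two strings on either side of <br>, with the quotes removed.
# '\xa0' (the decoded &nbsp;) is still present; \s matches it in a str pattern.
location_text = '1447 Acres \xa0 Council, Adams County, ID'
price_text = '1,190,000'

# Hypothetical pattern for "<N> Acres <city>, <county>, <state>"
LISTING_RE = re.compile(
    r'(?P<acres>\d+)\s+Acres\s+'
    r'(?P<city>[^,]+),\s*'
    r'(?P<county>[^,]+),\s*'
    r'(?P<state>[A-Z]{2})'
)

match = LISTING_RE.fullmatch(location_text)
if match is None:
    raise ValueError(f'unexpected listing format: {location_text!r}')

acres = int(match['acres'])
city = match['city']
county = match['county']
state = match['state']
# Remove the thousands separators before converting the price
price = int(price_text.replace(',', ''))

print(acres, city, county, state, price)
```

A listing that does not fit the pattern fails loudly with a clear message, instead of silently shifting values into the wrong variables.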
