Scraping "B" in flow text with BeautifulSoup

import requests
from bs4 import BeautifulSoup

def get_doc_yakarouler(license_plate, url='https://www.yakarouler.com/car_search/immat?immat='):
    # Fetch the search page for the given license plate and parse it
    response = requests.get(url + license_plate)
    content = response.content
    doc = BeautifulSoup(content, 'html.parser')
    result = doc.span.text
    if 'identifié' in result:
        return doc
    else:
        return f"La plaque {license_plate} n'est pas recensé sur yakarouler"

doc = get_doc_yakarouler('AA300AA')
span = doc.find_all('span')
motorisation_tag = span[1]

2 Answers

User

#1 · edited 2024-09-29 17:14:58

from bs4 import BeautifulSoup as bs, NavigableString

html = '<span><span>some content</span> B</span>'
soup = bs(html, 'html.parser')
span = soup.find("span")
# First approach: find the first text node that is a direct child of the outer span
outer_text_1 = span.find(string=True, recursive=False)
# Second approach: loop through the tag's contents and keep only the text nodes, skipping nested tags
outer_text_2 = ' '.join(t for t in span.contents if type(t) is NavigableString)

# Both strings keep the leading space from the HTML, so strip it before printing
print(outer_text_1.strip())  # output: B
print(outer_text_2.strip())  # output: B

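User

#2 · edited 2024-09-29 17:14:58

Assuming you have a variable span that represents the outer <span> tag, you can extract "B" with span.contents[1]. This works because .contents returns a list of the tag's direct children, which in this case is [<span>some content</span>, ' B']. The "B" text is the second element of that list. Note that if B is preceded by whitespace, as in the HTML example, that whitespace is included in the string; call .strip() on the result to remove it.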
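The indexing approach can be sketched as follows, using the same sample HTML as the first answer. The helper name `outer_text` is hypothetical and not part of BeautifulSoup; it is one possible way to avoid relying on "B" sitting at index 1, by collecting every direct text child instead.

```python
from bs4 import BeautifulSoup, NavigableString

html = '<span><span>some content</span> B</span>'
span = BeautifulSoup(html, 'html.parser').find('span')

# .contents lists the direct children: the inner <span> tag, then the text node
second_child = span.contents[1]
print(repr(second_child))    # shows that the leading space is part of the string
print(second_child.strip())  # whitespace removed

def outer_text(tag):
    """Hypothetical helper: join the direct text children of tag, ignoring nested tags."""
    parts = [str(child) for child in tag.contents if isinstance(child, NavigableString)]
    return ''.join(parts).strip()

print(outer_text(span))
```

Indexing with `[1]` is fine when the page layout is fixed; the helper is a little more forgiving if the text node moves or is split around other tags.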
