My task is to find the "li" and "ul" tags in a string and count them. This is what I tried. It works, but I am looking for a better solution.
String:
<ul><li>Regularly wash your hands for 20 seconds or use a hand sanitizer with at least 60 percent alcohol. Pay attention to hand hygiene, especially when you’ve been in a public place and after coughing, sneezing, or blowing your nose.</li>
<li>Practice <a href="https://www.answers.com/Q/What_is_social_distancing" rel="nofollow ugc">social distancing</a> by increasing the space between you and other people. That means staying home as much as you can, especially if you feel sick.</li>
<li>Disinfect frequently touched surfaces (like keyboards, doorknobs, and light switches) every day.</li>
<li>Cover coughs and sneezes with the inside of your elbow or a tissue. Throw the tissue away immediately and wash your hands.</li>,</ul>
Code:
import re

liTag = re.findall('<li>', String)  # every plain <li> opening tag
ulTag = re.findall('<ul>', String)  # every plain <ul> opening tag
count = len(liTag) + len(ulTag)
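In your example, re is a good solution and you don't have to look for another method. If you like, you can write it more compactly, for example as a single re.findall call with the pattern '<li>|<ul>'.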
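A minimal sketch of that compact form, assuming the tags carry no attributes. The helper name count_li_ul and the sample string are made up for illustration and are not from the original post.

```python
import re


def count_li_ul(text):
    """Count plain <li> and <ul> opening tags with one alternation pattern."""
    return len(re.findall(r'<li>|<ul>', text))


# Hypothetical sample input: one <ul> holding two <li> items.
sample = '<ul><li>one</li><li>two</li></ul>'
print(count_li_ul(sample))  # prints 3
```

This only matches the exact strings '<li>' and '<ul>', so a tag with any attribute is skipped.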
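However, if you get more complex tags such as <ul class="..."> (or more complex still), that regex will not work. A better (and easier) approach is to use lxml, BeautifulSoup, or another HTML parser.

lxml: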
You can even select both tags with a single XPath expression such as '//ul | //li'.
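A hedged sketch of how the lxml approach might look. It assumes lxml is installed, and the sample text and variable names are invented here, not taken from the original answer.

```python
from lxml import html  # third-party package: pip install lxml

# Hypothetical sample; the attributes are there to show they do not matter.
text = '<ul class="tips"><li id="first">one</li><li style="color:red">two</li></ul>'

# document_fromstring always returns the <html> root element,
# so an XPath starting with '//' searches the whole parsed tree.
root = html.document_fromstring(text)

# Count each tag separately...
count_separate = len(root.xpath('//ul')) + len(root.xpath('//li'))

# ...or match both tags with one XPath union expression.
count_union = len(root.xpath('//ul | //li'))

print(count_separate, count_union)
```

Because the parser works on elements rather than raw characters, attributes on the tags do not affect the count.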
BeautifulSoup:
You can even pass a list of tag names, such as ['ul', 'li'], to a single find_all call.
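A comparable sketch with BeautifulSoup, assuming the beautifulsoup4 package is installed and using the standard library's 'html.parser' backend. The sample text and variable names are again hypothetical.

```python
from bs4 import BeautifulSoup  # third-party package: pip install beautifulsoup4

# Hypothetical sample; the attributes are there to show they do not matter.
text = '<ul class="tips"><li id="first">one</li><li style="color:red">two</li></ul>'

soup = BeautifulSoup(text, 'html.parser')

# Count each tag separately...
count_separate = len(soup.find_all('ul')) + len(soup.find_all('li'))

# ...or pass a list of tag names to a single find_all call.
count_list = len(soup.find_all(['ul', 'li']))

print(count_separate, count_list)
```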
Edit: I added extra information to every tag in the example text, and the code still works without any changes.
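For regex, you would need '<(li|ul).*>' or '<(ul|li)'. Note that the greedy .* in the first pattern can swallow several tags that sit on the same line, so it may undercount. For anything more complex, the regex would need correspondingly more complex changes.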
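If you do stay with regex, one possible pattern that tolerates attributes is sketched below. The pattern, the helper name count_tags_regex, and the sample are assumptions for illustration, not part of the original answer. It uses [^>]* instead of a greedy .* so each match stops at the tag's own closing bracket, and \b so that a tag like <link> is not counted as <li>.

```python
import re

# Matches an opening <ul ...> or <li ...> tag, with or without attributes.
TAG_PATTERN = re.compile(r'<(?:ul|li)\b[^>]*>')


def count_tags_regex(text):
    """Count opening <ul> and <li> tags, attributes allowed."""
    return len(TAG_PATTERN.findall(text))


# Hypothetical single-line sample with attributes on every tag.
sample = '<ul class="tips"><li id="first">one</li><li style="color:red">two</li></ul>'

print(count_tags_regex(sample))                 # prints 3
print(len(re.findall(r'<(li|ul).*>', sample)))  # prints 1: the greedy .* spans the whole line
```

This is still a text-level heuristic; comments, scripts, or unusual markup can fool it, which is where a real HTML parser is the safer choice.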