如何编写一个正则表达式来提取无序列表和前面的段落

2条回答

网友

1楼 · 编辑于 2024-10-06 11:18:50

为什么您需要regex而beautifulsoup能够完全处理任何类型的html—最好尝试css selectors这里div.Mother div.Son ul li的意思是选择所有divs的类名Mother，然后在里面选择所有divs的类名Son，然后选择里面的ul，最后选择里面的所有li。你知道吗

from bs4 import BeautifulSoup as bs

data = """

    <body>
    <div class="Mother" >
        <div class="Son" >
            <p><strong><strong> </strong></strong>It can be hard to admit that rebranding is necessary. Companies can often be attached to their brand, even if it is hurting their sales. Consider rebranding if:</p>
            <ul>
                <li>You are experiencing a decrease in sales and customers</li>
                <li>If your brand design does not reflect what you deliver</li>
                <li>If you want to attract a new target audience</li>
                <li>Management change</li>
                <li><a href="http://www.risingabovethenoise.com/how-to-rebrand-19-questions-ask-before-you-start/" onclick="__gaTracker('send', 'event', 'outbound-article', 'http://www.risingabovethenoise.com/how-to-rebrand-19-questions-ask-before-you-start/', '19 Questions to Ask Yourself Before You Start Rebranding');">19 Questions to Ask Yourself Before You Start Rebranding</a></li>
            </ul>
        </div>
    </div>
</body>

"""

soup = bs(data,'lxml')
#To grab all inside the ul
for item in soup.select('div.Mother div.Son'):
    print item.text.strip()
print  "="*100
#Just to grab all li    
for li in soup.select('div.Mother div.Son ul li'):
    print li.text.strip()

输出-

It can be hard to admit that rebranding is necessary. Companies can often be attached to their brand, even if it is hurting their sales. Consider rebranding if:

You are experiencing a decrease in sales and customers
If your brand design does not reflect what you deliver
If you want to attract a new target audience
Management change
19 Questions to Ask Yourself Before You Start Rebranding
====================================================================================================
You are experiencing a decrease in sales and customers
If your brand design does not reflect what you deliver
If you want to attract a new target audience
Management change
19 Questions to Ask Yourself Before You Start Rebranding

网友

2楼 · 编辑于 2024-10-06 11:18:50

不要使用正则表达式来解析HTML！

BeautifulSoup可以轻松、优雅、正确地完成您想要的一切：

>>> soup = bs4.BeautifulSoup(r"""
    <p><strong><strong> </strong></strong>It can be hard to admit that rebranding is necessary. Companies can often be attached to their brand, even if it is hurting their sales. Consider rebranding if:</p>
    <ul>
    <li>You are experiencing a decrease in sales and customers</li>
    <li>If your brand design does not reflect what you deliver</li>
    <li>If you want to attract a new target audience</li>
    <li>Management change</li>
    <li><a href="http://www.risingabovethenoise.com/how-to-rebrand-19-questions-ask-before-you-start/" onclick="__gaTracker('send', 'event', 'outbound-article', 'http://www.risingabovethenoise.com/how-to-rebrand-19-questions-ask-before-you-start/', '19 Questions to Ask Yourself Before You Start Rebranding');">19 Questions to Ask Yourself Before You Start Rebranding</a></li>
    </ul>
""")
>>> bulleted_lists = soup.findAll('ul')
>>> uls_with_ps = [(ul.findPrevious('p'), ul) for ul in bulleted_lists]

要了解情况：

>>> bulleted_lists
[<ul>
<li>You are experiencing a decrease in sales and customers</li>
<li>If your brand design does not reflect what you deliver</li>
<li>If you want to attract a new target audience</li>
<li>Management change</li>
<li><a href="http://www.risingabovethenoise.com/how-to-rebrand-19-questions-ask-before-you-start/" onclick="__gaTracker('send', 'event', 'outbound-article', 'http://www.risingabovethenoise.com/how-to-rebrand-19-questions-ask-before-you-start/', '19 Questions to Ask Yourself Before You Start Rebranding');">19 Questions to Ask Yourself Before You Start Rebranding</a></li>
</ul>]

>>> bulleted_lists[0].findPrevious('p')
<p><strong><strong> </strong></strong>It can be hard to admit that rebranding is necessary. Companies can often be attached to their brand, even if it is hurting their sales. Consider rebranding if:</p>

相关问题更多 >

编程相关推荐

热门问题

热门文章

如何编写一个正则表达式来提取无序列表和前面的段落

相关问题 更多 >

编程相关推荐

热门问题

热门文章

相关问题更多 >