How do I read only the HTML from a URL using Python 3?

Posted 2024-09-25 08:35:59


Here is the given HTML:

    <link href="https://stackpath.bootstrapcdn.com/bootstrap/4.5.0/css/bootstrap.min.css" rel="stylesheet" type="text/css">

    <div class="table-responsive grid_class">
    <table class="table lightgallery">
        <thead>
        <tr class="active">
            <th class="col-md-9">Col A</th>
            <th class="col-md-2">Col B</th>
        </tr>
        </thead>

        <tr>
            <td class="">               
            <span>some text here
            </span>
        </span>
        </span>
    </td>
        <td class="text-nowrap" style="font-size: 13px;"><span>some text here also</span></td>
        </tr>
       
        <tr>
            <td class="">               
            <span>some text here
            </span>
        </span>
        </span>
    </td>
        <td class="text-nowrap" style="font-size: 13px;"><span>some text here also</span></td>
        </tr>   
        
    </table>
</div>
<script src="https://stackpath.bootstrapcdn.com/bootstrap/4.5.0/js/bootstrap.min.js"></script>
<script src="https://stackpath.bootstrapcdn.com/bootstrap/4.5.0/js/bootstrap.bundle.min.js"></script>


How can I get only the HTML in Python, without the libraries?

I tried the urllib and requests libraries, but neither worked.

Any help would be appreciated.


1 answer
User
Answer 1 · posted 2024-09-25 08:35:59

To just read the HTML, you can use BeautifulSoup:

#python -m pip install beautifulsoup4 lxml

from bs4 import BeautifulSoup

html = '''
 <link href="https://stackpath.bootstrapcdn.com/bootstrap/4.5.0/css/bootstrap.min.css" rel="stylesheet" type="text/css">

    <div class="table-responsive grid_class">
    <table class="table lightgallery">
        <thead>
        <tr class="active">
            <th class="col-md-9">Col A</th>
            <th class="col-md-2">Col B</th>
        </tr>
        </thead>

        <tr>
            <td class="">               
            <span>some text here
            </span>
        </span>
        </span>
    </td>
        <td class="text-nowrap" style="font-size: 13px;"><span>some text here also</span></td>
        </tr>
       
        <tr>
            <td class="">               
            <span>some text here
            </span>
        </span>
        </span>
    </td>
        <td class="text-nowrap" style="font-size: 13px;"><span>some text here also</span></td>
        </tr>   
        
    </table>
</div>
<script src="https://stackpath.bootstrapcdn.com/bootstrap/4.5.0/js/bootstrap.min.js"></script>
<script src="https://stackpath.bootstrapcdn.com/bootstrap/4.5.0/js/bootstrap.bundle.min.js"></script>
'''

soup = BeautifulSoup(html, 'lxml')
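
The string above is pasted inline, but the question asks about reading from a URL. A minimal sketch of downloading the markup first with the standard library might look like this. The helper name `fetch_html`, the timeout, and the User-Agent value are assumptions for illustration, not part of the original answer.

```python
from urllib.request import Request, urlopen


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page and return its markup as text (hypothetical helper)."""
    # Some servers reject urllib's default User-Agent; this value is an assumption.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        # Use the charset declared by the server, falling back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

The returned string can be passed to `BeautifulSoup(fetch_html(url), 'lxml')` in place of the inline `html` string, where `url` is the page you want to read.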

You can use .find(), .find_all(), or .select() to access tags and their contents. For example:

ths = soup.find_all('th')
print([col.text for col in ths])
# ['Col A', 'Col B']
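
If "without the libraries" means dropping the Bootstrap `<link>` and `<script>` tags, one possible approach is to remove them with `.select()` and `.decompose()`. This is only a sketch under that assumption: the helper name `strip_external_assets` and the shortened sample markup are made up for illustration, and the built-in `html.parser` is used here instead of `lxml` so that no extra parser is needed.

```python
from bs4 import BeautifulSoup


def strip_external_assets(html: str) -> str:
    """Return the markup with <link> and <script> tags removed (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    # A CSS selector list matches both kinds of tag in one pass.
    for tag in soup.select("link, script"):
        tag.decompose()  # remove the tag from the tree
    return str(soup)


# Shortened stand-in for the page in the question (assumed sample data).
sample = (
    '<link href="style.css" rel="stylesheet">'
    '<table><tr><td class="text-nowrap"><span>some text here also</span></td></tr></table>'
    '<script src="app.js"></script>'
)

cleaned = strip_external_assets(sample)

# .select() also takes tag.class selectors, here the second-column cells.
cells = [
    td.get_text(strip=True)
    for td in BeautifulSoup(cleaned, "html.parser").select("td.text-nowrap")
]
print(cells)
# ['some text here also']
```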
