How do I read only the HTML from a URL using Python 3?

1 answer

User

#1 · Posted on 2024-09-25 08:35:59

If you only need to read the HTML, you can use BeautifulSoup:

#python -m pip install beautifulsoup4 lxml

from bs4 import BeautifulSoup

html = '''
 <link href="https://stackpath.bootstrapcdn.com/bootstrap/4.5.0/css/bootstrap.min.css" rel="stylesheet" type="text/css">

    <div class="table-responsive grid_class">
    <table class="table lightgallery">
        <thead>
        <tr class="active">
            <th class="col-md-9">Col A</th>
            <th class="col-md-2">Col B</th>
        </tr>
        </thead>

        <tr>
            <td class="">               
            <span>some text here
            </span>
        </span>
        </span>
    </td>
        <td class="text-nowrap" style="font-size: 13px;"><span>some text here also</span></td>
        </tr>
       
        <tr>
            <td class="">               
            <span>some text here
            </span>
        </span>
        </span>
    </td>
        <td class="text-nowrap" style="font-size: 13px;"><span>some text here also</span></td>
        </tr>   
        
    </table>
</div>
<script src="https://stackpath.bootstrapcdn.com/bootstrap/4.5.0/js/bootstrap.min.js"></script>
<script src="https://stackpath.bootstrapcdn.com/bootstrap/4.5.0/js/bootstrap.bundle.min.js"></script>
'''

soup = BeautifulSoup(html, 'lxml')
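
The html string above is hard-coded. Since the question is about reading from a URL, here is a minimal sketch of how the markup could be downloaded first, using only the standard library. The helper name fetch_html, the User-Agent value, and the 10-second timeout are my own assumptions and not part of the original answer.

```python
from urllib.request import Request, urlopen


def fetch_html(url, timeout=10):
    """Return the body at `url` decoded to str (hypothetical helper)."""
    # Some servers reject urllib's default User-Agent, so send a browser-like one.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        # Use the charset from the Content-Type header, falling back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Example (needs network access; https://example.com is a placeholder URL):
# html = fetch_html("https://example.com")
# soup = BeautifulSoup(html, "lxml")
```

The returned string can be passed to BeautifulSoup in place of the inline html variable.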

You can use .find[_all] or .select to access elements and tags, for example:

ths = soup.find_all('th')
print([col.text for col in ths])
# ['Col A', 'Col B']
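
The example above only shows find_all, so here is a small sketch of the .select variant, which takes CSS selectors. It uses a trimmed copy of the table and the built-in 'html.parser' so that it depends only on beautifulsoup4 (that parser choice is my assumption; 'lxml' should work the same way here).

```python
from bs4 import BeautifulSoup

# Trimmed copy of the table from the answer, kept short for illustration.
html = '''
<table class="table lightgallery">
    <thead>
    <tr class="active">
        <th class="col-md-9">Col A</th>
        <th class="col-md-2">Col B</th>
    </tr>
    </thead>
    <tr>
        <td class=""><span>some text here</span></td>
        <td class="text-nowrap"><span>some text here also</span></td>
    </tr>
</table>
'''

soup = BeautifulSoup(html, 'html.parser')

# .select takes a CSS selector and returns a list of matching tags.
headers = [th.get_text(strip=True) for th in soup.select('table.lightgallery th')]
print(headers)

# Select cells by class name.
nowrap_cells = [td.get_text(strip=True) for td in soup.select('td.text-nowrap')]
print(nowrap_cells)

# .find returns only the first match (or None if nothing matches).
first_header = soup.find('th').get_text(strip=True)
print(first_header)
```

get_text(strip=True) trims the whitespace around the text, which helps when the markup is indented like this table.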
