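I'm using the bs4 and requests packages with Python 3.8 from Anaconda. I'm trying to get all the .tgz file names from voxforge.com. However, after I fetch the page with requests and convert it to a soup, everything after the closing </body> tag is gone.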
import requests
import bs4
r = requests.get('http://www.repository.voxforge1.org/downloads/fr/Trunk/Audio/Main/16kHz_16bit/')
r.text
This returns everything I need (the output goes on for a while):
'<title>VoxForge Repository</title>\n\n\t<style type="text/css">\n\t.siteFunctions {\n\t\ttext-align: right;\n\t}\n\t.copyright {\n\t\ttest-align: left;\n\t\tcolor: #2E3436;\n\t\tfont-family: sans-serif;\n font-size: small;\n\t}\n\n\tbody {\n\t\tfont-family: "DejaVu Sans", "Lucida Sans Unicode", sans-serif;\n\t\tfont-weight:\tnormal;\n\t\tword-spacing:\tnormal;\n\t\tletter-spacing:\tnormal; \n\t\ttext-transform:\tnone;\n\t\tfont-size: medium;\n text-align: justify;\n\t}\n\th2 {\n\t\tfont-size:\t1.5em;\n\t\tfont-weight:\t700;\n\t\tmargin-top:1em;\n\t\tmargin-bottom:0.8em;\n\t}\n\th3 {\n\t\tfont-size:\t1.1em;\n\t\tfont-weight:\t600;\n\t\tmargin-top:1em;\n\t\tmargin-bottom:0.4em;\n\t}\n\tp, ol, ul {\n\t\tfont-size:\t1em;\n\t\tmargin-top:0.4em;\n\t\tmargin-bottom:0.4em;\n\t}\t\n\t.heading {\n\t\tbackground-color: #555753;\n color: #D3D7CF;\n\t\tfont-size: 40px;\n\t\tvertical-align: bottom;\n\t}\n\t.logo {\n\t\twidth: 100px; \n\t\tfloat: left;\n\t\ttext-align: left;\n\t}\n\t.logo img {\n\t\tborder: 0px;\n\t}\n\timg {\n\t\tborder: 0px;\n\t}\n\t.clickableicons {\n\t}\n\t.endFloat {\n\t\tclear: both;\n\t\n\t}\n\t.padding {\n\t\tpadding: 10px;\n\t}\n\t.bodyContent {\n\t\tbackground-color: #ffffff;\n\t\tcolor: #2E3436;\n text-align: justify;\n\t}\n\t.menu {\n color: #D3D7CF;\n\t\tbackground-color: #555753;\n\t\ttext-align: left;\n\t}\n\n\t.menu2 {\n color: #D3D7CF;\n\t\tbackground-color: #555753;\n\t\ttext-align: center;\n\t\t\n\t}\n\ta {\n\t\tcolor: #f57900;\n\t\ttext-decoration:none;\n\t}\n\ta:visited {\n\t\tcolor: #ce5c00;\n\t}\n\ta:hover {\n text-decoration:underline;\n\t}\n\t.menu a {\n\t\tcolor: #D3D7CF;\n\t\tfont-weight: bold; \n\t}\n\t.menu a:hover {\n\t\tcolor: #eeeeec;\n\t\ttext-decoration:none;\n\t}\n\n\t</style>\n</head><body>\n\n\n\n<div class="heading">\n<div class="padding">\n<div class="logo"><a href="http://www.voxforge.org"><img src="http://www.voxforge.org/uploads/8k/N8/8kN884Cd96cmBZxRlzmbzQ/voxforge-logo.jpg" alt="VoxForge Repository"> </a></div> 
\n\n<div class="endFloat"></div>\n\n</div>\n</div>\n\n<div class="menu">\n\t<div class="padding">\t\t\n\t\t\n\t\t\n<span class="horizontalMenu">\n\n<a class="horizontalMenu" href="http://www.voxforge.org/home">Home</a>\n · \n\n<a class="horizontalMenu" href="http://www.voxforge.org/home/read">Read</a>\n · \n\n<a class="horizontalMenu" href="http://www.voxforge.org/home/listen">Listen</a>\n · \n\n<a class="horizontalMenu" href="http://www.voxforge.org/home/forums">Forums</a>\n · \n\n<a class="horizontalMenu" href="http://www.voxforge.org/home/dev">Dev</a>\n\n · \n\n<a class="horizontalMenu" href="http://www.voxforge.org/home/downloads">Downloads</a>\n · \n\n<a class="horizontalMenu" href="http://www.voxforge.org/home/about">About</a>\n \n\n \n\n</span></div>\n\n</div>\n\n\n\n</div>\n\n</body></html>\n<pre><img src="/spicons/blank.gif" alt="Icon "> <a href="?C=N;O=D">Name</a> <a href="?C=M;O=A">Last modified</a> <a href="?C=S;O=A">Size</a> <hr><img src="/spicons/back.gif" alt="[PARENTDIR]"> <a href="/downloads/fr/Trunk/Audio/Main/">Parent Directory</a> - \n<img src="/spicons/compressed.gif" alt="[ ]"> <a href="4h-20100505-vgm.tgz">4h-20100505-vgm.tgz</a> 2010-05-13 11:34 1.6M \n<img src="/spicons/compressed.gif" alt="[ ]"> <a href="Agoniste-20130928-bfg.tgz">Agoniste-20130928-bfg.tgz</a> 2014-02-17 05:02 1.8M \n<img src="/spicons/compressed.gif" alt="[ ]"> <a href="Agoniste-20130928-fnn.tgz">Agoniste-20130928-fnn.tgz</a> 2014-02-18 04:32 1.9M \n<img src="/spicons/compressed.gif" alt="[ ]"> <a href="Agoniste-20130928-gaf.tgz">Agoniste-20130928-gaf.tgz</a> 2014-02-18 04:32 2.0M \n<img src="/spicons/compressed.gif" alt="[ ]"> <a href="Agoniste-20130928-izd.tgz">Agoniste-20130928-izd.tgz</a> 2014-02-18 04:32 1.8M \n<img src="/spicons/compressed.gif" alt="[ ]"> <a href="Agoniste-20130928-ndz.tgz">Agoniste-20130928-ndz.tgz</a> 2014-02-18 04:32 1.8M \n<img src="/spicons/compressed.gif" alt="[ ]"> <a href="Agoniste-20130928-pzq.tgz">Agoniste-20130928-pzq.tgz</a> 2014-02-18 
04:32 2.0M \n<img src="/spicons/compressed.gif" alt="[ ]"> <a href="Agoniste-20130928-qyu.tgz">Agoniste-20130928-qyu.tgz</a> 2014-02-18 04:32 2.1M \n<img src="/spicons/compressed.gif" alt="[ ]"> <a href="Agoniste-20130928-rva.tgz">Agoniste-20130928-rva.tgz</a> 2014-02-18 04:32 1.8M \n<img src="/spicons/compressed.gif" alt="[ ]"> <a href="Agoniste-20130928-vio.tgz">Agoniste-20130928-vio.tgz</a> 2014-06-10 04:44 1.7M \n<img src="/spicons/compressed.gif" alt="[ ]"> <a href="Alliage-20151109-cyf.tgz">Alliage-20151109-cyf.tgz</a> 2015-11-13 04:08 1.1M \n<img src="/spicons/compressed.gif" alt="[ ]"> <a href="Alliage-20151109-dqh.tgz">Alliage-20151109-dqh.tgz</a> 2015-11-13 04:08 960K \n<img src="/spicons/compressed.gif" alt="[ ]"> <a href="Alliage-20151109-ewg.tgz">Alliage-20151109-ewg.tgz</a> 2015-11-13 04:08 963K \n<img src="/spicons/compressed.gif" alt="[ ]"> <a href="Alliage-20151109-imx.tgz">Alliage-20151109-imx.tgz</a> 2015-11-13 04:08 855K \n<img src="/spicons/compressed.gif" alt="[ ]"> <a href="Alliage-20151109-kny.tgz">Alliage-20151109-kny.tgz</a> 2015-11-13 04:08 924K \n<img src="/spicons/compressed.gif" alt="[ ]"> <a href="Alliage-20151109-lcn.tgz">Alliage-20151109-lcn.tgz</a> 2015-11-13 04:08 910K \n<img src="/spicons/compressed.gif" alt="[ ]"> <a href="Alliage-20151109-rxi.tgz">Alliage-20151109-rxi.tgz</a>
But when I parse it with bs4 using either the 'html' or the 'lxml' parser:
soup = bs4.BeautifulSoup(r.text, 'html')
soup
I only get back the first part; everything after </body> is missing:
<html><head><title>VoxForge Repository</title>
<style type="text/css">
.siteFunctions {
text-align: right;
}
.copyright {
test-align: left;
color: #2E3436;
font-family: sans-serif;
font-size: small;
}
body {
font-family: "DejaVu Sans", "Lucida Sans Unicode", sans-serif;
font-weight: normal;
word-spacing: normal;
letter-spacing: normal;
text-transform: none;
font-size: medium;
text-align: justify;
}
h2 {
font-size: 1.5em;
font-weight: 700;
margin-top:1em;
margin-bottom:0.8em;
}
h3 {
font-size: 1.1em;
font-weight: 600;
margin-top:1em;
margin-bottom:0.4em;
}
p, ol, ul {
font-size: 1em;
margin-top:0.4em;
margin-bottom:0.4em;
}
.heading {
background-color: #555753;
color: #D3D7CF;
font-size: 40px;
vertical-align: bottom;
}
.logo {
width: 100px;
float: left;
text-align: left;
}
.logo img {
border: 0px;
}
img {
border: 0px;
}
.clickableicons {
}
.endFloat {
clear: both;
}
.padding {
padding: 10px;
}
.bodyContent {
background-color: #ffffff;
color: #2E3436;
text-align: justify;
}
.menu {
color: #D3D7CF;
background-color: #555753;
text-align: left;
}
.menu2 {
color: #D3D7CF;
background-color: #555753;
text-align: center;
}
a {
color: #f57900;
text-decoration:none;
}
a:visited {
color: #ce5c00;
}
a:hover {
text-decoration:underline;
}
.menu a {
color: #D3D7CF;
font-weight: bold;
}
.menu a:hover {
color: #eeeeec;
text-decoration:none;
}
</style>
</head><body>
<div class="heading">
<div class="padding">
<div class="logo"><a href="http://www.voxforge.org"><img alt="VoxForge Repository" src="http://www.voxforge.org/uploads/8k/N8/8kN884Cd96cmBZxRlzmbzQ/voxforge-logo.jpg"/> </a></div>
<div class="endFloat"></div>
</div>
</div>
<div class="menu">
<div class="padding">
<span class="horizontalMenu">
<a class="horizontalMenu" href="http://www.voxforge.org/home">Home</a>
·
<a class="horizontalMenu" href="http://www.voxforge.org/home/read">Read</a>
·
<a class="horizontalMenu" href="http://www.voxforge.org/home/listen">Listen</a>
·
<a class="horizontalMenu" href="http://www.voxforge.org/home/forums">Forums</a>
·
<a class="horizontalMenu" href="http://www.voxforge.org/home/dev">Dev</a>
·
<a class="horizontalMenu" href="http://www.voxforge.org/home/downloads">Downloads</a>
·
<a class="horizontalMenu" href="http://www.voxforge.org/home/about">About</a>
</span></div>
</div>
</body></html>
The .tgz file names I'm after all come after </body>, so I need a way to extract them, but bs4 seems to be dropping them. Can anyone help?
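One thing that may be worth trying is Python's built-in 'html.parser' instead of 'html' or 'lxml'. It does not enforce document structure, so links that appear after </body></html> should stay in the tree. The sketch below is only an illustration under that assumption: `sample_html` is a shortened, hypothetical stand-in for `r.text`, and `tgz_names` is a helper name made up for this example.

```python
import bs4

# Hypothetical, shortened stand-in for r.text: as in the real response,
# the directory listing comes after the closing </body></html> tags.
sample_html = (
    '<html><head><title>VoxForge Repository</title></head>'
    '<body><div class="menu">'
    '<a href="http://www.voxforge.org/home">Home</a></div>'
    '</body></html>\n'
    '<pre><a href="?C=N;O=D">Name</a>\n'
    '<a href="/downloads/fr/Trunk/Audio/Main/">Parent Directory</a>\n'
    '<a href="4h-20100505-vgm.tgz">4h-20100505-vgm.tgz</a>\n'
    '<a href="Agoniste-20130928-bfg.tgz">Agoniste-20130928-bfg.tgz</a>\n'
    '</pre>'
)


def tgz_names(html):
    """Return the href of every <a> tag whose target ends in .tgz."""
    # 'html.parser' is the parser from the standard library; it keeps
    # markup that appears after the closing </html> tag.
    soup = bs4.BeautifulSoup(html, 'html.parser')
    return [a['href'] for a in soup.find_all('a', href=True)
            if a['href'].endswith('.tgz')]


print(tgz_names(sample_html))
```

With the real page, `tgz_names(r.text)` would be the equivalent call.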
Another solution:
Prints:
Try:
Prints:
and so on.
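Since `r.text` already contains the full listing, the file names can also be pulled straight out of the raw text without building a soup at all. The following is a minimal sketch using only the standard library, not necessarily what any answer here used: `listing` is a hypothetical excerpt in the same shape as the response shown in the question, and `TGZ_HREF` and `tgz_names_from_text` are names invented for this example. It assumes the hrefs are double-quoted, as they are in that output.

```python
import re

# Matches href="something.tgz" and captures the file name.
# Assumes double-quoted href attributes, as in the response shown above.
TGZ_HREF = re.compile(r'href="([^"]+\.tgz)"')


def tgz_names_from_text(text):
    """Return every .tgz link target found in the raw HTML text."""
    return TGZ_HREF.findall(text)


# Hypothetical excerpt standing in for r.text.
listing = (
    '</body></html>\n<pre>'
    '<a href="/downloads/fr/Trunk/Audio/Main/">Parent Directory</a> - \n'
    '<img src="/spicons/compressed.gif" alt="[   ]"> '
    '<a href="Alliage-20151109-cyf.tgz">Alliage-20151109-cyf.tgz</a> 2015-11-13 04:08 1.1M \n'
    '<img src="/spicons/compressed.gif" alt="[   ]"> '
    '<a href="Alliage-20151109-dqh.tgz">Alliage-20151109-dqh.tgz</a> 2015-11-13 04:08 960K \n'
)

print(tgz_names_from_text(listing))
```

A regular expression is fragile for general HTML, but a server-generated directory listing like this one has a very regular shape, which is why it may be acceptable here.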