<p>I'm scraping a website, and so far I have done this:</p>
<pre><code>import requests
from bs4 import BeautifulSoup

page = requests.get('http://abcdefgh.in')
print(page.status_code)
soup = BeautifulSoup(page.content, 'html.parser')
# every element whose class list includes "p-list-sec"
all_p = soup.find_all(class_="p-list-sec")
print(all_p)
</code></pre>
<p>After doing this, when I print <code>all_p</code>, I get:</p>
<pre><code><div class="p-list-sec">
<ul> <li><a href="link1" title="title1">title1</a></li>
<li><a href="link2" title="title2">title2</a></li>
<li><a href="link3" title="title3">title3</a></li>
</ul>
</div>
<div class="p-list-sec">
<ul> <li><a href="link1" title="title1">title1</a></li>
<li><a href="link2" title="title2">title2</a></li>
<li><a href="link3" title="title3">title3</a></li>
</ul>
</div>
<div class="p-list-sec">
<ul> <li><a href="link1" title="title1">title1</a></li>
<li><a href="link2" title="title2">title2</a></li>
<li><a href="link3" title="title3">title3</a></li>
</ul>
</div>
</code></pre>
<p>...and so on, for around 40 <code>div</code>s with class <code>p-list-sec</code>.</p>
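<p>To experiment without hitting the site, a trimmed copy of the markup above can be fed straight into BeautifulSoup. In this minimal sketch, <code>SAMPLE_HTML</code> is a hypothetical stand-in for <code>page.content</code>. It shows that <code>find_all</code> returns a list-like <code>ResultSet</code> of <code>Tag</code> objects, and that each of those can be searched again on its own.</p>

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for page.content: two of the ~40 repeated blocks
SAMPLE_HTML = """
<div class="p-list-sec">
  <ul>
    <li><a href="link1" title="title1">title1</a></li>
    <li><a href="link2" title="title2">title2</a></li>
  </ul>
</div>
<div class="p-list-sec">
  <ul>
    <li><a href="link3" title="title3">title3</a></li>
  </ul>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
all_p = soup.find_all(class_="p-list-sec")

# all_p is a ResultSet (a list subclass) holding one Tag per matching div
print(len(all_p))  # 2
# Calling find_all on a Tag searches only inside that div
print([len(div.find_all("a")) for div in all_p])  # [2, 1]
```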
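<p>Now I want to extract the <code>href</code> and <code>title</code> of every <code>a</code> tag inside the <code>p-list-sec</code> elements and store them in a file. I know how to write them to a file, but extracting all the <code>href</code> and <code>title</code> values from all of the <code>p-list-sec</code> elements is where I'm stuck.
I'm using Python 3.9 with the requests and Beautiful Soup libraries, running from the command prompt on Windows 10.</p>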
<p>Thanks,
阿克希</p>
<p>Would this work?</p>
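<pre class="lang-py prettyprint-override"><code>...  # code from the question, which defines all_p

# all_p holds one Tag per div.p-list-sec; search each one for its links
for p in all_p:
    for link in p.find_all('a'):
        print(link['href'])
        print(link.text)  # or link['title']
</code></pre>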
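<p>Since the question also mentions saving the results, here is a self-contained sketch that collects <code>(href, title)</code> pairs and writes them to a CSV file. It swaps the nested loops for the CSS selector <code>div.p-list-sec a[href]</code>, which matches the same anchors but skips any without an <code>href</code>. The function names <code>extract_links</code> and <code>save_links</code> and the file name <code>links.csv</code> are hypothetical, and falling back to the link text when an anchor has no <code>title</code> attribute is an assumption about what is wanted.</p>

```python
import csv

from bs4 import BeautifulSoup


def extract_links(html):
    """Return (href, title) pairs for every <a href> nested under a div.p-list-sec."""
    soup = BeautifulSoup(html, "html.parser")
    pairs = []
    # CSS selector: <a> elements that have an href, anywhere inside div.p-list-sec
    for link in soup.select("div.p-list-sec a[href]"):
        # .get() returns None instead of raising KeyError when title is missing;
        # in that case fall back to the visible link text (an assumption)
        title = link.get("title") or link.get_text(strip=True)
        pairs.append((link["href"], title))
    return pairs


def save_links(pairs, path):
    # newline="" stops the csv module from writing blank rows on Windows
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["href", "title"])
        writer.writerows(pairs)


if __name__ == "__main__":
    sample = (
        '<div class="p-list-sec"><ul>'
        '<li><a href="link1" title="title1">title1</a></li>'
        '<li><a href="link2">title2</a></li>'
        "</ul></div>"
    )
    pairs = extract_links(sample)
    print(pairs)  # [('link1', 'title1'), ('link2', 'title2')]
    save_links(pairs, "links.csv")
```

<p>For the real page, the response body from the question can be passed in directly, e.g. <code>save_links(extract_links(page.content), 'links.csv')</code>; BeautifulSoup accepts the raw bytes.</p>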