<p>Let me focus on the specific part of the HTML that causes the problem:</p>
<pre><code><a class='warp_lightbox' title='Comprar' href='//www.fotoregistro.com.br/
navhome.php?lightbox&dpxshig=/iprop_prod=180-slim/tipo=fotolivro/width=950/height=615/control=true/tema=tema_02/nome_tema=Q2wmYWFjdXRlO3NzaWNvIFByZXRv&cpmdsc=MOZAO'><img src='
//sh.digipix.com.br/subhomes/_lojas_consumer/paginas/fotolivro/img/180slim/vitrine/classic_01_tb.jpg' alt='slim' />
</a>
</code></pre>
<p>You can get it with:</p>
<pre><code>for link in soup.find_all('a', {'class':'warp_lightbox'}):
    url = link.get("href")
    break
</code></pre>
<p>You will find that <code>url</code> is:</p>
<pre><code>'//www.fotoregistro.com.br/\rnavhome.php?lightbox&dpxshig=/iprop_prod=180-slim/tipo=fotolivro/width=950/height=615/control=true/tema=tema_02/nome_tema=Q2wmYWFjdXRlO3NzaWNvIFByZXRv&cpmdsc=MOZAO'
</code></pre>
<p>Two important patterns are visible at the start of the string:</p>
<ul>
<li><code>//</code>, which is a way to keep the current protocol (a protocol-relative URL); see <a href="https://stackoverflow.com/questions/4071117/uri-starting-with-two-slashes-how-do-they-behave">this</a></li>
<li><code>\r</code>, which is the ASCII carriage return (CR)</li>
</ul>
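<p>When you print it, most terminals treat the carriage return as "go back to the start of the line", so the text after it overwrites the text before it and you simply lose this part:</p>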
<pre><code>//www.fotoregistro.com.br/\r
</code></pre>
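<p>The effect can be reproduced without any scraping. The snippet below is a minimal sketch that uses a shortened, hypothetical stand-in for the real <code>href</code> value; only the embedded <code>\r</code> matters here:</p>

```python
# Shortened, hypothetical stand-in for the href value; note the embedded "\r".
url = '//www.fotoregistro.com.br/\rnavhome.php?lightbox'

# print() writes the raw characters. Most terminals interpret "\r" as
# "return to column 0", so the second half is drawn over the first half.
print(url)

# repr() escapes the control character, so the whole value stays visible.
print(repr(url))

# Splitting on "\r" shows the two halves that end up sharing one line.
before, after = url.split('\r')
print(before)
print(after)
```

<p>Nothing is actually removed from the string; only the way it is displayed changes.</p>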
<p>If you need the raw string, you can use <a href="https://docs.python.org/3/library/functions.html#repr" rel="nofollow noreferrer"><code>repr()</code></a> in your <code>for</code> loop:</p>
<pre><code>print(repr(url))
</code></pre>
<p>You will get:</p>
<pre><code>'//www.fotoregistro.com.br/\rnavhome.php?lightbox&dpxshig=/iprop_prod=180-slim/tipo=fotolivro/width=950/height=615/control=true/tema=tema_02/nome_tema=Q2wmYWFjdXRlO3NzaWNvIFByZXRv&cpmdsc=MOZAO'
</code></pre>
<p>If you need the path, you can replace the initial part:</p>
<pre><code>base = 'www.fotoregistro.com.br/'
for link in soup.find_all('a', {'class':'warp_lightbox'}):
    url = link.get("href").replace('//www.fotoregistro.com.br/\r',base)
    print(url)
</code></pre>
<p>You will get:</p>
<pre><code>www.fotoregistro.com.br/navhome.php?lightbox&dpxshig=/iprop_prod=180-slim/tipo=fotolivro/width=950/height=615/control=true/tema=tema_02/nome_tema=Q2wmYWFjdXRlO3NzaWNvIFByZXRv&cpmdsc=MOZAO
www.fotoregistro.com.br/navhome.php?lightbox&dpxshig=/iprop_prod=180-slim/tipo=fotolivro/width=950/height=615/control=true/tema=tema_02/preview=true/nome_tema=Q2wmYWFjdXRlO3NzaWNvIFByZXRv&cpmdsc=MOZAO
.
.
.
</code></pre>
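<p>If you would rather not hard-code the prefix, one possible alternative is to drop the control characters and let <code>urllib.parse.urljoin</code> resolve the protocol-relative <code>//</code> form. This is only a sketch: the helper name <code>clean_href</code> is made up for illustration, and the <code>https</code> page URL is an assumption about where the HTML was fetched from. The sample <code>href</code> is shortened.</p>

```python
from urllib.parse import urljoin


def clean_href(href, page_url='https://www.fotoregistro.com.br/'):
    """Remove CR/LF/TAB from an href, then resolve it against page_url.

    page_url is an assumed value; pass the URL the page was really loaded from.
    """
    cleaned = href.replace('\r', '').replace('\n', '').replace('\t', '').strip()
    # For a "//host/path" reference, urljoin takes the scheme from page_url.
    return urljoin(page_url, cleaned)


# Shortened, hypothetical href containing the stray carriage return.
raw = '//www.fotoregistro.com.br/\rnavhome.php?lightbox&cpmdsc=MOZAO'
print(clean_href(raw))
```

<p>Unlike the <code>replace</code> approach, the result includes the scheme, so it can be passed directly to an HTTP client.</p>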
<hr/>
<p>Without specifying the class:</p>
<pre><code>for link in soup.find_all('a'):
    url = link.get("href")
    print(repr(url))
</code></pre>